CN112257459A

CN112257459A - Language translation model training method, translation method, device and electronic equipment

Info

Publication number: CN112257459A
Application number: CN202011114707.7A
Authority: CN
Inventors: 孙泽维; 王明轩; 李磊
Original assignee: Beijing Youzhuju Network Technology Co Ltd
Current assignee: Beijing Youzhuju Network Technology Co Ltd
Priority date: 2020-10-16
Filing date: 2020-10-16
Publication date: 2021-01-22
Anticipated expiration: 2040-10-16
Also published as: CN112257459B

Abstract

The embodiments of the present disclosure disclose a training method for a language translation model, a translation method, an apparatus, and an electronic device. One specific implementation of the training method comprises: acquiring a training sample set, wherein the training sample set comprises a plurality of training samples; for each training sample, determining a first set of the training sample, wherein the first set comprises the training sample and at least one splitting result, and the splitting result is obtained by splitting the training sample; and training the initial language training model using the first sets corresponding to the plurality of training samples to obtain a trained language translation model. Because the first set of a training sample comprises both the training sample and its splitting results, a language translation model obtained by this method translates documents with high accuracy and consistency.

Description

Language translation model training method, translation method, device and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a training method, a translation method, an apparatus, and an electronic device for a language translation model.

Background

Translation is the conversion of a carrier of information from one language to another. Early translation was performed manually.

With the development of computer technology and artificial intelligence, machine translation has been widely used. A user may enter a piece of a document described in one language into a machine translation program to obtain the piece of the document described in another language.

Disclosure of Invention

This disclosure is provided to introduce concepts in a simplified form that are further described below in the detailed description. This disclosure is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

The embodiment of the disclosure provides a training method, a translation method and a device of a language translation model and electronic equipment.

In a first aspect, an embodiment of the present disclosure provides a method for training a language translation model, where the method includes: acquiring a training sample set, wherein the training sample set comprises a plurality of training samples; for each training sample, determining a first set of the training sample, wherein the first set comprises the training sample and at least one splitting result, and the splitting result is obtained by splitting the training sample; and training the initial language training model by using the first set corresponding to each of the plurality of samples to obtain a trained language translation model.

In a second aspect, an embodiment of the present disclosure provides a translation method, including: receiving a text to be translated expressed in a first language, wherein the text to be translated comprises at least one sentence; inputting the text to be translated into a trained language translation model for translation to obtain a translation result text expressed by a second language, wherein the language translation model is obtained by training by using the method of the first aspect.

In a third aspect, an embodiment of the present disclosure provides a training apparatus for a language translation model, including: an obtaining unit, configured to obtain a training sample set, where the training sample set includes a plurality of training samples; a determining unit, configured to determine, for each training sample, a first set of the training sample, where the first set includes the training sample and at least one splitting result, and the splitting result is obtained by splitting the training sample; and the training unit is used for training the initial language training model by using the first set corresponding to the plurality of samples to obtain a trained language translation model.

In a fourth aspect, an embodiment of the present disclosure provides a translation apparatus, including: a receiving unit, configured to receive a text to be translated expressed in a first language, where the text to be translated includes at least one sentence; and a translation unit, configured to input the text to be translated into a trained language translation model for translation to obtain a translation result text expressed in a second language, where the language translation model is obtained by training with the apparatus of the third aspect.

In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for training a language translation model according to the first aspect or the method for translating according to the second aspect.

In a sixth aspect, the present disclosure provides a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method for training a language translation model according to the first aspect, or the method for translating according to the second aspect.

According to the training method, the translation method, the apparatus and the electronic device provided by the embodiments of the present disclosure, a training sample set is acquired, where the training sample set includes a plurality of training samples; for each training sample, a first set of the training sample is determined, where the first set includes the training sample and at least one splitting result, and the splitting result is obtained by splitting the training sample; and the initial language training model is trained using the first sets corresponding to the plurality of training samples to obtain the trained language translation model. Because the first set of a training sample includes both the training sample and its splitting results, contextual and other association relationships exist among the splitting results and between the splitting results and the training sample. Compared with training the model on a sentence-level training corpus, the model obtained by the training method provided here can identify contextual and other related information. When the language translation model obtained by the method is used to translate a document, the accuracy and consistency of the translated document are high. In addition, compared with training the language translation model using only longer documents, the language translation model learns more easily when trained using the method provided by the present disclosure.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.

FIG. 1 is a flow diagram of some embodiments of a training method of a language translation model according to the present disclosure;

FIG. 2 is a flow diagram of some embodiments of a translation method according to the present disclosure;

FIG. 3 is a schematic block diagram of some embodiments of a training apparatus for a language translation model according to the present disclosure;

FIG. 4 is a schematic structural diagram of some embodiments of a translation device according to the present disclosure;

FIG. 5 is an exemplary system architecture to which a training method of a language translation model or a translation method provided in accordance with an embodiment of the present disclosure may be applied;

FIG. 6 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

In the related technical solution, a whole document is translated by taking the sentence as the basic unit. With this approach it is difficult to use the context information of the document during translation, so the quality of the resulting translation tends to be low.

Referring to FIG. 1, a flow diagram of one embodiment of a method for training a language translation model according to the present disclosure is shown. As shown in fig. 1, the method for training the language translation model includes the following steps:

Step 101, acquiring a training sample set, where the training sample set includes a plurality of training samples.

The training samples may include source language text and target language text corresponding to the source language text. The target language text may be translated manually from the source language text. The source and target languages may be different languages.

The set of training samples may be pre-generated. The training sample set may be stored locally, and the execution subject of the training method of the language translation model may obtain the training sample set locally. The training sample set may also be stored in other electronic devices or storage media, and the execution subject of the training method for the language translation model may obtain the training sample set from other electronic devices or storage media through a communication connection.

Step 102, for each training sample, determining a first set of the training sample, where the first set includes the training sample and at least one splitting result, and the splitting result is obtained by splitting the training sample.

Each split result herein may include at least one statement of the training sample.

The first set includes a training sample and at least one split result of the training sample.

In some optional implementations of the present embodiment, each splitting result is obtained by splitting the training sample. The splitting result can be obtained based on the following steps:

First, determining the number of splitting segments corresponding to the splitting.

Then, splitting the training sample according to the number of splitting segments to obtain a splitting result.

The number of the split segments can be a number preset by a user.

In some application scenarios, a user may set rules that determine the number of split segments. The principal performing the training method of the language translation model may determine the number of split segments according to the rules described above.

In these alternative implementations, the language translation model is trained using the training samples and the split results obtained by splitting the training samples, and during the training process, the language translation model can learn the consistency of word translations in the training data according to the whole training samples and the split results of the training samples, add words omitted from the source training samples to the translation text according to the context, determine the tense according to the context, and the like. The language translation model obtained by training can obtain a better translation result for a longer text.

In some alternative implementations, the training sample may include N sentences, where N is an integer greater than or equal to 1. That is, the training sample may include a single sentence, or two or more sentences.

The training sample may correspond to a plurality of splitting results. The Mth splitting result is characterized by the number of splitting segments of the Mth splitting, and can be obtained according to the following steps:

First, determining the number of splitting segments corresponding to the splitting to be 2^M.

Second, evenly splitting the training sample into 2^M segments to obtain the Mth splitting result, where N is an integer greater than or equal to 1, M is an integer greater than or equal to 0, and 2^M is less than or equal to N.

That is, the training samples of the N sentences may be split M times to obtain M split results.

As an illustration, consider a training sample with 16 sentences. The number of splitting segments corresponding to the first split is 2^1 = 2. The training sample may be split into 2 segments of 8 sentences each, which form the first split result.

For the second split, the corresponding number of splitting segments is 2^2 = 4. The training sample may be split into 4 segments of 4 sentences each, which form the second split result.

For the third split, the corresponding number of splitting segments is 2^3 = 8. The training sample may be split into 8 segments of 2 sentences each, which form the third split result.

For the fourth split, the corresponding number of splitting segments is 2^4 = 16. The training sample may be split into 16 segments of 1 sentence each, which form the fourth split result.

After four splits, the splitting of the training sample is completed.
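The repeated halving described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the disclosure: the function names `split_into_segments` and `hierarchical_splits` are hypothetical, a training sample is assumed to be a plain list of sentence strings, and the splitting is assumed to start at M = 1 and continue while 2^M does not exceed N.

```python
from typing import List


def split_into_segments(sentences: List[str], num_segments: int) -> List[List[str]]:
    """Split a list of sentences into contiguous, near-equal segments."""
    n = len(sentences)
    # Integer boundaries; when n is not divisible by num_segments the
    # segment sizes differ by at most one sentence (an assumption here).
    bounds = [i * n // num_segments for i in range(num_segments + 1)]
    return [sentences[a:b] for a, b in zip(bounds, bounds[1:])]


def hierarchical_splits(sentences: List[str]) -> List[List[List[str]]]:
    """Return the M-th split result for M = 1, 2, ... while 2**M <= N."""
    results = []
    m = 1
    while 2 ** m <= len(sentences):
        results.append(split_into_segments(sentences, 2 ** m))
        m += 1
    return results


sample = [f"s{i}" for i in range(1, 17)]  # a 16-sentence training sample
levels = hierarchical_splits(sample)
# (number of segments, sentences per segment) for each split
print([(len(level), len(level[0])) for level in levels])
# prints [(2, 8), (4, 4), (8, 2), (16, 1)]
```

For the 16-sentence sample this reproduces the four splits of the example: 2, 4, 8 and 16 segments.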

In the above scheme, the number of sentences in each segment of the Mth split is determined by N/2^M. When N is not evenly divisible by 2^M, the number of sentences per segment may be based on the quotient of that division, with the remaining sentences distributed among the segments. As an illustration, when N is 17, the first split yields 2 segments, which may contain 8 sentences and 9 sentences respectively. The second split yields 4 segments, which may contain 4 sentences, 5 sentences, and so on.
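The arithmetic for the uneven case can be worked through with a short Python sketch. The helper `segment_sizes` is hypothetical, and placing the leftover sentences in the last segments is an assumption chosen to match the "8 sentences and 9 sentences" ordering of the example; the disclosure does not fix where the remainder goes.

```python
from typing import List


def segment_sizes(num_sentences: int, num_segments: int) -> List[int]:
    """Sentence count per segment when the split may be uneven."""
    quotient, remainder = divmod(num_sentences, num_segments)
    # Assumption: the last `remainder` segments each take one extra sentence.
    return [quotient] * (num_segments - remainder) + [quotient + 1] * remainder


for m in (1, 2):
    print(2 ** m, segment_sizes(17, 2 ** m))
# prints:
# 2 [8, 9]
# 4 [4, 4, 4, 5]
```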

It will be appreciated that the training samples described above include source language text and target language text corresponding to the source language text. Correspondingly, the split result also comprises a split result of the source language and a split result of the target language.
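A sketch of splitting the source and target sides together is given below. It assumes, beyond what is stated here, that the source and target texts are aligned sentence by sentence and therefore contain the same number of sentences; the function name `split_parallel` and the sample sentences are illustrative only.

```python
from typing import List, Tuple

# One split sub-result: (source sentences, target sentences).
Pair = Tuple[List[str], List[str]]


def split_parallel(source: List[str], target: List[str], num_segments: int) -> List[Pair]:
    """Split aligned source/target sentence lists at the same boundaries."""
    if len(source) != len(target):
        # Assumption of this sketch: one target sentence per source sentence.
        raise ValueError("source and target must have the same number of sentences")
    n = len(source)
    bounds = [i * n // num_segments for i in range(num_segments + 1)]
    return [(source[a:b], target[a:b]) for a, b in zip(bounds, bounds[1:])]


src = ["src 1", "src 2", "src 3", "src 4"]
tgt = ["tgt 1", "tgt 2", "tgt 3", "tgt 4"]
for src_part, tgt_part in split_parallel(src, tgt, 2):
    print(src_part, "->", tgt_part)
```

Cutting both sides at the same boundaries keeps each source segment paired with its own translation, so every split sub-result remains a valid training pair.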

Further optionally, each of the at least one split result includes at least two split sub-results. In these alternative implementations, the determining the first set of training samples includes:

and performing mixed arrangement on the training sample and at least two splitting sub-results corresponding to the at least one splitting result to obtain a first set.

As an illustration, consider a training sample A with 8 sentences. The first split result includes 2 first split sub-results, A1 and A2, each including 4 sentences. The second split result includes 4 second split sub-results, A3, A4, A5 and A6, each including 2 sentences. The third split result includes 8 third split sub-results, A7, A8, A9, A10, A11, A12, A13 and A14, each including 1 sentence.

In the first set, A, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13 and A14 may be mixed and arranged. As an example, the resulting arrangement may be A, A1, A8, A3, A12, A5, A7, A2, A9, A10, A11, A4, A13, A14, A6, and the like. That is, in the first set, the split sub-results of the training sample need not be contiguous.

The language translation model may be trained using each training data in the first set in turn.
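The construction of a first set with mixed arrangement can be sketched as follows. This is an illustration under assumptions: `build_first_set` is a hypothetical name, the sample is a list of sentence strings, a uniform random shuffle stands in for the "mixed arrangement" (the disclosure does not specify how the order is chosen), and the `seed` parameter exists only to make the sketch reproducible.

```python
import random
from typing import List, Optional


def build_first_set(sentences: List[str], seed: Optional[int] = None) -> List[List[str]]:
    """Collect the whole sample plus every split sub-result, then shuffle."""
    n = len(sentences)
    first_set = [list(sentences)]  # the whole training sample (A)
    m = 1
    while 2 ** m <= n:
        k = 2 ** m
        bounds = [i * n // k for i in range(k + 1)]
        # Sub-results of the m-th split (A1, A2, ... in the example above).
        first_set.extend(sentences[a:b] for a, b in zip(bounds, bounds[1:]))
        m += 1
    # Assumption: "mixed arrangement" is modelled as a random shuffle.
    random.Random(seed).shuffle(first_set)
    return first_set


sample_a = [f"s{i}" for i in range(1, 9)]  # training sample A, 8 sentences
first_set = build_first_set(sample_a, seed=0)
print(len(first_set))  # 1 + 2 + 4 + 8 = 15 training items
```

For the 8-sentence sample A this yields 15 items, matching A together with A1 through A14.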

In these alternative implementations, the training sample is split multiple times, and each segment of the last split result contains a single sentence. When the language translation model is trained using the first set of such a training sample, the trained model can produce higher-quality translations of longer documents and can also translate shorter sentences with higher accuracy.

Step 103, training the initial language training model using the first sets corresponding to the plurality of training samples to obtain a trained language translation model.

For each training sample, the initial language training model may be trained using the first set to which the sample corresponds.

For each first set, the initial language model may be trained using the training samples in the first set and the splitting results of the training samples, respectively.

During training, the source language text of a training sample is used as the training input and fed to the input end of the initial language model, and the corresponding target language text is used as the target output.

In some application scenarios, training of the initial language translation model may be stopped after a preset number of training rounds to obtain the trained language translation model. The preset number of rounds may be, for example, 1000 or 10000.

In other application scenarios, a loss function may be set. The output of each training round may be compared with the target output, and whether the value of the loss function meets a preset condition is determined according to the comparison result. When the value of the loss function satisfies the preset condition, training of the initial language translation model may be stopped. The loss function here may be, for example, a mean square error function, a mean absolute error function, or a cross-entropy loss function. Loss functions are well known and widely studied and applied, and are not described in detail herein.
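The two stopping rules (a preset number of rounds, or a loss value meeting a preset condition) can be combined in one loop, sketched below. The sketch is framework-agnostic and hypothetical: `train_one_round` stands for whatever routine performs one training round and returns the loss, the preset condition is assumed to be "loss not greater than a threshold", and the loss values in the demonstration are made up purely to exercise the loop.

```python
from typing import Callable, Optional


def train(train_one_round: Callable[[], float],
          max_rounds: int,
          loss_threshold: Optional[float] = None) -> int:
    """Run training rounds and return how many rounds were completed."""
    for round_index in range(1, max_rounds + 1):
        loss = train_one_round()
        # Assumed preset condition: stop once the loss is small enough.
        if loss_threshold is not None and loss <= loss_threshold:
            return round_index
    # Otherwise stop after the preset number of rounds.
    return max_rounds


# Made-up loss values standing in for a real model's training losses.
fake_losses = iter([0.9, 0.5, 0.2, 0.1])
rounds_used = train(lambda: next(fake_losses), max_rounds=1000, loss_threshold=0.25)
print(rounds_used)  # prints 3
```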

The language translation model may be any machine learning model, such as a neural network model, a deep learning model, and the like.

The above embodiments of the present disclosure provide a training method of a language translation model, in which a training sample set is acquired, where the training sample set includes a plurality of training samples; for each training sample, a first set of the training sample is determined, where the first set includes the training sample and at least one splitting result, and the splitting result is obtained by splitting the training sample; and the initial language training model is trained using the first sets corresponding to the plurality of training samples to obtain a trained language translation model. Because the first set of a training sample includes both the training sample and its splitting results, contextual and other association relationships exist among the splitting results and between the splitting results and the training sample. Compared with training the model on a sentence-level training corpus, the model obtained by the training method provided here can identify contextual and other related information. When the language translation model obtained by the method is used to translate a document, the accuracy and consistency of the translated document are high. In addition, compared with training the language translation model using only longer documents, the language translation model learns more easily when trained using the method provided by the present disclosure.

In some optional implementations of this embodiment, the training sample set may further include a verification sample. After the initial language model is trained by using a plurality of training samples in the training sample set, the method for training the language translation model may further include the following steps:

First, verifying, using the verification sample, the language translation model trained with the plurality of training samples;

Second, in response to the verification result meeting a preset condition, ending the training of the language translation model.

In these alternative implementations, the preset condition may be, for example, that the degree of overlap between the output text of the language translation model and the target output text is greater than a preset threshold.

And if the verification result does not meet the preset condition, continuing to train the language translation model by using the first set of training samples in the training sample set until the verification result meets the preset condition.

In these alternative implementations, the language translation model trained by the training samples is validated using the validation samples to determine whether to end the training. The method can avoid the situation that the language translation model only obtains higher accuracy for the training sample, and can improve the applicability of the language translation model.
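The verification step can be sketched as follows. The disclosure does not define how the degree of overlap is measured, so the token-level measure `token_overlap` below is an assumption made for illustration (practical systems often use a metric such as BLEU instead); `validation_passed`, the `translate` callable, the toy lookup table and the threshold value are likewise hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple


def token_overlap(output: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the output."""
    out_counts = Counter(output.split())
    ref_counts = Counter(reference.split())
    if not ref_counts:
        return 0.0
    matched = sum((out_counts & ref_counts).values())  # multiset intersection
    return matched / sum(ref_counts.values())


def validation_passed(translate: Callable[[str], str],
                      validation_samples: List[Tuple[str, str]],
                      threshold: float) -> bool:
    """True when the mean overlap on the verification samples exceeds the threshold."""
    if not validation_samples:
        raise ValueError("at least one verification sample is required")
    scores = [token_overlap(translate(src), ref) for src, ref in validation_samples]
    return sum(scores) / len(scores) > threshold


# Toy stand-in for a trained model: a fixed lookup table.
toy_model = {"src a": "the cat sat", "src b": "hello world"}
samples = [("src a", "the cat sat down"), ("src b", "hello world")]
print(validation_passed(toy_model.get, samples, threshold=0.8))  # prints True
```

In the toy run the two overlaps are 0.75 and 1.0, so the mean of 0.875 exceeds the 0.8 threshold and training would end; with a lower score the training loop would continue on the first sets.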

Continuing to refer to FIG. 2, a flow of one embodiment of a translation method according to the present disclosure is shown. The translation method comprises the following steps:

Step 201, receiving a text to be translated expressed in a first language, wherein the text to be translated comprises at least one sentence.

The text to be translated may include one sentence, two sentences, or three or more sentences.

That is, the text to be translated may be a short text or a long text.

Step 202, inputting the text to be translated into the trained language translation model for translation to obtain a translation result text expressed by the second language.

The language translation model can be obtained by the training method of the language translation model shown in FIG. 1.

The first language may be the language of the source language text in the training samples of the language translation model, and the second language may be the language of the target language text.

In the translation method provided in this embodiment, since the text to be translated is translated by using the language translation model obtained by the training method shown in FIG. 1, a translation result with higher accuracy can be obtained.

Referring to fig. 3, a schematic structural diagram of some embodiments of a training apparatus for a language translation model according to the present disclosure is shown.

As shown in fig. 3, the training apparatus for a language translation model includes: an acquisition unit 301, a determination unit 302 and a training unit 303. The acquiring unit 301 is configured to acquire a training sample set, where the training sample set includes a plurality of training samples; a determining unit 302, configured to determine, for each training sample, a first set of the training sample, where the first set includes the training sample and at least one splitting result, and the splitting result is obtained by splitting the training sample; the training unit 303 is configured to train the initial language training model with the first set corresponding to each of the plurality of samples to obtain a trained language translation model.

In some optional implementations, the training apparatus of the language translation model further includes a splitting unit (not shown in the figure), where the splitting unit is configured to split the training sample at least once to obtain the splitting results. For each splitting result, the splitting unit obtains the splitting result based on the following steps: determining the number of splitting segments corresponding to the splitting; and splitting the training sample according to the number of splitting segments to obtain the splitting result.

In some optional implementations, the training sample includes N sentences; the splitting unit is further configured to split the training sample for the Mth time according to the following steps to obtain the Mth splitting result: determining the number of splitting segments corresponding to the splitting to be 2^M; and evenly splitting the training sample into 2^M segments to obtain the Mth splitting result; where N is an integer greater than or equal to 1, M is an integer greater than or equal to 0, and 2^M is less than or equal to N.

In some optional implementations, each split result of the at least one split result includes at least two split sub-results; the determining unit 302 is further configured to: and performing mixed arrangement on the training sample and at least two splitting sub-results corresponding to the at least one splitting result to obtain the first set.

In some optional implementations, the training samples include training input samples and training output samples, the training input samples are source language text, and the training output samples are target language samples.

In some optional implementations, the set of training samples further includes validation samples. The training apparatus of the language translation model further includes a verification unit (not shown in the figure). The verification unit is configured to, after training the initial language model using a plurality of training samples of the set of training samples: verifying the language training model trained by the training samples by using verification samples; and finishing the training of the language translation model in response to the verification result meeting the preset condition.

Referring to fig. 4, a schematic structural diagram of some embodiments of a translation device according to the present disclosure is shown.

As shown in fig. 4, the translation apparatus includes: a receiving unit 401 and a translating unit 402. The receiving unit 401 is configured to receive a text to be translated expressed in a first language, where the text to be translated includes at least one sentence; the translating unit 402 is configured to input the text to be translated into a trained language translation model for translation, so as to obtain a translation result text expressed in a second language, where the language translation model is obtained by training with the training apparatus of the language translation model shown in fig. 3.

Referring to fig. 5, fig. 5 illustrates an exemplary system architecture to which the language translation model training method, the translation method, and the apparatus of one embodiment of the present disclosure may be applied.

As shown in fig. 5, the system architecture may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.

The terminal devices 401, 402, 403 may interact with the server 405 over the network 404 to receive or send messages and the like. The terminal devices 401, 402, 403 may have various client applications installed thereon, such as a web browser application, a search application, a news application, a multimedia conference application, and the like. A client application in the terminal devices 401, 402, 403 may receive an instruction from the user and complete the corresponding function according to that instruction, for example, according to text input by the user.

The terminal devices 401, 402, and 403 may be hardware or software. When the terminal devices 401, 402, and 403 are hardware, they may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like. When the terminal devices 401, 402, and 403 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.

The server 405 may be a server that provides various services, for example, one that receives translation request information transmitted by the terminal devices 401, 402, and 403, analyzes the translation request information, and transmits the result of the analysis (for example, the translation result) to the terminal devices 401, 402, and 403.

It should be noted that the training method or the translation method of the language translation model provided in the embodiments of the present disclosure may be executed by the terminal device, and accordingly, the training apparatus or the translation apparatus of the language translation model may be disposed in the terminal devices 401, 402, and 403. In addition, the training method or the translation method of the language translation model provided by the embodiments of the present disclosure may also be executed by the server 405, and accordingly, the training apparatus or the translation apparatus of the language translation model may be disposed in the server 405.

It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.

Referring now to fig. 6, a schematic diagram is shown of an electronic device (e.g., a terminal device or a server of fig. 5) suitable for implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle terminal (e.g., a car navigation terminal), as well as stationary terminals such as a digital TV and a desktop computer. The electronic device shown in fig. 6 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 6, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device. The processing means 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a training sample set, wherein the training sample set comprises a plurality of training samples; for each training sample, determine a first set of the training sample, wherein the first set comprises the training sample and at least one splitting result, and the splitting result is obtained by splitting the training sample; and train the initial language training model by using the first sets corresponding to the plurality of training samples to obtain a trained language translation model.

Alternatively, the one or more programs, when executed by the electronic device, cause the electronic device to: receive a text to be translated expressed in a first language, wherein the text to be translated comprises at least one sentence; and input the text to be translated into a trained language translation model for translation to obtain a translation result text expressed in a second language, wherein the language translation model is obtained by training with the training method described above.

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation of the unit itself; for example, the acquisition unit may also be described as a "unit for acquiring a set of training samples".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing description is only an illustration of the preferred embodiments of the disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. One example is a technical solution formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A method for training a language translation model, comprising:

acquiring a training sample set, wherein the training sample set comprises a plurality of training samples;

for each training sample, determining a first set of the training sample, wherein the first set comprises the training sample and at least one splitting result, and the splitting result is obtained by splitting the training sample;

and training the initial language training model by using the first sets corresponding to the plurality of training samples to obtain a trained language translation model.

2. The method of claim 1, wherein each split result is obtained based on the following steps:

determining the number of splitting segments corresponding to the splitting;

and splitting the training sample according to the number of the splitting segments to obtain a splitting result.

3. The method of claim 2, wherein the training samples comprise N sentences; the Mth splitting result is obtained according to the following steps:

determining that the number of split segments corresponding to this split is 2^M;

evenly splitting the training sample into 2^M segments to obtain the Mth splitting result; wherein N is an integer greater than or equal to 1, M is an integer greater than or equal to 0, and 2^M is less than or equal to N.
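As a worked illustration of the constraint in claim 3 (not part of the claim language), consider a training sample of N = 8 sentences. The value N = 8 is chosen here only because it divides evenly; the text above does not say how a remainder is handled when N is not a multiple of 2^M.

```latex
\[
\begin{array}{c|c|c}
M & 2^{M}\ \text{segments} & N / 2^{M}\ \text{sentences per segment} \\ \hline
0 & 1 & 8 \\
1 & 2 & 4 \\
2 & 4 & 2 \\
3 & 8 & 1
\end{array}
\qquad
2^{M} \le N \;\Longrightarrow\; M \le \lfloor \log_2 N \rfloor = 3
\]
```

For M = 0 the single segment is the whole training sample, and for the largest admissible M each segment is a single sentence.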

4. The method of claim 3, wherein each split result of the at least one split result comprises at least two split sub-results; the determining the first set of training samples comprises:

and performing mixed arrangement on the training sample and at least two splitting sub-results corresponding to the at least one splitting result to obtain the first set.
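The construction of the first set recited in claims 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `split_evenly` and `build_first_set` are hypothetical, a training sample is modeled as a list of sentences, "mixed arrangement" is assumed to mean a random shuffle, and when N is not a multiple of 2^M the segment sizes are assumed to differ by at most one sentence.

```python
import random
from typing import List

Segment = List[str]  # a contiguous run of sentences


def split_evenly(sample: Segment, parts: int) -> List[Segment]:
    """Split `sample` into `parts` contiguous segments whose sizes differ by at most one."""
    base, extra = divmod(len(sample), parts)
    segments, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # assumption: spread any remainder over the first segments
        segments.append(sample[start:start + size])
        start += size
    return segments


def build_first_set(sample: Segment, seed: int = 0) -> List[Segment]:
    """Return the training sample plus every split sub-result, in mixed order."""
    first_set = [list(sample)]            # the unsplit training sample itself
    m = 1                                 # each split result has at least two sub-results
    while 2 ** m <= len(sample):          # keep 2^M <= N
        first_set.extend(split_evenly(sample, 2 ** m))
        m += 1
    random.Random(seed).shuffle(first_set)  # assumed reading of "mixed arrangement"
    return first_set
```

For a four-sentence sample this yields seven entries: the full sample, two two-sentence halves, and four single sentences. For parallel data, the same segment boundaries would presumably be applied to the source and target sides of each sample; that pairing is omitted here for brevity.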

5. The method of any one of claims 1-4, wherein the training samples include training input samples and training output samples, the training input samples being source language text and the training output samples being target language samples.

6. The method of claim 4, wherein the set of training samples further comprises validation samples, and after training the initial language model using a plurality of training samples in the set of training samples, the method further comprises:

verifying the language training model trained by the training samples by using verification samples;

and finishing the training of the language translation model in response to the verification result meeting the preset condition.
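The train-then-verify loop of claim 6 can be sketched as follows. This is only an illustration: the function name `train_until_validated` and its parameters are hypothetical, the training and verification steps are passed in as callables, and the "preset condition" is assumed to be a validation score reaching a threshold, which the text above does not specify.

```python
from typing import Callable


def train_until_validated(
    train_one_round: Callable[[], None],
    validate: Callable[[], float],
    threshold: float,
    max_rounds: int = 100,
) -> int:
    """Alternate training and verification; return the number of rounds run."""
    for round_index in range(1, max_rounds + 1):
        train_one_round()            # train on the training samples
        score = validate()           # verify on the verification samples
        if score >= threshold:       # assumed form of the "preset condition"
            return round_index       # training is finished
    return max_rounds                # safety cap, also an assumption
```

A caller would supply closures that run one pass over the first sets and that compute a quality score (for example, a translation metric) on the verification samples.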

7. A method of translation, comprising:

receiving a text to be translated expressed in a first language, wherein the text to be translated comprises at least one sentence;

inputting the text to be translated into a trained language translation model for translation to obtain a translation result text expressed by a second language, wherein the language translation model is obtained by training by using the method of any one of claims 1 to 6.

8. An apparatus for training a language translation model, comprising:

an obtaining unit, configured to obtain a training sample set, where the training sample set includes a plurality of training samples;

a determining unit, configured to determine, for each training sample, a first set of the training sample, where the first set includes the training sample and at least one splitting result, and the splitting result is obtained by splitting the training sample;

and the training unit is used for training the initial language training model by using the first set corresponding to the plurality of samples to obtain a trained language translation model.

9. A translation apparatus, comprising:

a receiving unit, configured to receive a text to be translated expressed in a first language, wherein the text to be translated comprises at least one sentence;

a translation unit, configured to input the text to be translated into a trained language translation model for translation, so as to obtain a translation result text expressed in a second language, where the language translation model is obtained by using the apparatus according to claim 8.

10. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6 or the method of claim 7.

11. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 6 or carries out the method of claim 7.