CN111160049B - Text translation method, apparatus, machine translation system, and storage medium - Google Patents


Info

Publication number
CN111160049B
Authority
CN
China
Prior art keywords
constraint
translation
candidate
target
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911244875.5A
Other languages
Chinese (zh)
Other versions
CN111160049A (en)
Inventor
李良友
王龙跃
刘群
陈晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201911244875.5A priority Critical patent/CN111160049B/en
Publication of CN111160049A publication Critical patent/CN111160049A/en
Application granted granted Critical
Publication of CN111160049B publication Critical patent/CN111160049B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a text translation method and apparatus in the field of artificial intelligence. The method comprises the following steps: obtaining candidate translations; selecting constraints in a preset constraint set according to the attention weights calculated by the text translation model; and expanding the candidate translations according to a target constraint when a target constraint for expanding the candidate translations is selected, or expanding the candidate translations according to a preset candidate word set when no target constraint for expanding the candidate translations is selected. Because the method selects (filters) the preset constraint set when expanding the candidate translations, it avoids using all constraints every time the candidate translations are expanded and accelerates the expansion of the candidate translations, thereby improving translation speed.

Description

Text translation method, apparatus, machine translation system, and storage medium
Technical Field
The present application relates to the field of machine translation technology, and more particularly, to a text translation method, apparatus, machine translation system, and storage medium.
Background
Artificial intelligence (artificial intelligence, AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Research in artificial intelligence covers the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions.
With the continuous development of artificial intelligence technology, natural-language human-machine interaction systems, which allow humans and machines to interact through natural language, are becoming increasingly important. For humans and machines to interact through natural language, the system must recognize the specific meaning of human natural language. Generally, the system recognizes the specific meaning of a sentence by extracting key information from the natural-language sentence.
In recent years, neural machine translation has developed rapidly and has become the mainstream machine translation technique, surpassing conventional statistical machine translation. Many companies, such as Google, Baidu and Microsoft, have applied neural machine translation to their translation products. In order to enable certain phrases or words in an input source-language sentence to be translated correctly, current neural machine translation supports manual intervention in its translation results. One way is to add a known correct translation as a constraint to the neural machine translation and to ensure that the target side of the constraint appears in the finally output translation.
Therefore, how to efficiently and accurately use these constraints is a problem to be solved.
Disclosure of Invention
The application provides a text translation method, a text translation apparatus, a machine translation system and a storage medium, which can use constraints efficiently and accurately when performing machine translation, thereby improving translation speed.
In a first aspect, the present application provides a text translation method, including: obtaining a candidate translation corresponding to the source text; selecting constraints in a preset constraint set according to the attention weights calculated by the text translation model, where a constraint characterizes the correct translation of at least part of the source text; when a target constraint for expanding the candidate translation is selected, expanding the candidate translation according to the target constraint; or, when no target constraint for expanding the candidate translation is selected, expanding the candidate translation according to a preset candidate word set, where the candidate word set includes a plurality of words in the target language, and the target language is the language to which the candidate translation belongs.
Optionally, when a target constraint for expanding the candidate translation is selected, expanding the candidate translation according to the target constraint and the preset candidate word set.
The candidate translations are intermediate or final results of translating the source text. For example, in the process of translating the source sentence meaning "I graduated from Hefei University of Technology", partial outputs such as "I" and "I graduated from" are intermediate results. When "I" is used as a candidate translation, the new candidate translations obtained by expanding "I" may include "I graduated"; further, the obtained candidate translation "I graduated" may be expanded again, and the new candidate translations obtained by expanding "I graduated" may include "I graduated from". Once the candidate translation "I graduated from Hefei University of Technology" is obtained and is no longer expanded, it is the final result of translating the source text.
It should be understood that, in the present application, the number of candidate translations may be one or more, and when each candidate translation in the one or more candidate translations is expanded, one or more new candidate translations may be obtained.
The text translation model of the present application may be a neural-network-based translation model that includes a portion related to an attention mechanism, which calculates corresponding attention weights during translation. Optionally, the neural network model may include an encoder for reading the source text and generating a digitized representation for each source-language word included in the source text, a decoder for generating a translation of the source text, i.e., a sentence in the target language, and a portion associated with the attention mechanism for dynamically providing attention weights to the decoder when generating target words at different times, based on the output of the encoder and the state of the decoder.
The attention weight may be a measure characterizing the degree of relevance of each source language word in the source text to the decoder state at the current time. For example, at a certain moment, the attention weights of the 4 source language words corresponding to the source text are respectively 0.5, 0.3, 0.1 and 0.1, which may indicate that the correlation degrees between the 4 source language words and the decoder state are respectively 0.5, 0.3, 0.1 and 0.1, and the correlation degree between the first source language word and the decoder state is the highest, so that the possibility that the decoder is currently generating the target word corresponding to the first source language word is the highest.
The candidate word set includes a plurality of words in the target language, and the target language is the language to which the candidate translation belongs. The candidate word set may be a preset candidate word library, and the text translation model may score the candidate words in the candidate word library at each moment, so that candidate words for expanding the candidate translations are determined according to their scores. For example, candidate translations may be extended using candidate words whose scores exceed a preset threshold.
Constraints in this application may characterize the correct translation, i.e., the correct translated text, of at least part of the source text. Optionally, the at least part of the source text may be a source-language word or a source-language phrase included in the source text. Optionally, a constraint may include source position information and a target word corresponding to that position information, where the source refers to the input source text and the target word is the correct translation of the source word at the position indicated by the source position information. Optionally, the form of a constraint may be: [position of the source-language word in the source text]: target words, for example, [4]: Hefei University of Technology. Optionally, the form of a constraint may be: source-language word: target words, for example, the source-language name of Hefei University of Technology paired with the target phrase Hefei University of Technology.
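As a purely illustrative aid (not part of the patent text), the following Python sketch shows one possible in-memory representation of such a constraint set; the class name, field names and example values are assumptions made for this example.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Constraint:
        source_position: int      # position of the constrained source-language word/phrase in the source text
        target_words: List[str]   # its correct target-language translation, possibly several words

    # Example: the 4th source word should be translated as "Hefei University of Technology".
    preset_constraints = [
        Constraint(source_position=4,
                   target_words=["Hefei", "University", "of", "Technology"]),
    ]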
Both the source text and the candidate translation may belong to a natural language, which generally refers to a language that naturally evolves with culture. Optionally, the source text belongs to a first natural language, the candidate translation belongs to a second natural language, and the first and second languages are different types of natural languages. The source text belonging to the first natural language may refer to that the source text is a text expressed in the first natural language, and the candidate translation belongs to the second natural language, and may refer to that the candidate translation is a text expressed in the second natural language. The source text and the candidate translation may belong to any two different types of natural language.
It should be understood that the subject of execution of the text translation method of the present application may be a text translation device or a machine translation system.
In the above technical solution, when expanding the candidate translations, constraints in a preset constraint set are selected (filtered) according to the attention weights calculated by the text translation model. When a target constraint is selected, it is used to expand the candidate translation; when no target constraint is selected, no constraint in the preset constraint set is used, i.e., the candidate translation is expanded only according to the preset candidate word set. This avoids using all constraints at every expansion step and therefore speeds up the expansion of the candidate translations. Moreover, because the attention weight represents the degree of correlation between each source-language word in the source text and the decoder state at the current moment, selecting the preset constraints according to the attention weights allows constraints with low correlation to the current decoder state to be ignored, with little effect on the quality of the candidate translations. Therefore, this technical solution accelerates the expansion of the candidate translations while preserving their quality, thereby improving translation speed.
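Purely for illustration, a minimal sketch of this expansion decision is given below, reusing the Constraint sketch above; the helper name, the word-by-word expansion of a selected constraint and the score threshold are assumptions, not the patent's implementation.

    def expand_candidate(candidate_words, selected_constraints, scored_candidate_words, score_threshold=0.0):
        # candidate_words: the candidate translation so far (list of target-language words).
        # selected_constraints: constraints chosen from the preset set via attention weights (may be empty).
        # scored_candidate_words: (word, score) pairs from the preset candidate word set.
        new_candidates = []
        if selected_constraints:
            # A target constraint was selected: expand the candidate with it (word by word here).
            for c in selected_constraints:
                new_candidates.append(candidate_words + [c.target_words[0]])
        else:
            # No target constraint selected: expand only from the preset candidate word set.
            for word, score in scored_candidate_words:
                if score >= score_threshold:
                    new_candidates.append(candidate_words + [word])
        return new_candidates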
In one possible implementation, the selecting the constraint in the preset constraint set according to the attention weight calculated by the text translation model includes: according to a source end position corresponding to each constraint in the preset constraint set, attention weights respectively corresponding to each constraint are obtained from the text translation model, wherein the source end position is a position of a word corresponding to each constraint in the source text; and selecting the constraint in the preset constraint set according to the attention weight corresponding to each constraint.
It should be understood that the words corresponding to the constraints are words of the source language corresponding to the constraints.
The source location to which the constraint corresponds may be determined by the source location information in the constraint.
In one possible implementation manner, the selecting, according to the attention weight corresponding to each constraint, a constraint in the preset constraint set includes: processing the attention weight corresponding to each constraint to obtain a heuristic signal of each constraint, wherein the heuristic signal is used for indicating whether the constraint corresponding to the heuristic signal is used when expanding the candidate translation; and selecting the constraint in the preset constraint set according to the heuristic signal corresponding to each constraint.
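For illustration only, the sketch below derives such heuristic signals by looking up the attention weight at each constraint's source position and comparing it with a threshold; the 1-based position convention and the threshold value are assumptions.

    def constraint_heuristic_signals(constraints, attention_weights, threshold=0.3):
        # attention_weights[i] is assumed to hold the attention weight of source position i + 1
        # at the current decoding step.
        signals = []
        for c in constraints:
            weight = attention_weights[c.source_position - 1]
            # True means: use this constraint when expanding the candidate translation now.
            signals.append((c, weight >= threshold))
        return signals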
In one possible implementation, the selecting the constraint in the preset constraint set according to the attention weight calculated by the text translation model includes: selecting a target attention weight meeting preset requirements from the attention weights; selecting a constraint in the preset constraint set according to the source end position corresponding to the target attention weight and the source end position corresponding to each constraint in the preset constraint set, wherein the source end position corresponding to the target attention weight is the position of a word corresponding to the target attention weight in the source text, and the source end position corresponding to each constraint is the position of the word corresponding to each constraint in the source text.
It should be understood that, here, the words corresponding to the target attention weights are the source words corresponding to the target attention weights, and similarly, the words corresponding to the constraints are the source words corresponding to the constraints.
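A sketch of this alternative order of selection (first pick the target attention weights, then match source positions) might look as follows; the indexing convention and the threshold are again assumptions made for illustration.

    def select_constraints_by_position(constraints, attention_weights, threshold=0.3):
        # Step 1: target attention weights are those meeting the preset requirement
        # (here, simply weight >= threshold); record their source positions.
        target_positions = {
            index + 1                      # 0-based index -> assumed 1-based source position
            for index, weight in enumerate(attention_weights)
            if weight >= threshold
        }
        # Step 2: keep only the constraints whose source position was selected.
        return [c for c in constraints if c.source_position in target_positions]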
In one possible implementation manner, the selecting the constraint in the preset constraint set according to the attention weight calculated by the text translation model includes: selecting constraints in the preset constraint set according to the attention weights calculated by the text translation model and the state of the candidate translation. The state of a candidate translation is either "in a constraint" or "not in a constraint"; the state is "in a constraint" when the candidate translation was obtained by expansion with part of the words of a target phrase, where the target phrase is the target-side phrase corresponding to a constraint in the preset constraint set. The target constraint satisfies at least one of the following conditions: the attention weight corresponding to the target constraint meets the preset requirement; or the state of the candidate translation is in the target constraint.
In some cases, a constraint may include a plurality of target words. For example, the constraint corresponding to the source-language name of Hefei University of Technology, [4]: Hefei University of Technology, includes the four target words Hefei, University, of and Technology. In the scenario where candidate translations are expanded word by word, some of the words of a constraint may already have been used in the candidate translation to be expanded at the current moment; the state of such a candidate translation is said to be in the constraint. For example, the candidate translation "I graduated from Hefei" at the current moment has already used the word "Hefei" of the constraint [4]: Hefei University of Technology. Taking this situation into account, the above technical solution may combine the attention weights calculated by the text translation model with the state of the current candidate translation when selecting constraints in the preset constraint set, making the selection result more accurate.
Alternatively, when the candidate translation is within the constraint, the candidate translation may be extended based only on the target constraint.
Alternatively, when the candidate translation is in the constraint, the candidate translation may be expanded according to a preset candidate word set and a target constraint.
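To illustrate the "in the constraint" state, the sketch below checks whether a candidate translation ends with a prefix of a constraint's target phrase and, if so, returns the next word with which the expansion can be forced to continue; this is a simplification, since an actual decoder would track this state explicitly for every hypothesis.

    def next_constraint_word(candidate_words, constraint):
        # Returns the next word of the constraint phrase if the candidate's state is
        # "in the constraint" (it ends with a proper prefix of the phrase), else None.
        phrase = list(constraint.target_words)
        for prefix_len in range(len(phrase) - 1, 0, -1):
            if candidate_words[-prefix_len:] == phrase[:prefix_len]:
                return phrase[prefix_len]
        return None

    # Example: ["I", "graduated", "from", "Hefei"] with the phrase
    # "Hefei University of Technology" yields "University".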
In a second aspect, the present application provides a text translation apparatus comprising a memory for storing a program and a processor for executing the program stored in the memory. When the program stored in the memory is executed by the processor, the processor is configured to: obtain a candidate translation corresponding to the source text; select constraints in a preset constraint set according to the attention weights calculated by the text translation model, where a constraint characterizes the correct translation of at least part of the source text; when a target constraint for expanding the candidate translation is selected, expand the candidate translation according to the target constraint; or, when no target constraint for expanding the candidate translation is selected, expand the candidate translation according to a preset candidate word set, where the candidate word set includes a plurality of words in the target language, and the target language is the language to which the candidate translation belongs.
In the above technical solution, when expanding the candidate translations, constraints in a preset constraint set are selected (filtered) according to the attention weights calculated by the text translation model. When a target constraint is selected, it is used to expand the candidate translation; when no target constraint is selected, no constraint in the preset constraint set is used, i.e., the candidate translation is expanded only according to the preset candidate word set. This avoids using all constraints at every expansion step and therefore speeds up the expansion of the candidate translations. Moreover, because the attention weight represents the degree of correlation between each source-language word in the source text and the decoder state at the current moment, selecting the preset constraints according to the attention weights allows constraints with low correlation to the current decoder state to be ignored, with little effect on the quality of the candidate translations. Therefore, this technical solution accelerates the expansion of the candidate translations while preserving their quality, thereby improving translation speed.
In one possible implementation, the processor is specifically configured to: according to a source end position corresponding to each constraint in the preset constraint set, attention weights respectively corresponding to each constraint are obtained from the text translation model, wherein the source end position is a position of a word corresponding to each constraint in the source text; and selecting the constraint in the preset constraint set according to the attention weight corresponding to each constraint.
It should be understood that the words corresponding to the constraints are words of the source language corresponding to the constraints.
The source location to which the constraint corresponds may be determined by the source location information in the constraint.
In one possible implementation, the processor is specifically configured to: processing the attention weight corresponding to each constraint to obtain a heuristic signal of each constraint, wherein the heuristic signal is used for indicating whether the constraint corresponding to the heuristic signal is used when expanding the candidate translation; and selecting the constraint in the preset constraint set according to the heuristic signal corresponding to each constraint.
In one possible implementation, the processor is specifically configured to: selecting a target attention weight meeting preset requirements from the attention weights; selecting a constraint in the preset constraint set according to the source end position corresponding to the target attention weight and the source end position corresponding to each constraint in the preset constraint set, wherein the source end position corresponding to the target attention weight is the position of a word corresponding to the target attention weight in the source text, and the source end position corresponding to each constraint is the position of the word corresponding to each constraint in the source text.
It should be understood that, here, the words corresponding to the target attention weights are the source words corresponding to the target attention weights, and similarly, the words corresponding to the constraints are the source words corresponding to the constraints.
In one possible implementation, the processor is specifically configured to: select constraints in the preset constraint set according to the attention weights calculated by the text translation model and the state of the candidate translation. The state of a candidate translation is either "in a constraint" or "not in a constraint"; the state is "in a constraint" when the candidate translation was obtained by expansion with part of the words of a target phrase, where the target phrase is the target-side phrase corresponding to a constraint in the preset constraint set. The target constraint satisfies at least one of the following conditions: the attention weight corresponding to the target constraint meets the preset requirement; or the state of the candidate translation is in the target constraint.
In some cases, a constraint may include a plurality of target words. For example, the constraint corresponding to the source-language name of Hefei University of Technology, [4]: Hefei University of Technology, includes the four target words Hefei, University, of and Technology. In the scenario where candidate translations are expanded word by word, some of the words of a constraint may already have been used in the candidate translation to be expanded at the current moment; the state of such a candidate translation is said to be in the constraint. For example, the candidate translation "I graduated from Hefei" at the current moment has already used the word "Hefei" of the constraint [4]: Hefei University of Technology. Taking this situation into account, the above technical solution may combine the attention weights calculated by the text translation model with the state of the current candidate translation when selecting constraints in the preset constraint set, making the selection result more accurate.
In a third aspect, the present application provides a text translation apparatus, the apparatus comprising a memory for storing a program; a processor for executing the program stored in the memory, the text translation device executing the method of the first aspect or any one of the possible implementation manners of the first aspect when the program stored in the memory is executed by the processor.
Optionally, the apparatus further comprises a data interface, and the processor reads the program stored on the memory through the data interface.
In a fourth aspect, the present application provides a machine translation system comprising the text translation device of the second aspect or any one of the possible implementation manners of the second aspect, wherein the text translation device is configured to perform the method of the first aspect or any one of the possible implementation manners of the first aspect.
The text translation device may be an electronic device (or a module located in an electronic device), which may specifically be a mobile terminal (e.g. a smart phone), a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, an internet of things device, or other devices capable of performing natural language processing.
In a fifth aspect, the present application provides a computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method of the first aspect or any one of the possible implementations of the first aspect.
In a sixth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or any one of the possible implementations of the first aspect.
In a seventh aspect, the present application provides a chip, the chip including a processor and a data interface, the processor reading instructions stored on a memory through the data interface, and executing the method in the first aspect or any one of the possible implementation manners of the first aspect.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the instructions, when executed, are configured to perform the method in the first aspect.
In an eighth aspect, the present application provides an electronic device, which includes the text translation apparatus in any one of the possible implementation manners of the second aspect or the second aspect, or the text translation apparatus in the third aspect, or the machine translation system in the fourth aspect.
Drawings
Fig. 1 is a schematic diagram of an application scenario of natural language processing according to an embodiment of the present application.
Fig. 2 is a schematic diagram of an application scenario of another natural language processing provided in an embodiment of the present application.
Fig. 3 is a schematic diagram of a related device for natural language processing according to an embodiment of the present application.
Fig. 4 is a schematic diagram of a system architecture according to an embodiment of the present application.
Fig. 5 is a schematic diagram of an RNN model according to an embodiment of the present application.
Fig. 6 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a text translation model based on a neural network according to an embodiment of the present application.
Fig. 8 is a flowchart of a text translation process provided in an embodiment of the present application.
Fig. 9 is a schematic diagram of another text translation process provided in an embodiment of the present application.
Fig. 10 is a schematic flowchart of a text translation method provided in an embodiment of the present application.
FIG. 11 is a schematic flow chart diagram of a method for selecting constraints provided by an embodiment of the present application.
Fig. 12 is a schematic structural diagram of a text translation apparatus provided in an embodiment of the present application.
Fig. 13 is a schematic block diagram of a text translation apparatus according to another embodiment of the present application.
Fig. 14 is a schematic diagram of a machine translation system provided by an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described below with reference to the accompanying drawings.
In order to better understand the schemes of the embodiments of the present application, possible application scenarios of the embodiments of the present application are briefly described below with reference to fig. 1 to 3. The technical scheme of the embodiments of the present application can be applied to any scenario that requires a sequence generation task with constrained decoding, such as machine translation or the automatic generation of text summaries. The technical solution of the present application is described below by taking a machine translation scenario as an example.
Fig. 1 shows a natural language processing system comprising user equipment and a data processing device. The user equipment includes intelligent terminals such as a mobile phone, a personal computer or an information processing center. The user equipment is the initiator of natural language data processing: requests such as language question answering or queries are usually initiated by the user through the user equipment.
The data processing device may be a device or server having a data processing function, such as a cloud server, a web server, an application server or a management server. The data processing device receives questions such as query sentences, speech or text from the intelligent terminal through an interactive interface, and then performs language data processing, by means of machine learning, deep learning, searching, reasoning, decision making and the like, using a memory for storing data and a processor for data processing. The memory in the data processing device may be a general term that includes a database storing historical data; the database may be located on the data processing device or on another network server.
In the natural language processing system shown in fig. 1, the user equipment may receive an instruction from a user requesting machine translation of a source text (for example, a piece of Chinese input by the user) to obtain a machine translation (for example, English obtained by machine translation), and then send the source text to the data processing device, so that the data processing device translates the source text to obtain the machine translation.
In fig. 1, a data processing apparatus may perform the text translation method of the embodiments of the present application.
Fig. 2 shows another natural language processing system, in fig. 2, a user device directly serves as a data processing device, and the user device can directly receive input from a user and directly process the input by hardware of the user device, and a specific process is similar to that of fig. 1, and reference is made to the above description and will not be repeated herein.
In the natural language processing system shown in fig. 2, the user equipment may receive an instruction of a user, and the user equipment itself performs machine translation on the source text to obtain a machine translation.
In fig. 2, the user device itself may perform the text translation method according to the embodiment of the present application.
Fig. 3 is a schematic diagram of a related device for natural language processing according to an embodiment of the present application.
The user device in fig. 1 and fig. 2 may be specifically the local device 301 or the local device 302 in fig. 3, and the data processing device in fig. 1 may be specifically the execution device 210 in fig. 3, where the data storage system 250 may store data to be processed of the execution device 210, and the data storage system 250 may be integrated on the execution device 210, or may be disposed on a cloud or other network server.
The processors in fig. 1 and 2 may perform data training/machine learning/deep learning through a neural network model or other models (e.g., a model based on a support vector machine), and translate the source text using the model resulting from the data final training or learning, thereby obtaining machine translations.
Fig. 4 illustrates a system architecture 100 provided by an embodiment of the present application. In fig. 4, the data collection device 160 is configured to collect training data, where the training data in this embodiment includes training source text and training machine translation (a translation obtained by translating the training source text through the machine translation system).
After the training data is collected, the data collection device 160 stores the training data in the database 130 and the training device 120 trains the target model/rule 101 based on the training data maintained in the database 130.
The training device 120 will be described below as obtaining the target model/rule 101 based on the training data, where the training device 120 processes the input training source text, and compares the output machine translation with the training machine translation until the difference between the machine translation output by the training device 120 and the training machine translation is less than a certain threshold, thereby completing training of the target model/rule 101.
The target model/rule 101 can be used to implement the text translation method according to the embodiments of the present application, that is, the source text is input into the target model/rule 101 after being subjected to related preprocessing (which may be performed by the preprocessing module 113 and/or the preprocessing module 114), so as to obtain the machine translation. The target model/rule 101 in the embodiment of the present application may be specifically a neural network. In practical applications, the training data maintained in the database 130 is not necessarily collected by the data collecting device 160, but may be received from other devices. It should be noted that the training device 120 is not necessarily completely based on the training data maintained by the database 130 to perform training of the target model/rule 101, and it is also possible to obtain the training data from the cloud or other places to perform model training, which should not be taken as a limitation of the embodiments of the present application.
The target model/rule 101 obtained by training according to the training device 120 may be applied to different systems or devices, such as the execution device 110 shown in fig. 4, where the execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (augmented reality, AR)/Virtual Reality (VR), a vehicle-mounted terminal, or may also be a server or cloud. In fig. 4, the execution device 110 configures an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through the client device 140, where the input data may include in embodiments of the present application: the source text entered by the client device.
The preprocessing module 113 and the preprocessing module 114 are configured to perform preprocessing (specifically, may process a source text to obtain a word vector) according to input data (such as a source text) received by the I/O interface 112, and in this embodiment of the present application, the preprocessing module 113 and the preprocessing module 114 may not be provided (or only one of the preprocessing modules may be provided), and the computing module 111 may be directly used to process the input data.
In preprocessing input data by the execution device 110, or in performing processing related to computation or the like by the computation module 111 of the execution device 110, the execution device 110 may call data, codes or the like in the data storage system 150 for corresponding processing, or may store data, instructions or the like obtained by corresponding processing in the data storage system 150.
Finally, the I/O interface 112 feeds back the processing results, e.g., machine translations, to the client device 140.
It should be noted that, the training device 120 may generate, for different downstream systems, the target model/rule 101 corresponding to the downstream system, and the corresponding target model/rule 101 may be used to achieve the above target or complete the above task, thereby providing the user with the desired result.
In the case shown in FIG. 4, the user may manually give input data (e.g., enter a piece of text) that may be manipulated through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send input data (e.g., enter a piece of text) to the I/O interface 112, and if the client device 140 is required to automatically send the input data requiring authorization from the user, the user may set the corresponding permissions in the client device 140. The user may view the results output by the execution device 110 at the client device 140, and the particular presentation may be in the form of a display, sound, action, etc. (e.g., the output results may be machine translations). The client device 140 may also be used as a data collection terminal to collect input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data as shown in the figure, and store the new sample data in the database 130. Of course, instead of being collected by the client device 140, the I/O interface 112 may directly store the input data input to the I/O interface 112 and the output result output from the I/O interface 112 as new sample data into the database 130.
It should be noted that fig. 4 is only a schematic diagram of a system architecture provided in the embodiments of the present application, and the positional relationship between devices, apparatuses, modules, and the like shown in the drawings does not constitute any limitation. For example, in FIG. 4, data storage system 150 is external memory to execution device 110, and in other cases, data storage system 150 may be located within execution device 110.
As shown in fig. 4, the training device 120 trains to obtain the target model/rule 101, where the target model/rule 101 may be the neural machine translation model in an embodiment of the present application. Specifically, the neural network provided in the embodiments of the present application may be a convolutional neural network (convolutional neural network, CNN), a deep convolutional neural network (deep convolutional neural network, DCNN), a recurrent neural network (recurrent neural network, RNN), and so on.
Since the RNN is a very common neural network, its structure is described in detail below with reference to fig. 5.
Fig. 5 is a schematic structural diagram of an RNN model according to an embodiment of the present application. Each circle can be seen as a unit, and every unit does the same thing, so the network can be folded into the form shown in the left half of the figure. Explained in one sentence, an RNN is the repeated use of a single unit structure.
An RNN is a sequence-to-sequence model. Suppose x_(t-1), x_t, x_(t+1) is an input sequence, for example a sentence meaning "I am Chinese" entered word by word; then o_(t-1) and o_t should correspond to predicting the word most likely to follow "am" and "China" respectively, and o_(t+1) should, with relatively high probability, be "person" (so that the sentence reads as "I am a Chinese person").
Thus, we can define as follows:
x_t denotes the input at time t, o_t denotes the output at time t, and s_t denotes the memory at time t. The output at the current time is determined by the memory and the input at the current time. By analogy: you are now a senior, and your knowledge is a combination of the knowledge learned in your senior year (the current input) and the knowledge learned in the first three years (the memory). The RNN is similar in this regard; a neural network integrates many things together through a set of parameters and then learns those parameters. The basis of the RNN is therefore defined as:
s_t = f(U * x_t + W * s_(t-1))
Here f() is an activation function in the neural network. Why is it added? For example, if you learned a very good problem-solving method at university, do you still need the method you used in junior middle school? Obviously not. The idea of the RNN is the same: since it can memorize, it should of course memorize only the important information and forget what is unimportant. What is the best filter for information in a neural network? An activation function. Therefore an activation function is applied here to perform a non-linear mapping that filters the information; the activation function may be tanh, ReLU, or another function.
If you are about to graduate and must take an exam, do you go into the exam with the memory of what you learned before, or do you simply bring a few books along? Obviously the former. Likewise, the idea of the RNN is to make its prediction with the memory s_t of the current moment. If you want to predict the next word in "I am Chinese", it is clear that softmax should be used to predict the probability of each word; but the prediction cannot be made directly from the memory, so a weight matrix V is also introduced, giving:
o_t = softmax(V * s_t)
where o_t denotes the output at time t.
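For concreteness, a toy NumPy sketch of one step of this recurrence (taking f as tanh) is shown below; the dimensions and the randomly initialized parameters are arbitrary and purely illustrative.

    import numpy as np

    def rnn_step(x_t, s_prev, U, W, V):
        # s_t = f(U * x_t + W * s_(t-1)), o_t = softmax(V * s_t)
        s_t = np.tanh(U @ x_t + W @ s_prev)
        logits = V @ s_t
        o_t = np.exp(logits - logits.max())
        return s_t, o_t / o_t.sum()

    rng = np.random.default_rng(0)
    U = rng.normal(size=(16, 8))      # 8-dimensional inputs, 16-dimensional memory (assumed)
    W = rng.normal(size=(16, 16))
    V = rng.normal(size=(100, 16))    # 100-word vocabulary (assumed)
    s, o = rnn_step(rng.normal(size=8), np.zeros(16), U, W, V)   # o sums to 1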
It should be noted that RNNs shown in fig. 5 are only examples of a cyclic neural network, and the cyclic neural network may also exist in the form of other network models in specific applications.
Fig. 6 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present application. The chip includes a neural network processor (neural processing unit, NPU) 50. The chip may be provided in an execution device 110 as shown in fig. 4 for performing the calculation of the calculation module 111. The chip may also be provided in the training device 120 as shown in fig. 4 for completing the training work of the training device 120 and outputting the target model/rule 101. The algorithm in the recurrent neural network as shown in fig. 5 may be implemented in a chip as shown in fig. 6.
The text translation method of the embodiments of the present application may be specifically executed in the operation circuit 503 and/or the vector calculation unit 507 in the NPU 50, so as to obtain the machine translation.
The various modules and units within the NPU 50 are briefly described below.
The NPU 50 may be mounted as a coprocessor onto a host CPU (host CPU), which distributes tasks. The core part of the NPU 50 is the arithmetic circuit 503; when the NPU 50 is in operation, the controller 504 in the NPU 50 can control the arithmetic circuit 503 to extract data from memory (the weight memory or the input memory) and perform operations.
In some implementations, the arithmetic circuitry 503 internally includes a plurality of processing units (PEs). In some implementations, the operational circuitry 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 503 is a general-purpose matrix processor.
For example, assume that there is an input matrix a, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 502 and buffers the data on each PE in the arithmetic circuit. The arithmetic circuit takes matrix a data from the input memory 501 and performs matrix operation with matrix B, and the obtained partial result or final result of the matrix is stored in an accumulator (accumulator) 508.
The vector calculation unit 507 may further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 507 may be used for network calculations of non-convolutional/non-fully-connected (fully connected layers, FC) layers in a neural network, such as pooling, batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector computation unit 507 can store the vector of processed outputs to the unified buffer 506. For example, the vector calculation unit 507 may apply a nonlinear function to an output of the operation circuit 503, such as a vector of accumulated values, to generate an activation value. In some implementations, the vector calculation unit 507 generates a normalized value, a combined value, or both. In some implementations, the vector of processed outputs can be used as an activation input to the operational circuitry 503, for example for use in subsequent layers in a neural network.
The unified memory 506 is used for storing input data and output data.
Input data in the external memory is transferred directly to the input memory 501 and/or the unified memory 506 through the direct memory access controller (direct memory access controller, DMAC) 505, weight data in the external memory is stored into the weight memory 502, and data in the unified memory 506 is stored into the external memory.
A bus interface unit (bus interface unit, BIU) 510 is used for the interaction between the main CPU, the DMAC and the instruction fetch memory 509 via a bus.
An instruction fetch memory (instruction fetch buffer) 509 coupled to the controller 504 for storing instructions for use by the controller 504.
And a controller 504 for calling the instruction cached in the instruction memory 509 to control the operation of the operation accelerator.
In general, the unified memory 506, the input memory 501, the weight memory 502 and the instruction fetch memory 509 may all be on-chip (on-chip) memory. The external memory of the NPU may be memory external to the NPU, which may be double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), high bandwidth memory (high bandwidth memory, HBM), or other readable and writable memory.
FIG. 7 is a schematic diagram of a neural-network-based text translation model. As shown in fig. 7, the text translation model includes an encoder 710, a decoder 720, and a portion 730 related to the attention mechanism. The encoder 710 is configured to read the source text (e.g., "I am a student") and generate a digitized representation for each source-language word (e.g., I, am, a, student) included in the source text; through the decoder 720, the source text is translated into a translation, i.e., a sentence in the target language (e.g., "je suis étudiant"); the portion 730 associated with the attention mechanism is used to dynamically provide attention weights to the decoder when generating target words at different times, based on the output of the encoder and the state of the decoder, where the state of the decoder can be characterized by the intermediate output of the decoder.
Wherein the attention weight may be a degree of correlation characterizing each source language word in the source text with the decoder state at the current time. For example, at a certain moment, the attention weights of the 4 source language words corresponding to the source text are respectively 0.5, 0.3, 0.1 and 0.1, which may indicate that the correlation degrees between the 4 source language words and the decoder state are respectively 0.5, 0.3, 0.1 and 0.1, and the correlation degree between the first source language word and the decoder state is the highest, so that the possibility that the decoder is currently generating the target word corresponding to the first source language word is the highest.
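The patent does not fix a particular attention formula; as one common possibility, the dot-product attention sketch below produces a normalized weight vector of this kind (one weight per source-language word).

    import numpy as np

    def attention_weights(encoder_states, decoder_state):
        # encoder_states: (num_source_words, hidden), one row per source word.
        # decoder_state: (hidden,), the decoder state at the current moment.
        scores = encoder_states @ decoder_state     # one relevance score per source word
        scores = scores - scores.max()              # numerical stability
        weights = np.exp(scores)
        return weights / weights.sum()              # e.g. [0.5, 0.3, 0.1, 0.1]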
In order to enable certain source language words in an input source text to be translated correctly, current neural machine translation supports manual intervention on the translation results of the neural machine translation. One way is to add the known correct translation as a constraint to the neural machine translation and to ensure that the target word included in the constraint must appear in the final output translation.
Fig. 8 is a flow chart of a text translation process. The text translation model shown in fig. 8 may be the text translation model shown in fig. 7; in this process, known correct translations may be added as constraints to the neural machine translation. Specifically, K candidate translations are used as input at each moment, and each candidate translation is decoded by the text translation model to obtain the score of each candidate word in a preset candidate word set; the candidate translations are expanded according to the obtained scores of the candidate words and a preset constraint set to obtain a new candidate translation set; a certain number of candidates are then selected from the new candidate translation set as the input of the next step; if the decoding termination condition is met, the translation result, i.e., the translation of the source text, is output, and if the decoding termination condition is not met, the selected candidate translations are expanded further.
The constraint set includes one or more constraints, and a constraint characterizes the correct translation of at least part of the source text. Optionally, the at least part of the source text may be a source-language word or a source-language phrase included in the source text. The constraint set may be given manually by the user. The candidate translations are intermediate or final results of translating the source text. For example, suppose the source sentence meaning "I graduated from Hefei University of Technology" is translated into the English "I graduated from Hefei University of Technology", and the constraint set includes the correct translation "Hefei University of Technology" of the university's source-language name. During translation, "I", "I graduated from" and so on are all intermediate results. When "I" is used as a candidate translation, the new candidate translations obtained by expanding "I" may include "I graduated", "I Hefei" and the like, and the obtained new candidate translations may be expanded further. Once the candidate translation "I graduated from Hefei University of Technology" is obtained and is no longer expanded, it is the final result of translating the source text.
The above method of generating machine translations uses the constraints in the constraint set to expand the candidate translations and, at the selection stage, also considers how well the new candidate translations cover the constraint set, so that the target words of the constraints are guaranteed to appear in the finally output translation. However, in this method all constraints in the constraint set are used to extend every candidate translation; for example, "Hefei" is added to the new candidate translation set while "I" is being generated, and candidates such as "I Hefei" and "Hefei University" are produced while "I graduated" is being generated, which obviously wastes time and space. Therefore, how to use these constraints efficiently and accurately is a problem to be solved.
To address this problem, the embodiments of the present application provide a text translation method and apparatus, which can use constraints efficiently and accurately when performing machine translation, thereby improving translation speed and reducing wasted space.
The text translation method according to the embodiment of the present application is described in detail below with reference to the accompanying drawings. The text translation method of the embodiment of the present application may be executed by the data processing device in fig. 1, the user device in fig. 2, the execution device 210 in fig. 3, and the execution device 110 in fig. 4.
Fig. 9 is a schematic diagram of a text translation process provided in an embodiment of the present application. As shown in FIG. 9, the step of selecting constraints is added to filter the set of constraints as compared to the text translation process shown in FIG. 8. In the process shown in fig. 8, a complete constraint set relative to the source text is used for all candidate translations at all times, while in the process shown in fig. 9, constraints related to candidate translations to be expanded at the current time can be selected at the current time to form a new constraint set, and the new constraint set is a subset of the complete constraint set, so that the candidate translations to be expanded at the current time are expanded by using the new constraint set. Thus, the process shown in FIG. 9 may avoid using all constraints each time a candidate translation expansion is performed, and may accelerate the expansion of candidate translations, thereby increasing translation speed.
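A high-level sketch of this decoding loop is given below; the model interface, the beam bookkeeping and the end-of-sentence token are assumptions, the constraint-coverage bookkeeping of the selection stage is omitted for brevity, and select_constraints_by_position refers to the earlier sketch.

    END = "</s>"   # assumed end-of-sentence token

    def constrained_decode(model, source_text, constraints, beam_size=4, max_steps=50):
        # model.step(source_text, partial) is a hypothetical interface returning
        # (word_scores, attention): word_scores maps candidate words to scores, and
        # attention is the attention-weight vector over source positions at this step.
        beam = [([], 0.0)]                                     # (partial translation, score)
        for _ in range(max_steps):
            expansions = []
            for words, score in beam:
                word_scores, attention = model.step(source_text, words)
                # Added step of Fig. 9: keep only the constraints relevant at this moment.
                selected = select_constraints_by_position(constraints, attention)
                for c in selected:                             # expand with selected constraints
                    first = c.target_words[0]
                    expansions.append((words + [first], score + word_scores.get(first, 0.0)))
                for w, s in word_scores.items():               # expand with ordinary candidate words
                    expansions.append((words + [w], score + s))
            beam = sorted(expansions, key=lambda e: e[1], reverse=True)[:beam_size]
            if all(ws and ws[-1] == END for ws, _ in beam):
                break                                          # decoding termination condition
        return beam[0][0]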
Fig. 10 is a schematic flowchart of a text translation method provided in an embodiment of the present application. The text translation method of the embodiment of the present application may be executed by the data processing device in fig. 1, the user device in fig. 2, the execution device 210 in fig. 3, and the execution device 110 in fig. 4.
At 1010, candidate translations corresponding to the source text are obtained.
The candidate translations are candidate translations to be expanded at the current moment, and can be described by adopting a target language.
In 1020, a constraint in a preset set of constraints is selected according to the attention weight output by the text translation model.
The attention weight may be a degree of correlation used to characterize the state of the decoder with respect to each source language word in the source text. For example, at a certain moment, the attention weights of the 4 source language words corresponding to the source text are respectively 0.5, 0.3, 0.1 and 0.1, which may indicate that the correlation degrees between the 4 source language words and the decoder state are respectively 0.5, 0.3, 0.1 and 0.1, and the correlation degree between the first source language word and the decoder state is the highest, so that the possibility that the decoder is currently generating the target word corresponding to the first source language word is the highest.
The attention weight is the attention weight calculated at the current moment of the text translation model. That is, the constraints in the preset constraint set are selected according to the attention weight output by the text translation model at the current moment.
Constraints in this application may characterize the correct translation, i.e., the correct translated text, of at least part of the source text. Optionally, the at least part of the source text may be a source-language word or a source-language phrase included in the source text.
In some embodiments, a constraint may include source-side information and target-side information, where the source side refers to the input source text and the target side refers to the output translation.
For example, the form of the constraint may be "source language word: target word", for example, 合工大: Hefei University of Technology.
For another example, the constraint includes source-end information that is source position information and target-end information that is the target word corresponding to that source position. In this case the form of the constraint may be "[position of the source language word in the source text]: target word", for example, [4]: Hefei University of Technology.
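As an illustration only (not part of the claimed method), a constraint of the second form can be represented as a source-end position plus a list of target-end qualifiers; the following minimal Python sketch, with illustrative names, shows one such representation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Constraint:
    # Position of the constrained source language word in the source text
    # (1-based, matching the "[4]" notation used above).
    source_position: int
    # Target-end phrase that is the correct translation of that source word,
    # stored as a list of target-language tokens (the "qualifiers").
    target_tokens: List[str]

# Constraint set corresponding to "{[4]: Hefei University of Technology}".
constraint_set = [
    Constraint(source_position=4,
               target_tokens=["Hefei", "University", "of", "Technology"]),
]
```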
In the present application, there are many ways to select the constraint in the preset constraint set according to the attention weight output by the text translation model, which is not specifically limited in the present application.
In some embodiments, according to the source end position corresponding to each constraint in the preset constraint set, attention weights respectively corresponding to each constraint are obtained from the text translation model, and the constraints in the preset constraint set are selected according to the obtained attention weights. The source end position corresponding to the constraint refers to the position in the source text corresponding to the constraint. The source location information included in the constraint may indicate where in the source text the constraint characterizes the correct translation of the source word.
For example, the constraint set is {[4]: Hefei University of Technology, [6]: Sushma Swaraj}, and the two constraints correspond to source position 4 and source position 6, respectively; the attention weights corresponding to source position 4 and source position 6 are then obtained from the text translation model, and the two constraints are selected according to the obtained weights.
Optionally, after the attention weight corresponding to each constraint in the constraint set is obtained from the text translation model, the obtained attention weights may be processed to obtain a heuristic signal for each constraint, and the constraints in the preset constraint set are selected according to these heuristic signals. The heuristic signal indicates whether the constraint corresponding to it is used when expanding the candidate translation. For example, each obtained attention weight may be compared with a preset threshold: the heuristic signal of a constraint whose attention weight is greater than or equal to the preset threshold indicates that the constraint is used when expanding the candidate translation at the current moment, and the heuristic signal of a constraint whose attention weight is less than the preset threshold indicates that the constraint is not used. For another example, when the text translation model calculates multiple attention weights for the same source position at each moment, the multiple attention weights obtained for the same source position may first be processed, e.g., summed, averaged, taking the maximum, taking the top Q, or by other more complex processing, and the heuristic signal of each constraint is then determined based on the processing result.
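The aggregation of multiple attention weights for the same source position could, for illustration, look like the following sketch; the function name and the particular set of aggregations offered are assumptions, not part of the patent:

```python
def aggregate_position_weights(head_weights, how="mean", top_q=2):
    """Reduce several attention weights computed for the same source position
    at one time step (e.g. one weight per attention head) to a single value
    that can then be compared with the preset threshold."""
    if how == "sum":
        return sum(head_weights)
    if how == "mean":
        return sum(head_weights) / len(head_weights)
    if how == "max":
        return max(head_weights)
    if how == "top_q":  # mean of the Q largest weights
        top = sorted(head_weights, reverse=True)[:top_q]
        return sum(top) / len(top)
    raise ValueError(f"unknown aggregation: {how}")

# aggregate_position_weights([0.2, 0.9, 0.1], how="max") -> 0.9
```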
In other embodiments, the attention weight corresponding to each source-end position at the current moment may be obtained from the text translation model, and target attention weights that meet a preset requirement are then selected from the obtained attention weights; the constraints in the preset constraint set are selected according to the source-end positions corresponding to the target attention weights and the source-end position corresponding to each constraint in the preset constraint set. A source-end position is a position in the source text: the source-end position corresponding to a target attention weight is the position in the source text of the word corresponding to that attention weight, and the source-end position corresponding to a constraint is the position in the source text of the word corresponding to that constraint.
For example, the constraint set is {[4]: Hefei University of Technology, [6]: Sushma Swaraj}, and the attention weights at the current moment obtained from the text translation model for source position 1 to source position 6 are 0.01, 0.01, 0.01, 0.95, 0.01, and 0.01, respectively. The 6 attention weights are compared with a preset threshold of 0.5; the obtained target attention weight is the one corresponding to source position 4, so the constraint whose source-end position is 4, namely [4]: Hefei University of Technology, is selected from the constraint set, and constraint [4]: Hefei University of Technology is used to expand the candidate translation.
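A hypothetical sketch of this selection order (select target attention weights first, then keep the constraints whose source-end position matches) is shown below; it reuses the Constraint class sketched earlier, and the 0.5 threshold is only an example:

```python
def select_constraints_by_target_weights(constraints, position_weights, threshold=0.5):
    """Select the source positions whose attention weight at the current moment
    meets the preset requirement (here: >= threshold), then keep only the
    constraints whose source-end position is among the selected positions."""
    selected_positions = {i + 1                      # positions are 1-based
                          for i, w in enumerate(position_weights)
                          if w >= threshold}
    return [c for c in constraints if c.source_position in selected_positions]

# With weights [0.01, 0.01, 0.01, 0.95, 0.01, 0.01] only position 4 is selected,
# so only constraint [4]: Hefei University of Technology is kept.
```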
In some embodiments, a constraint may include multiple qualifiers; for example, constraint [4]: Hefei University of Technology, corresponding to "合工大", includes the 4 qualifiers Hefei, University, of, and Technology. In the scenario where candidate translations are expanded word by word, some of a constraint's qualifiers may already have been used in the candidate translation to be expanded at the current moment; the candidate translation is then said to be in the constraint. For example, for the current candidate translation "I graduated from Hefei", the qualifier "Hefei" of constraint [4]: Hefei University of Technology has already been used. In view of this, the present application may combine the attention weight calculated by the text translation model with the state of the current candidate translation to select constraints in the preset constraint set.
In one possible implementation, the constraints in the preset constraint set may be selected according to the attention weight calculated by the text translation model and the state of the candidate translation, so that a target constraint used for expanding the candidate translation at the current moment satisfies at least one of the following conditions: the attention weight corresponding to the target constraint meets the preset requirement; or the candidate translation is in the constraint.
When the candidate translation to be expanded at the current moment is obtained by expansion using part of the words of a target phrase, the candidate translation to be expanded is in the constraint, where the target phrase is the phrase corresponding to a certain constraint in the preset constraint set.
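A simple way to test this "in the constraint" state, assuming the Constraint representation sketched earlier, is to check whether the candidate translation ends with a proper prefix of the constraint's target phrase; the following sketch is illustrative only:

```python
def in_constraint(candidate_tokens, constraint):
    """Return True if the candidate translation is currently "in" the constraint,
    i.e. it ends with a non-empty proper prefix of the constraint's target
    phrase (some, but not all, of the qualifiers have already been used)."""
    phrase = constraint.target_tokens
    for used in range(1, len(phrase)):               # 1 .. len(phrase)-1 qualifiers used
        if candidate_tokens[-used:] == phrase[:used]:
            return True
    return False

# in_constraint("I graduated from Hefei".split(), constraint_set[0])  -> True
# in_constraint("I graduated from".split(), constraint_set[0])        -> False
```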
The above preset requirement may be that the attention weight is greater than a preset threshold, or that the heuristic signal of the corresponding constraint indicates that the constraint is used when expanding the candidate translation at the current moment; the embodiment of the present application does not specifically limit this.
The method for selecting constraints provided in the embodiments of the present application is described below with reference to specific examples. FIG. 11 is a schematic flow chart diagram of a method for selecting constraints provided by an embodiment of the present application. N in fig. 11 represents the number of attention weights, and M represents the number of constraints in a preset constraint set.
As shown in fig. 11, at 1110, a confidence level is calculated for each constraint in a set of preset constraints.
Specifically, according to the L pieces of source-end position information included in the k-th constraint among the M constraints (1 ≤ k ≤ M), the attention weights {aw_{i,j} | 1 ≤ i ≤ N, 1 ≤ j ≤ L} at the corresponding positions are extracted from the N attention weights {aw_1, ..., aw_N}, where i denotes the i-th attention weight and j denotes the j-th source-end position; the L confidence levels {c_1, ..., c_L} are then calculated according to the formula {c_1, ..., c_L} = f({aw_{i,j}}).
Wherein the function f may be a simple function. For example, the function f is a sum function, an average function, or the like. The function f may also be a complex function. For example, the function f may be a neural network or the like.
In 1120, a heuristic for constraint k is computed based on the L confidence levels obtained in 1110.
Specifically, the heuristic signal of constraint k can be calculated according to the formula h_k = g({c_1, ..., c_L}), where h_k denotes the heuristic signal of constraint k. The heuristic signal h_k may take one of two values, which respectively indicate whether constraint k is used when expanding the current candidate translation. For example, the two values of the heuristic signal are 1 and 0: when the heuristic signal takes the value 1, it indicates that constraint k is used when the current candidate translation is extended, and when it takes the value 0, it indicates that constraint k is not used.
The function g may be a simple function, e.g., summing or averaging the confidence levels and then comparing the result with a preset threshold, returning 1 if it is greater than the preset threshold and 0 otherwise. The function g may also be a complex function, for example a neural network that outputs 1 or 0.
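For illustration, steps 1110 and 1120 might be sketched as follows, with the simplest possible choices for f (reading off the weights at the constraint's source-end position) and g (averaging and thresholding); the function names and the 0.5 threshold are assumptions, not part of the patent:

```python
def confidences_for_constraint(constraint, attention_weights):
    """Step 1110 (function f): extract the attention weights at the constraint's
    source-end position and turn them into confidence levels. Here f simply
    reads off one weight per attention row (a sum or average would also do)."""
    pos = constraint.source_position - 1             # 1-based position -> 0-based index
    return [row[pos] for row in attention_weights]

def heuristic_for_constraint(confidences, threshold=0.5):
    """Step 1120 (function g): average the confidence levels and compare the
    result with a preset threshold, returning 1 (use the constraint when
    expanding the candidate translation) or 0 (do not use it)."""
    return 1 if sum(confidences) / len(confidences) >= threshold else 0
```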
At 1130, the preset constraint set is filtered or selected based on the heuristic signal of each constraint obtained at 1120 and the state of the candidate translation.
Specifically, let s_k denote the state of the candidate translation with respect to the k-th constraint: s_k = 0 indicates that the current candidate translation is in the k-th constraint, and s_k = 1 indicates that it is not. The new constraint set may then be expressed as:
{k | s_k = 0 or h_k = 1}
That is, the kth constraint is a target constraint for expanding the candidate translations at the current time, and is added to the new constraint set if the kth constraint satisfies either of the following two conditions:
condition 1: the current candidate translation is in this constraint;
condition 2: the heuristic for this constraint is 1.
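Putting steps 1110 to 1130 together, the set {k | s_k = 0 or h_k = 1} can be sketched as below; this is illustrative only and reuses the in_constraint, confidences_for_constraint, and heuristic_for_constraint helpers sketched earlier:

```python
def new_constraint_set(constraints, candidate_tokens, attention_weights, threshold=0.5):
    """Step 1130: keep constraint k if the candidate translation is already in it
    (condition 1, s_k = 0) or its heuristic signal is 1 (condition 2, h_k = 1),
    i.e. the set {k | s_k = 0 or h_k = 1}."""
    kept = []
    for c in constraints:
        s_in = in_constraint(candidate_tokens, c)
        h = heuristic_for_constraint(
            confidences_for_constraint(c, attention_weights), threshold)
        if s_in or h == 1:
            kept.append(c)
    return kept
```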
In this technical solution, the constraints in the preset constraint set are selected or filtered before the candidate translation is expanded, and constraints with a low degree of correlation with the current decoder state can be ignored, so that the translation speed is increased without affecting the quality of the candidate translations.
At 1030, when a target constraint for expanding the candidate translation is selected, the candidate translation is expanded according to the target constraint; or, when no target constraint for expanding the candidate translation is selected, the candidate translation is expanded according to a preset candidate word set.
The candidate word set includes a plurality of words in the target language, where the target language is the language to which the candidate translation belongs. The candidate word set may be a preset candidate word library, and the text translation model may score the candidate words in the library at each moment, so that the candidate words used for expanding the candidate translation are determined according to their scores. For example, candidate words whose scores exceed a preset threshold may be used to expand the candidate translation.
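As an illustrative sketch only (the scoring interface of the text translation model is assumed here to be a plain word-to-score mapping), expansion from the candidate word set might look like:

```python
def expand_with_candidate_words(candidate_tokens, word_scores, score_threshold=0.05, top_k=3):
    """Expand a candidate translation using only the preset candidate word set:
    the model's score for each candidate word is compared with a threshold and
    the best-scoring words are kept, producing one new candidate per kept word."""
    kept = [w for w, s in sorted(word_scores.items(), key=lambda kv: -kv[1])
            if s >= score_threshold][:top_k]
    return [candidate_tokens + [w] for w in kept]

# expand_with_candidate_words(["I", "graduated"],
#                             {"from": 0.8, "in": 0.1, "the": 0.02})
# -> [["I", "graduated", "from"], ["I", "graduated", "in"]]
```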
Optionally, when the target constraint for expanding the candidate translation is selected, the candidate translation may be expanded only according to the target constraint, or the candidate translation may be expanded according to the target constraint and the preset candidate word set.
For example, the candidate translation at the current moment is "I graduated from", and the constraint set is {[4]: Hefei University of Technology, [6]: Sushma Swaraj}. Based on the attention weight and the state of the candidate translation, the target constraint [4]: Hefei University of Technology is selected for expanding the candidate translation; the candidate translation "I graduated from" is then expanded according to the preset candidate word set and target constraint [4]: Hefei University of Technology, and constraint [6]: Sushma Swaraj is not used to expand the candidate translation "I graduated from".
For another example, the candidate translation at the current moment is "I graduated", and the constraint set is {[4]: Hefei University of Technology, [6]: Sushma Swaraj}. According to the attention weight and the state of the candidate translation, no target constraint for expanding the candidate translation is selected, and the candidate translation "I graduated" is then expanded only according to the preset candidate word set.
It should be understood that the source text, the candidate translations, and the translations may all belong to a natural language, which generally refers to a language that naturally evolves with culture. Optionally, the source text belongs to a first natural language, the candidate translation and the translation belong to a second natural language, and the first and second languages are different kinds of natural languages. The source text belonging to the first natural language may refer to that the source text is a text expressed in the first natural language, the candidate translation and the translation belong to the second natural language, and the candidate translation and the translation are a text expressed in the second natural language. The source text and the candidate translation may belong to any two different types of natural language.
The text translation method of the present application is described in more detail below in conjunction with specific examples.
Example 1
The source text is "我毕业于合工大", which includes the 4 source language words "我", "毕业", "于", and "合工大".
The constraint set is {[4]: Hefei University of Technology}, which includes only one constraint, corresponding to the correct translation of the 4th source language word "合工大".
The generation of a correct translation "I graduated from Hefei University of Technology" may be as follows.
1) When generating the target words corresponding to the first three source language words, the heuristic signals of the constraint are all 0, which means that the constraint is not used for expansion.
For example, the input candidate translation "I graduated" is expanded: attention weights are obtained, and the attention weights corresponding to the 4 source-end positions are [0.01, 0.01, 0.97, 0.01], respectively; the confidence level of constraint [4] is 0.01, lower than the preset threshold of 0.5, so heuristic signal 0 is generated, "I graduated" is expanded according to the preset candidate word set, and the new candidate translation "I graduated from" is obtained after selection.
2) The input candidate translation "I graduated from" is expanded.
Attention weights are obtained, and the attention weights corresponding to the 4 source-end positions are [0.01, 0.01, 0.01, 0.97], respectively; the confidence level of constraint [4] is 0.97, higher than the preset threshold of 0.5, so heuristic signal 1 is generated; the candidate translation "I graduated from" is expanded according to the preset candidate word set and the first qualifier "Hefei" of constraint [4], and "I graduated from Hefei" is obtained after selection.
3) The input candidate translation "I graduated from Hefei" is expanded.
Attention weights are obtained, and the attention weights corresponding to the 4 source-end positions are [0.25, 0.25, 0.25, 0.25], respectively; the confidence level of constraint [4] is 0.25, lower than the preset threshold of 0.5, and heuristic signal 0 is generated.
4) Continuing the expansion, because the candidate translation "I graduated from Hefei University" is in the constraint, the third qualifier "of" and the fourth qualifier "Technology" in the constraint [4] are used in sequence to expand the candidate translation, resulting in "I graduated from Hefei University of Technology".
5) Decoding is terminated, and a translation result "I graduated from Hefei University of Technology" is output.
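The walkthrough above can be condensed into a single expansion step; the following greedy, single-hypothesis sketch strings together the helpers sketched earlier and is not the patented decoding algorithm itself (a real decoder would keep a beam of candidate translations and use model-produced attention weights and word scores):

```python
def constrained_decode_step(candidate_tokens, constraints, attention_weights,
                            word_scores, threshold=0.5):
    """One expansion step: select the active constraints, then either
    continue/start a constraint or fall back to the preset candidate word set."""
    active = new_constraint_set(constraints, candidate_tokens,
                                attention_weights, threshold)
    for c in active:
        if in_constraint(candidate_tokens, c):
            # The candidate is inside this constraint: emit its next qualifier.
            used = next(u for u in range(1, len(c.target_tokens))
                        if candidate_tokens[-u:] == c.target_tokens[:u])
            return candidate_tokens + [c.target_tokens[used]]
    if active:
        # A target constraint was selected: start it with its first qualifier.
        return candidate_tokens + [active[0].target_tokens[0]]
    # No target constraint selected: use the preset candidate word set only.
    return expand_with_candidate_words(candidate_tokens, word_scores,
                                       score_threshold=0.0, top_k=1)[0]
```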
Example 2
The source text is "我毕业于合工大", which includes the 4 source language words "我", "毕业", "于", and "合工大".
The constraint set is {[4]: Hefei University of Technology}, which includes only one constraint, corresponding to the correct translation of the 4th source language word "合工大".
The generation of a correct translation "I graduated from Hefei University of Technology" may be as follows.
1) When the first three target words are generated, the heuristic signals of the constraint are all 0, which means that the constraint is not used for expansion.
For example, the input candidate translation "I graduated" is expanded: attention weights are obtained, and the attention weights corresponding to the 4 source-end positions are [0.01, 0.01, 0.97, 0.01], respectively; the confidence level of constraint [4] is 0.01, lower than the preset threshold of 0.5, so heuristic signal 0 is generated, "I graduated" is expanded according to the preset candidate word set, and the new candidate translation "I graduated from" is obtained after selection.
2) The input candidate translation "I graduated from" is expanded.
Attention weights are obtained, and the attention weights corresponding to the 4 source-end positions are [0.01, 0.01, 0.01, 0.97], respectively; the confidence level of constraint [4] is 0.97, higher than the preset threshold of 0.5, so heuristic signal 1 is generated; the candidate translation "I graduated from" is expanded according to the preset candidate word set and all qualifiers of constraint [4], and "I graduated from Hefei University of Technology" is obtained after selection.
3) Decoding is terminated, and a translation result "I graduated from Hefei University of Technology" is output.
According to the above technical solution, when candidate translations are expanded, they are expanded according to the target constraints obtained after selection or filtering, so that using all constraints for every expansion can be avoided; and when no target constraint is selected, the candidate translation is expanded only according to the preset candidate word set, so that using constraints when they are not needed can also be avoided. The technical solution of the present application can therefore accelerate the expansion of candidate translations and further increase the translation speed.
Having described embodiments of the method of the present application in detail above with reference to the accompanying drawings, embodiments of the apparatus of the present application will be described below with reference to fig. 12 to 14, and it should be understood that each of the apparatuses described in fig. 12 to 14 is capable of performing each step of the text translation method of the embodiment of the present application, and duplicate descriptions will be omitted appropriately when describing the embodiments of the apparatus of the present application.
Fig. 12 is a schematic structural diagram of a text translation apparatus provided in an embodiment of the present application. The text translation apparatus 1200 may correspond to the data processing device shown in fig. 1 or the user device shown in fig. 2. The text translation apparatus 1200 may correspond to the execution device 210 shown in fig. 3 and the execution device 110 shown in fig. 4.
The apparatus 1200 may include an acquisition module 1210 and a processing module 1220. Wherein, each module included in the apparatus 1200 may be implemented in a software and/or hardware manner.
Alternatively, the acquisition module 1210 may be a communication interface, or the acquisition module 1210 and the processing module 1220 may be the same module.
In this application, apparatus 1200 may be used to perform steps in the method depicted in fig. 10.
For example:
an obtaining module 1210, configured to obtain a candidate translation corresponding to the source text;
a processing module 1220, configured to select, according to the attention weight calculated by the text translation model, a constraint in a preset constraint set, where the constraint characterizes a correct translation of at least a part of the source text;
the processing module 1220 is further configured to, when a target constraint for expanding the candidate translation is selected, expand the candidate translation according to the target constraint; or when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set.
Optionally, the processing module 1220 is specifically configured to obtain, from the text translation model, attention weights corresponding to each constraint according to a source end position corresponding to each constraint in the preset constraint set, where the source end position is a position of a term corresponding to each constraint in the source text; and selecting the constraint in the preset constraint set according to the attention weight corresponding to each constraint.
Optionally, the processing module 1220 is specifically configured to process the attention weight corresponding to each constraint to obtain a heuristic signal of each constraint, where the heuristic signal is used to indicate whether to use a constraint corresponding to the heuristic signal when expanding the candidate translation; and selecting the constraint in the preset constraint set according to the heuristic signal corresponding to each constraint.
Optionally, the processing module 1220 is specifically configured to select a target attention weight that meets a preset requirement from the attention weights; selecting a constraint in the preset constraint set according to the source end position corresponding to the target attention weight and the source end position corresponding to each constraint in the preset constraint set, wherein the source end position corresponding to the target attention weight is the position of a word corresponding to the target attention weight in the source text, and the source end position corresponding to each constraint is the position of the word corresponding to each constraint in the source text.
Optionally, the processing module 1220 is specifically configured to select constraints in the preset constraint set according to the attention weight calculated by the text translation model and the state of the candidate translation, where the state of the candidate translation is either in a constraint or not in a constraint, and the candidate translation is in a constraint when it is obtained by expansion using part of the words of a target phrase, the target phrase being a target-end phrase corresponding to a constraint in the preset constraint set;
The target constraint satisfies at least one of the following conditions:
the attention weight corresponding to the target constraint meets the preset requirement;
the states of the candidate translations are in a constraint.
It should be understood that the text translation device 1200 shown in fig. 12 is only an example, and the device of the embodiments of the present application may further include other modules or units.
The acquisition module 1210 may be implemented by a communication interface or a processor. The processing module 1220 may be implemented by a processor. Specific functions and advantages of the acquiring module 1210 and the processing module 1220 may be referred to in the related description of the method embodiments, and will not be described herein.
Fig. 13 is a schematic block diagram of a text translation apparatus according to another embodiment of the present application. The text translation apparatus 1300 may be equivalent to the data processing device shown in fig. 1 or the user device shown in fig. 2. The text translation apparatus 1300 may also correspond to the execution device 210 shown in fig. 3 and the execution device 110 shown in fig. 4.
As shown in fig. 13, the text translation device 1300 may include a memory 1310 and a processor 1320. Only one memory and one processor are shown in fig. 13. In an actual text translation device product, there may be one or more processors and one or more memories. The memory may also be referred to as a storage medium, a storage device, or the like. The memory may be provided separately from the processor or may be integrated with the processor, which is not limited by the embodiments of the present application.
The memory 1310 and the processor 1320 communicate with each other via internal communication paths to transfer control and/or data signals.
Specifically, a memory 1310 for storing a program;
a processor 1320 for executing a program stored in the memory 1310, the processor 1320 being configured to, when the program stored in the memory 1310 is executed by the processor 1320:
obtaining candidate translations corresponding to the source text;
selecting a constraint in a preset constraint set according to the attention weight calculated by the text translation model, wherein the constraint characterizes the correct translation of at least part of the source text;
when a target constraint for expanding the candidate translation is selected, expanding the candidate translation according to the target constraint; or when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set.
The text translation apparatus 1300 may further include an input/output interface 1330, and the text translation apparatus 1300 may be capable of acquiring a source text through the input/output interface 1330, specifically, may be capable of acquiring a source text from another device (for example, a terminal device) through the input/output interface 1330, and may be capable of finally obtaining a machine translation through processing of the processor 1320 after acquiring the source text. The text translation apparatus 1300 can transmit the machine translation to other devices through the input-output interface 1330.
It should be understood that the text translation device 1300 shown in fig. 13 is only an example, and the device of the embodiments of the present application may further include other modules or units.
The specific working process and beneficial effects of the text translation device 1300 may be referred to the related description in the method embodiment, and will not be described herein.
Fig. 14 is a schematic diagram of a machine translation system 1400 provided by an embodiment of the present application.
Wherein machine translation system 1400 may correspond to a data processing device as shown in fig. 1 or a user device as shown in fig. 2. The machine translation system 1400 may also correspond to the execution device 210 shown in fig. 3 and the execution device 110 shown in fig. 4.
As shown in fig. 14, the machine translation system 1400 may include a memory 1410 and a processor 1420. Only one memory and one processor are shown in fig. 14. In an actual machine translation system product, there may be one or more processors and one or more memories. The memory may also be referred to as a storage medium, a storage device, or the like. The memory may be provided separately from the processor or may be integrated with the processor, which is not limited by the embodiments of the present application.
The memory 1410 and the processor 1420 communicate with each other via internal communication paths to transfer control and/or data signals.
Specifically, a memory 1410 for storing a program;
a processor 1420 for executing a program stored in the memory 1410, the processor 1420 being configured to, when the program stored in the memory 1410 is executed by the processor 1420:
obtaining candidate translations corresponding to the source text;
selecting a constraint in a preset constraint set according to the attention weight calculated by the text translation model, wherein the constraint characterizes the correct translation of at least part of the source text;
when a target constraint for expanding the candidate translation is selected, expanding the candidate translation according to the target constraint; or when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set.
The machine translation system 1400 may further include an input/output interface 1430. The machine translation system 1400 may obtain the source text through the input/output interface 1430, specifically from another device (for example, a terminal device), and after obtaining the source text, may finally obtain the machine translation through the processing of the processor 1420. The machine translation system 1400 may transmit the machine translation to other devices through the input/output interface 1430.
It should be appreciated that the machine translation system 1400 shown in fig. 14 is merely an example, and that machine translation systems of embodiments of the present application may also include other modules or units.
The specific operation and advantages of the machine translation system 1400 may be found in the related description of the method embodiments and are not described herein.
It should be appreciated that the processor in embodiments of the present application may be a central processing unit (central processing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM) which acts as an external cache. By way of example but not limitation, many forms of random access memory (random access memory, RAM) are available, such as Static RAM (SRAM), dynamic Random Access Memory (DRAM), synchronous Dynamic Random Access Memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced Synchronous Dynamic Random Access Memory (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects, but may also indicate an "and/or" relationship, which may be understood with reference to the context.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method of text translation, comprising:
obtaining candidate translations corresponding to the source text;
selecting a constraint in a preset constraint set according to the attention weight calculated by the text translation model, wherein the constraint characterizes the correct translation of at least part of the source text;
when a target constraint for expanding the candidate translation is selected, expanding the candidate translation according to the target constraint; or,
when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set, wherein the candidate word set comprises a plurality of words in a target language, and the target language is the language to which the candidate translation belongs.
2. The method of claim 1, wherein selecting a constraint in a preset set of constraints according to the attention weight calculated by the text translation model comprises:
According to a source end position corresponding to each constraint in the preset constraint set, attention weights respectively corresponding to each constraint are obtained from the text translation model, wherein the source end position is a position of a word corresponding to each constraint in the source text;
and selecting the constraint in the preset constraint set according to the attention weight corresponding to each constraint.
3. The method of claim 2, wherein selecting a constraint in the preset set of constraints according to the attention weight corresponding to each constraint comprises:
processing the attention weight corresponding to each constraint to obtain a heuristic signal of each constraint, wherein the heuristic signal is used for indicating whether the constraint corresponding to the heuristic signal is used when expanding the candidate translation;
and selecting the constraint in the preset constraint set according to the heuristic signal corresponding to each constraint.
4. The method of claim 1, wherein selecting a constraint in a preset set of constraints according to the attention weight calculated by the text translation model comprises:
Selecting a target attention weight meeting preset requirements from the attention weights;
selecting a constraint in the preset constraint set according to the source end position corresponding to the target attention weight and the source end position corresponding to each constraint in the preset constraint set, wherein the source end position corresponding to the target attention weight is the position of a word corresponding to the target attention weight in the source text, and the source end position corresponding to each constraint is the position of the word corresponding to each constraint in the source text.
5. The method of claim 1, wherein selecting a constraint in a preset set of constraints according to the attention weight calculated by the text translation model comprises:
selecting a constraint in a preset constraint set according to the attention weight calculated by a text translation model and the state of the candidate translation, wherein the state of the candidate translation is either in a constraint or not in a constraint, the candidate translation is in a constraint when the candidate translation is obtained by expansion using part of the words of a target phrase, and the target phrase is a target-end phrase corresponding to a constraint in the preset constraint set;
The target constraint satisfies at least one of the following conditions:
the attention weight corresponding to the target constraint meets the preset requirement;
the states of the candidate translations are in a constraint.
6. A text translation device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory, the processor being configured to, when the program stored in the memory is executed by the processor:
obtaining candidate translations corresponding to the source text;
selecting a constraint in a preset constraint set according to the attention weight calculated by the text translation model, wherein the constraint characterizes the correct translation of at least part of the source text;
when a target constraint for expanding the candidate translation is selected, expanding the candidate translation according to the target constraint; or,
when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set, wherein the candidate word set comprises a plurality of words in a target language, and the target language is the language to which the candidate translation belongs.
7. The apparatus of claim 6, wherein the processor is specifically configured to:
According to a source end position corresponding to each constraint in the preset constraint set, attention weights respectively corresponding to each constraint are obtained from the text translation model, wherein the source end position is a position of a word corresponding to each constraint in the source text;
and selecting the constraint in the preset constraint set according to the attention weight corresponding to each constraint.
8. The apparatus of claim 7, wherein the processor is specifically configured to:
processing the attention weight corresponding to each constraint to obtain a heuristic signal of each constraint, wherein the heuristic signal is used for indicating whether the constraint corresponding to the heuristic signal is used when expanding the candidate translation;
and selecting the constraint in the preset constraint set according to the heuristic signal corresponding to each constraint.
9. The apparatus of claim 6, wherein the processor is specifically configured to:
selecting a target attention weight meeting preset requirements from the attention weights;
selecting a constraint in the preset constraint set according to the source end position corresponding to the target attention weight and the source end position corresponding to each constraint in the preset constraint set, wherein the source end position corresponding to the target attention weight is the position of a word corresponding to the target attention weight in the source text, and the source end position corresponding to each constraint is the position of the word corresponding to each constraint in the source text.
10. The apparatus of claim 6, wherein the processor is specifically configured to:
selecting a constraint in a preset constraint set according to the attention weight calculated by a text translation model and the state of the candidate translation, wherein the state of the candidate translation is either in a constraint or not in a constraint, the candidate translation is in a constraint when the candidate translation is obtained by expansion using part of the words of a target phrase, and the target phrase is a target-end phrase corresponding to a constraint in the preset constraint set;
the target constraint satisfies at least one of the following conditions:
the attention weight corresponding to the target constraint meets the preset requirement;
the states of the candidate translations are in a constraint.
11. A computer readable storage medium, characterized in that the computer readable storage medium stores a program code comprising instructions for performing part or all of the steps of the method according to any of claims 1 to 5.
CN201911244875.5A 2019-12-06 2019-12-06 Text translation method, apparatus, machine translation system, and storage medium Active CN111160049B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911244875.5A CN111160049B (en) 2019-12-06 2019-12-06 Text translation method, apparatus, machine translation system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911244875.5A CN111160049B (en) 2019-12-06 2019-12-06 Text translation method, apparatus, machine translation system, and storage medium

Publications (2)

Publication Number Publication Date
CN111160049A CN111160049A (en) 2020-05-15
CN111160049B true CN111160049B (en) 2023-06-06

Family

ID=70556511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911244875.5A Active CN111160049B (en) 2019-12-06 2019-12-06 Text translation method, apparatus, machine translation system, and storage medium

Country Status (1)

Country Link
CN (1) CN111160049B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859991B (en) * 2020-07-29 2023-04-07 中国平安财产保险股份有限公司 Language translation processing model training method and language translation processing method
WO2022102364A1 (en) * 2020-11-13 2022-05-19 株式会社Nttドコモ Text generation model generating device, text generation model, and text generating device
WO2023079911A1 (en) * 2021-11-04 2023-05-11 株式会社Nttドコモ Sentence generation model generator, sentence generation model, and sentence generator

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992629A (en) * 2019-02-28 2019-07-09 中国科学院计算技术研究所 A kind of neural network Relation extraction method and system of fusion entity type constraint
CN110020440A (en) * 2018-01-09 2019-07-16 深圳市腾讯计算机系统有限公司 A kind of machine translation method, device, server and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205761A1 (en) * 2017-12-28 2019-07-04 Adeptmind Inc. System and method for dynamic online search result generation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020440A (en) * 2018-01-09 2019-07-16 深圳市腾讯计算机系统有限公司 A kind of machine translation method, device, server and storage medium
CN109992629A (en) * 2019-02-28 2019-07-09 中国科学院计算技术研究所 A kind of neural network Relation extraction method and system of fusion entity type constraint

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Kai Song. Code-Switching for Enhancing NMT with Pre-Specified Translation. arXiv:1904.09107v4. 2019, pp. 1-11. *
Yong Cheng. Agreement-based Joint Training for Bidirectional Attention-based Neural Machine Translation. arXiv:1512.04650v2. 2016, pp. 1-7. *
Mao Xi; Yan Wen; Ma Weijun; Yin Hongmei. Machine translation of English toponyms with the attention mechanism. Science of Surveying and Mapping. 2019, (06), full text. *

Also Published As

Publication number Publication date
CN111160049A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
WO2022007823A1 (en) Text data processing method and device
CN111984766B (en) Missing semantic completion method and device
CN112487182A (en) Training method of text processing model, and text processing method and device
CN111160049B (en) Text translation method, apparatus, machine translation system, and storage medium
WO2022068623A1 (en) Model training method and related device
CN109902296B (en) Natural language processing method, training method and data processing equipment
CN112288075A (en) Data processing method and related equipment
CN112580369B (en) Sentence repeating method, method and device for training sentence repeating model
CN113449859A (en) Data processing method and device
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN113065633A (en) Model training method and associated equipment
CN115879508A (en) Data processing method and related device
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
US20230065965A1 (en) Text processing method and apparatus
CN114579718A (en) Text feature generation method, device, equipment and storage medium combining RPA and AI
CN113656563A (en) Neural network searching method and related equipment
CN110083842B (en) Translation quality detection method, device, machine translation system and storage medium
CN116739154A (en) Fault prediction method and related equipment thereof
CN110852066B (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN116703659A (en) Data processing method and device applied to engineering consultation and electronic equipment
CN116168403A (en) Medical data classification model training method, classification method, device and related medium
CN114707643A (en) Model segmentation method and related equipment thereof
CN110334359B (en) Text translation method and device
CN115292583A (en) Project recommendation method and related equipment thereof
WO2021083312A1 (en) Method for training statement paraphrasing model, and statement paraphrasing method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant