WO2023011125A1 - Simultaneous interpretation translation method, apparatus, device, and storage medium - Google Patents

Simultaneous interpretation translation method, apparatus, device, and storage medium Download PDF

Info

Publication number
WO2023011125A1
Authority
WO
WIPO (PCT)
Prior art keywords
data unit
output
translation
node
simultaneous interpretation
Prior art date
Application number
PCT/CN2022/105363
Other languages
English (en)
French (fr)
Inventor
刘丹
李小喜
刘俊华
魏思
Original Assignee
科大讯飞股份有限公司 (iFLYTEK Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 科大讯飞股份有限公司 filed Critical 科大讯飞股份有限公司
Publication of WO2023011125A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • The present application relates to the technical field of translation, and in particular to a simultaneous interpretation translation method, apparatus, device, and storage medium.
  • Machine translation, also known as automatic translation, is the process of using a computer to convert one natural language (the source language) into another natural language (the target language). Simultaneous translation (or simultaneous interpretation) means that generation of the target language begins before the source-language sentence has ended.
  • the current simultaneous interpretation translation schemes are mainly based on wait-k.
  • the general idea of the wait-k-based simultaneous interpretation scheme is to start translating from the k-th input data unit (such as a character): when the first k data units have been input, the first data unit is output; when the (k+1)-th data unit is input, the second data unit is output; and so on. That is, the output lags the input by a fixed k steps.
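The fixed-lag schedule described above can be sketched in a few lines (an illustrative sketch only; the function name and 0-based output indexing are our own, not from the patent):

```python
def wait_k_schedule(k, num_inputs, num_outputs):
    """Under wait-k, output j (0-based) is emitted once k + j input data
    units have been read, capped at the total input length: a fixed
    k-step lag of the output relative to the input."""
    return [min(k + j, num_inputs) for j in range(num_outputs)]

# With k = 3 and a 6-unit input, the first output is emitted after the
# 3rd input unit, the second after the 4th, and so on.
print(wait_k_schedule(3, 6, 4))  # [3, 4, 5, 6]
```

The delay is fixed in advance, which is exactly the rigidity the dynamic scheme of this application is meant to remove.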
  • this application provides a simultaneous interpretation method, device, equipment and storage medium to solve the problems existing in the wait-k-based simultaneous interpretation solution.
  • the technical solution is as follows:
  • a simultaneous translation method comprising:
  • the prediction of the data output position and the determination of the output data at that position are carried out so as to jointly optimize translation quality and translation delay.
  • processing the current input data unit and the currently obtained output data unit to obtain a processing result includes:
  • Predicting whether data output is performed at the position of the current input data unit according to the processing result, and determining and outputting the output data unit when the data output is predicted includes:
  • predicting, according to the context vector corresponding to the current input data unit and the output data prediction vector, whether data output is performed at the position of the current input data unit, and, when data output is predicted, determining and outputting the output data unit.
  • determining the context vector corresponding to the current input data unit according to the encoding result of the current input data unit and the encoding result of the historical input data unit includes:
  • the context vector corresponding to the current input data unit is determined according to the encoding result of the current input data unit and the encoding result of the historical input data unit.
  • the simultaneous interpretation translation model is obtained through training with a training data unit sequence, and the training objective of the simultaneous interpretation translation model is to jointly optimize the translation quality and translation delay of the simultaneous interpretation translation model on the training data unit sequence.
  • the simultaneous interpretation translation model includes: an encoding module, an attention module, a vector prediction module, and an output position and output data prediction module;
  • the encoding module is configured to encode the current input data unit to obtain an encoding result of the current input data unit
  • the attention module is used to determine the weights corresponding to the current input data unit and the historical input data units, respectively, and to determine the context vector corresponding to the current input data unit according to the determined weights, the encoding result of the current input data unit, and the encoding results of the historical input data units;
  • the vector prediction module is used to determine a vector for predicting the next output data unit as the output data prediction vector according to the currently obtained output data unit;
  • the output position and output data prediction module is used to predict, according to the context vector corresponding to the current input data unit and the output data prediction vector, whether data output is performed at the position of the current input data unit, and, when data output is predicted, to determine and output the output data unit.
  • the process of establishing the simultaneous interpretation model includes:
  • the prediction result corresponding to a data unit in the training data unit sequence includes: the probability of outputting each set data unit at the position of the data unit and the probability of not outputting;
  • the parameters of the simultaneous interpretation translation model are updated.
  • the prediction loss of the simultaneous interpretation translation model in the translation-quality dimension and its prediction loss in the translation-delay dimension, including:
  • the process of determining the ideal output position of a data unit in the translation result corresponding to the training data unit sequence includes:
  • the ideal output position of the data unit is determined according to the length of the training data unit sequence.
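One plausible reading of this rule — spreading the translation result uniformly over the input length — can be sketched as follows (an assumption for illustration; the patent does not spell out the exact formula here):

```python
import math

def ideal_output_positions(input_len, output_len):
    """Assign each output data unit an ideal output position by spreading
    the output units uniformly over the input positions (1-based).
    Illustrative assumption, not the patent's stated formula."""
    return [math.ceil(j * input_len / output_len)
            for j in range(1, output_len + 1)]

# 6 input units, 3 output units: ideally emit after inputs 2, 4 and 6.
print(ideal_output_positions(6, 3))  # [2, 4, 6]
```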
  • the determining of the probability sum of all possible simultaneous interpretation paths according to the prediction results corresponding to the data units in the training data unit sequence includes:
  • the probability of the path passing through the node is determined as the probability corresponding to the node
  • the sum of the probabilities of all possible simultaneous interpretation paths is determined according to the probabilities respectively corresponding to all the nodes passed by those paths.
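The probability sum over all simultaneous interpretation paths can be computed without enumerating them, by a forward dynamic program over a lattice of nodes (t inputs consumed, j outputs emitted) — in the spirit of the forward/backward path sums mentioned here. The lattice conventions below (emitting keeps the input position, "blank" advances it) are assumptions for illustration, not the patent's exact construction:

```python
def total_path_probability(emit, blank):
    """Sum the probabilities of all simultaneous interpretation paths.
    emit[t][j]: model probability of emitting output unit j+1 at input
    position t; blank[t]: probability of emitting nothing there."""
    T = len(blank)       # number of input positions
    J = len(emit[0])     # number of output units
    # alpha[t][j]: probability sum of forward paths reaching node (t, j)
    alpha = [[0.0] * (J + 1) for _ in range(T + 1)]
    alpha[0][0] = 1.0
    for t in range(T):
        for j in range(J + 1):
            if alpha[t][j] == 0.0:
                continue
            # no output at position t: advance to the next input unit
            alpha[t + 1][j] += alpha[t][j] * blank[t]
            # emit output unit j+1 at position t (staying at t allows
            # several outputs at the same input position)
            if j < J:
                alpha[t][j + 1] += alpha[t][j] * emit[t][j]
    return alpha[T][J]
```

The backward pass is symmetric, and the product of forward and backward sums at a node gives the node probability used above.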
  • all possible simultaneous translation paths are determined.
  • the delay expectations of the simultaneous interpretation paths, including:
  • the delay expectations of all possible simultaneous interpretation paths are determined according to the delay expectation and probability corresponding to each node passed by those paths and the sum of the probabilities of those paths, wherein the probability corresponding to a node is determined according to the probability sum of the forward paths passing through the node and the probability sum of the backward paths passing through the node.
  • determining, according to the ideal output position of each data unit in the translation result corresponding to the training data unit sequence and the actual output position of each data unit on the simultaneous interpretation paths passing through the node, the delay expectations of all simultaneous interpretation paths, including:
  • Delay expectations of all paths passing through the node are determined according to delay expectations of all forward paths passing through the node and delay expectations of all backward paths passing through the node.
  • the delay loss corresponding to the path is determined, including:
  • the deviation of the actual output position of the data unit output at the node relative to the corresponding ideal output position is taken as the delay loss corresponding to the node;
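The per-node delay loss described here — the deviation of the actual output position from the ideal one — might be written as follows (penalizing only late output is our assumption; an absolute deviation would also fit the text):

```python
def node_delay_loss(actual_pos, ideal_pos):
    """Delay loss at a node: how far the data unit emitted there deviates
    from its ideal output position (late emissions only, by assumption)."""
    return max(0, actual_pos - ideal_pos)

print(node_delay_loss(5, 3))  # 2: emitted two input units later than ideal
print(node_delay_loss(2, 3))  # 0: emitted earlier than ideal
```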
  • a simultaneous interpretation device comprising: a data processing module and a data prediction module;
  • the data processing module is configured to process the current input data unit and the currently obtained output data unit to obtain a processing result
  • the data prediction module is used to predict whether data output is performed at the position of the current input data unit according to the processing result, and when the data output is predicted to be performed, determine and output the output data unit;
  • the prediction of the data output position and the determination of the output data at the data output position are carried out in the direction of co-optimizing the translation quality and translation delay.
  • the data processing module and the data prediction module are realized by a simultaneous translation model
  • the simultaneous interpretation translation model is obtained through training with a training data unit sequence, and the training objective of the simultaneous interpretation translation model is to jointly optimize the translation quality and translation delay of the simultaneous interpretation translation model on the training data unit sequence.
  • a simultaneous interpretation device comprising: a memory and a processor
  • the memory is used to store programs
  • the processor is configured to execute the program to realize each step of the simultaneous interpretation method described in any one of the above.
  • a readable storage medium, on which a computer program is stored, wherein, when the computer program is executed by a processor, each step of the simultaneous interpretation method described in any one of the above is realized.
  • the simultaneous interpretation method, device, equipment and storage medium can process the current input data unit and the currently obtained output data unit to obtain a processing result, predict according to that result whether data output is performed at the position of the current input data unit, and, when data output is predicted, determine and output the output data unit; in this application, the prediction of the output position and the determination of the output data at that position jointly optimize translation quality and translation delay.
  • the simultaneous interpretation translation method provided by this application can therefore dynamically predict the translation delay, and, because it predicts the data output position and the output data so as to jointly optimize translation quality and translation delay, it can achieve a more appropriate translation delay and better-quality translation results.
  • Fig. 1 is a schematic flow chart of the simultaneous translation method provided by the embodiment of the present application.
  • Fig. 2 is a schematic structural diagram of the simultaneous interpretation translation model provided by the embodiment of the present application.
  • Fig. 3 is a schematic diagram of two simultaneous interpretation paths from a data unit sequence to its corresponding translation result provided by the embodiment of the present application;
  • Fig. 4 is a schematic flow chart of establishing a simultaneous translation model provided by the embodiment of the present application.
  • Fig. 5 is an example of the RNN-based simultaneous interpretation model provided by the embodiment of the present application.
  • Fig. 6 is an example of the simultaneous interpretation translation model based on Transformer provided by the embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of the simultaneous translation model provided by the embodiment of the present application.
  • Fig. 8 is a schematic structural diagram of a simultaneous interpretation device provided by an embodiment of the present application.
  • the existing wait-k-based scheme is a simultaneous interpretation scheme with a fixed strategy (the output positions are fixed), and such fixed-strategy schemes are prone to either insufficient delay (which leads to poor translation quality) or excessive delay (which wastes time). In view of this, the inventors considered a simultaneous interpretation translation scheme based on a dynamic strategy and, through continued in-depth research on this idea, arrived at a better simultaneous interpretation method that can dynamically predict the output positions and the output data at those positions.
  • the simultaneous interpretation translation method provided by this application can be applied to terminals with data processing capabilities.
  • the terminal performs simultaneous interpretation on input data according to the simultaneous interpretation translation method provided by this application.
  • the terminal can include a processing component, a memory, input/output interfaces and a power supply component.
  • the terminal may also include a multimedia component, an audio component, a sensor component, a communication component, and the like. Among them:
  • the processing component is used for data processing, and it can perform speech synthesis processing in this case.
  • the processing component may include one or more processors, and the processing component may also include one or more modules to facilitate interaction with other components.
  • the memory is configured to store various types of data, and the memory can be implemented with any type of volatile or non-volatile memory device or a combination of them, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, optical disk, etc., or combinations thereof.
  • the power supply component provides power for various components of the terminal, and the power supply component may include a power management system, one or more power supplies, and the like.
  • the multimedia component can include a screen, preferably, the screen can be a touch display, and the touch display can receive input signals from the user.
  • the multimedia component may also include a front camera and/or a rear camera.
  • the audio component is configured to output and/or input audio signals
  • the audio component may include a microphone configured to receive an external audio signal
  • the audio component may further include a speaker configured to output an audio signal
  • the voice synthesized by the terminal may be output through the speaker.
  • the input/output interface is the interface between the processing component and the peripheral interface module.
  • the peripheral interface module can be a keyboard, a button, etc., wherein the button can include but not limited to a home button, a volume button, a start button, a lock button, and the like.
  • the sensor component may include one or more sensors for providing status assessment of various aspects of the terminal, for example, the sensor component may detect the open/closed state of the terminal, whether the user is in contact with the terminal, the orientation, speed, temperature, etc. of the device.
  • the sensor component may include, but is not limited to, one or a combination of image sensors, acceleration sensors, gyroscope sensors, pressure sensors, temperature sensors, and the like.
  • the communication component is configured to facilitate wired or wireless communication between the terminal and other devices.
  • the terminal can access wireless networks based on communication standards, such as one or a combination of WiFi, 2G, 3G, 4G, and 5G.
  • the terminal can be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for implementing the simultaneous interpretation method provided in this application.
  • the simultaneous interpretation translation method provided by this application can also be applied to the server, and the server performs simultaneous interpretation on the input data according to the simultaneous interpretation translation method provided by this application.
  • the server may include one or more central processing units and a memory, wherein the memory is configured to store various types of data, and the memory may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, optical disk, etc., or combinations thereof.
  • the server may also include one or more power supplies, one or more wired network interfaces and/or one or more wireless network interfaces, one or more operating systems.
  • Figure 1 shows a schematic flow chart of the simultaneous interpretation method provided by the present application, which may include:
  • Step S101 Process the current input data unit and the currently obtained output data unit to obtain a processing result.
  • the simultaneous translation method provided in this application can be used to realize text-to-text simultaneous translation, and can also be used to realize voice-to-text simultaneous interpretation, and can also realize voice-to-voice simultaneous translation.
  • the input data unit can be a character; that is, this application processes character sequences.
  • if the simultaneous translation method provided by this application realizes speech-to-text or speech-to-speech simultaneous interpretation, the input data unit may be the acoustic features of a frame of speech; that is, this application processes an acoustic feature sequence.
  • the process of processing the current input data unit and the currently obtained output data unit to obtain a processing result may include:
  • Step S1011 each time an input data unit is obtained, encode the current input data unit to obtain an encoding result of the current input data unit.
  • Step S1012 if an output decision needs to be made at the position of the current input data unit, determine the context vector corresponding to the current input data unit according to the encoding result of the current input data unit and the encoding result of the historical input data unit.
  • the decision step size D can be preset (D greater than or equal to 2), and the positions at which to make output decisions are determined according to the preset decision step size D. If, according to the preset decision step size, an output decision needs to be made at the position of the current input data unit, the context vector corresponding to the current input data unit is determined according to the encoding result of the current input data unit and the encoding results of the historical input data units.
  • the decision step size mentioned above can be set according to the specific application scenario. For example, if the simultaneous interpretation translation method provided by this application realizes speech-to-text simultaneous translation, the decision step size D can be set to 32, that is, a decision is made every 32 frames; if it implements text-to-text simultaneous translation, the decision step size D can be set to 4. It should be noted that this embodiment does not limit the decision step size to 4 or 32; these values are only examples.
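The decision positions induced by a preset step size D can be sketched as follows (a hypothetical helper with 1-based positions; the patent only fixes the "every D units" behavior):

```python
def decision_positions(num_inputs, step):
    """Return the 1-based input positions at which an output decision is
    made, given a preset decision step size D: every D-th input unit."""
    return list(range(step, num_inputs + 1, step))

# Speech-to-text example from the text: a decision every 32 frames.
print(decision_positions(100, 32))  # [32, 64, 96]
# Text-to-text example: a decision every 4 characters.
print(decision_positions(12, 4))   # [4, 8, 12]
```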
  • Step S1013. Determine the vector used to predict the next output data unit according to the currently obtained output data unit as the output data prediction vector.
  • steps S1011 to S1012 process input data units
  • step S1013 processes output data units.
  • steps S1011 to S1012 and step S1013 are two independent data processing processes; therefore, this embodiment does not limit the execution order of the processing of the input data unit and the processing of the output data units.
  • Step S102 According to the processing result, it is predicted whether data output is performed at the position of the current input data unit, and when it is predicted that data output is performed, the output data unit is determined and output.
  • the process of determining and outputting the output data unit may include: predicting, according to the context vector corresponding to the current input data unit and the output data prediction vector, whether data output is performed at the position of the current input data unit, and, when data output is determined, determining and outputting the output data unit.
  • the probability of outputting each set data unit at the position of the current input data unit and the probability of not outputting can be determined according to the context vector corresponding to the current input data unit and the output data prediction vector; according to the determined probabilities, it can be decided whether data output is required at the position of the current input data unit, and, if output is required, the output data unit is determined and output according to the determined probabilities.
  • a dictionary can be pre-built, and the dictionary can include multiple entries, and each entry is a data unit.
  • the probability of outputting each entry in the dictionary at the position of the current input data unit, and the probability of outputting "blank" ("blank" means that the output is empty, i.e., no output), are determined. Assuming the dictionary includes N entries, N+1 probabilities are eventually obtained, and it can then be determined from these N+1 probabilities whether data output is required at the position of the current input data unit.
  • if the output probability of "blank" is greater than the output probability of every entry in the dictionary, it is determined that no data is output at the position of the current input data unit, and an output decision is then made at the next input data unit to be decided; if the output probability of some or all entries in the dictionary is greater than the output probability of "blank", it is determined that data is output at the position of the current input data unit, and the entry with the highest output probability is output.
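The (N+1)-way decision rule above — N dictionary entries plus "blank" — can be sketched as follows (function and variable names are illustrative):

```python
def decide_output(probs, entries):
    """probs has N+1 values: an output probability for each of the N
    dictionary entries, plus (last) the probability of 'blank'.
    Emit the highest-probability entry only if it beats blank;
    otherwise emit nothing (return None)."""
    blank_p = probs[-1]
    best = max(range(len(entries)), key=lambda i: probs[i])
    if probs[best] > blank_p:
        return entries[best]
    return None

print(decide_output([0.1, 0.6, 0.3], ["hello", "world"]))  # world
print(decide_output([0.2, 0.1, 0.7], ["hello", "world"]))  # None
```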
  • the prediction of the data output position and the determination of the output data at that position are carried out so as to jointly optimize translation quality and translation delay. It can be understood that translation delay and translation quality are a pair of contradictory objectives: if the translation delay decreases, the translation quality tends to decrease, and if the translation delay increases, the translation quality tends to improve. In this embodiment, jointly optimizing translation quality and translation delay when predicting the data output position and determining the output data at that position can make the translation delay and translation quality jointly near-optimal.
  • steps S101 and S102 — processing the current input data unit and the currently obtained output data unit, predicting according to the processing result whether data output is performed at the position of the current input data unit, and, when data output is predicted, determining and outputting the output data unit — can be realized based on a pre-established simultaneous interpretation translation model.
  • the simultaneous interpretation translation model can dynamically predict whether to output data for the input data unit.
  • the simultaneous interpretation translation model processes the input data unit and the obtained output data units separately, and then determines the output data unit according to the processing result of the input data unit and the processing result of the output data units.
  • an attention mechanism is introduced, so that more important data can be paid attention to when determining the output data unit, so that better quality translation results can be obtained.
  • FIG. 2 shows a schematic structural diagram of a simultaneous interpretation model, which may include: an encoding module 201 , an attention module 202 , a vector prediction module 203 , and an output position and output data prediction module 204 . in:
  • the encoding module 201 is configured to encode the current input data unit to obtain an encoding result of the current input data unit.
  • the encoding result of the current input data unit is a vector capable of representing the current input data unit.
  • x_i in FIG. 2 represents the i-th input data unit, and so on for the other x;
  • h_i represents the encoding result of the i-th input data unit x_i, and so on for the other h.
  • the attention module 202 is configured to determine, when an output decision needs to be made at the position of the current input data unit, the context vector corresponding to the current input data unit according to the encoding result of the current input data unit and the encoding results of the historical input data units. Specifically, the weights corresponding to the current input data unit and the historical input data units are determined, and the context vector corresponding to the current input data unit is determined according to the determined weights, the encoding result of the current input data unit, and the encoding results of the historical input data units; more specifically, the encoding result of the current input data unit and the encoding results of the historical input data units are weighted and summed according to their corresponding weights to obtain the context vector corresponding to the current input data unit.
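A minimal sketch of the weighted summation performed by the attention module. The patent does not fix how the weights are computed, so a toy dot-product softmax against the current unit's encoding stands in for the learned scoring:

```python
import math

def context_vector(encodings):
    """Weighted sum of the encoding results of the current (last) and
    historical input data units.  Weights come from a softmax over
    dot-product scores against the current unit's encoding --
    an illustrative stand-in for the module's learned attention."""
    query = encodings[-1]                      # current input unit's encoding
    scores = [sum(q * h for q, h in zip(query, enc)) for enc in encodings]
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]    # numerically stable softmax
    total = sum(exp)
    weights = [e / total for e in exp]
    dim = len(query)
    return [sum(w * enc[d] for w, enc in zip(weights, encodings))
            for d in range(dim)]
```

With identical encodings the weights are uniform and the context vector equals each encoding, which is a quick sanity check of the weighted sum.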
  • the introduction of the attention module 202 in this application enables the simultaneous interpretation translation model to solve the ordering problem, thereby improving the translation quality.
  • the input data units are A1 and A2
  • the translation result of A1 is assumed to be B1
  • the translation result of A2 is B2
  • the introduction of the attention module makes it possible to output B2 first and then output B1, instead of necessarily outputting B1 first and then outputting B2.
  • the vector prediction module 203 is configured to determine a vector for predicting a next output data unit as an output data prediction vector according to the currently obtained output data unit. Wherein, the output data prediction vector contains useful information for predicting the next output data.
  • the vector in Figure 2 used to predict the j-th output data unit is determined according to the existing output data units y_1 to y_(j-1).
  • the output position and output data prediction module 204 is used to determine, according to the context vector corresponding to the current input data unit and the output data prediction vector, whether to perform data output at the position of the current input data unit, and, when determining to perform data output, to determine and output the output data unit. Pr in the figure indicates the probabilities, determined by module 204 from the context vector and the output data prediction vector corresponding to the current input data unit, of outputting each set data unit and of not outputting at the position of the current input data unit.
  • the simultaneous interpretation translation model in this embodiment is obtained by training with a training data unit sequence, and the training objective of the simultaneous interpretation translation model is to jointly optimize the translation quality and translation delay of the simultaneous interpretation translation model on the training data unit sequence.
  • the training objective of the simultaneous interpretation translation model is to jointly optimize the translation quality and translation delay of the simultaneous interpretation translation model on the training data unit sequence.
  • the translation quality and translation delay of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation results can be jointly optimized.
  • since the simultaneous interpretation translation model cannot at first determine which simultaneous interpretation path is the optimal one, the translation quality and translation delay of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result are optimized.
  • the probability of the optimal path being selected by the simultaneous interpretation model will increase, while the probability of a non-optimal path being selected by the simultaneous interpretation model will decrease.
  • the simultaneous interpretation translation model can have the ability to select the optimal path from all possible simultaneous interpretation paths. It should be noted that the ability of the simultaneous interpretation model to select the optimal path refers to the ability to output appropriate data (corresponding to translation quality) at an appropriate output position (corresponding to translation delay).
  • This embodiment illustrates the simultaneous interpretation path in conjunction with Figure 3: from a data unit sequence {x_1, x_2, x_3, x_4, x_5, x_6} to its corresponding translation result {y_1, y_2, y_3, y_4, y_5, y_6} (the translation result is also a sequence), there are multiple simultaneous interpretation paths. Figure 3 shows a schematic diagram of two of them, and each simultaneous interpretation path can represent the position at which each data unit in the translation result is output. For example, for path 1, the first output data unit y_1 is output at x_2, the second output data unit y_2 is also output at x_2, and the third output data unit y_3 is output at x_4, ...; for path 2, the first output data unit y_1 is output at x_1, the second output data unit is output at x_2, ....
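The two partially specified paths from Figure 3 can be written down directly, together with a toy delay measure (the `path_lag` measure is our illustration, not the patent's delay definition):

```python
# A simultaneous interpretation path records, for each output data unit,
# the input position at which it is emitted (as far as the text specifies).
path_1 = [2, 2, 4]   # y1 at x2, y2 also at x2, y3 at x4, ...
path_2 = [1, 2]      # y1 at x1, y2 at x2, ...

def path_lag(path):
    """Average number of input units consumed before each output --
    a simple per-path delay measure (illustrative only)."""
    return sum(path) / len(path)

print(path_lag(path_1))  # higher average lag
print(path_lag(path_2))  # lower average lag: starts outputting earlier
```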
  • the simultaneous interpretation translation method provided by the embodiment of the present application can be realized based on the simultaneous interpretation model, and during training the model takes jointly optimizing the translation quality and translation delay of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result as its goal; therefore, the trained simultaneous interpretation model has the ability to predict an appropriate translation delay and to output translation results with better quality.
  • when simultaneous interpretation is performed on input data, not only can a suitable delay be obtained, but also better-quality translation results; that is, the trained simultaneous translation model can generally achieve a trade-off between delay and translation quality.
  • the simultaneous interpretation translation method provided by the embodiment of the present application aims to jointly optimize the translation quality and translation delay.
  • By processing the current input data unit and the currently obtained output data units, the method predicts whether data output is performed at the position of the current input data unit, and, when data output is predicted, determines and outputs the output data unit.
  • It can be seen that the simultaneous interpretation translation method provided by the embodiments of this application realizes dynamic prediction of the translation delay; moreover, because the method predicts the data output position and the output data in the direction of jointly optimizing translation quality and translation delay, it can predict a more appropriate translation delay and translation results of better quality.
  • the simultaneous interpretation translation process of steps S101 to S104 can be realized based on a pre-established simultaneous interpretation translation model.
  • This embodiment focuses on the introduction of the process of establishing a simultaneous interpretation translation model.
  • Figure 4 shows a schematic flow diagram of establishing a simultaneous interpretation translation model, which may include:
  • Step S401 Input the data units in the training data unit sequence into the simultaneous interpretation translation model one by one to obtain the prediction result corresponding to the data unit in the training data unit sequence and the translation result corresponding to the training data unit sequence.
  • the prediction result corresponding to a data unit in the training data unit sequence includes: the probability of outputting each set data unit at the position of the data unit and the probability of not outputting.
  • For each data unit xi in the training data unit sequence x input to the simultaneous interpretation translation model, the model encodes xi. If an output decision needs to be made at the position of xi, then the context vector corresponding to xi is determined according to the encoding results of xi and of x1 to xi-1, and the vector used to predict the next output data unit (the output data prediction vector) is determined according to the currently obtained output data units. From the context vector corresponding to xi and the output data prediction vector, the model predicts the probability of outputting each of the set data units at xi and the probability of not outputting, as the prediction result corresponding to xi.
  • the "translation result corresponding to the training data unit sequence" in step S401 is the translation result composed of all output data units output by the simultaneous interpretation translation model.
  • Step S402: According to the prediction results corresponding to the data units in the training data unit sequence and all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result, determine the prediction loss of the simultaneous interpretation translation model in the dimension of translation quality and the prediction loss in the dimension of translation delay.
  • this embodiment determines the prediction loss of the simultaneous translation model in the dimension of translation quality and the prediction loss of the dimension of translation delay.
  • step S402 may include:
  • Step S4021 according to the prediction result corresponding to the data unit in the training data unit sequence, determine the probability sum of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result.
  • the training data unit sequence is expressed as x
  • the translation result corresponding to the training data unit sequence x is expressed as y
  • all possible simultaneous interpretation paths from the training data unit sequence x to the corresponding translation result y are represented by H(x, y), and one such simultaneous interpretation path is denoted h. Then the probability sum of all possible simultaneous interpretation paths from the training data unit sequence x to the corresponding translation result y can be expressed as: Pr(y|x) = Σ_{h∈H(x,y)} Pr(h|x).
  • step S4021 may include:
  • Step a1 for each node passed by all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result:
  • Step a1-a according to the prediction results corresponding to the data units in the training data unit sequence, determine the probability sum of all forward paths passing through the node and the probability sum of all backward paths passing through the node.
  • the forward path through the node refers to the path from the start node to the node
  • the backward path through the node refers to the path from the node to the end node.
  • Step a1-b according to the probability sum of all forward paths passing through the node and the probability sum of all backward paths passing through the node, determine the probability of the path passing through the node as the probability corresponding to the node.
  • For all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result, express the probability sum of all forward paths passing through node (i, j) as α(i, j) and the probability sum of all backward paths passing through node (i, j) as β(i, j); then multiplying α(i, j) by β(i, j) gives the probability of the paths passing through this node.
  • the structure of the simultaneous interpretation translation model provided by this application (input data and historical output data are processed separately, without coupling between the two) makes it possible to obtain the same semantic representation at shared nodes of different paths, so that paths passing through a shared node can be merged: for node (i, j), all forward paths passing through node (i, j) can be combined for calculation, and likewise all backward paths passing through node (i, j) can be combined for calculation.
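The merged forward/backward computation is a standard lattice forward-backward recursion. The sketch below is illustrative: state (i, j) means i input units have been read and j output units produced, and the per-state output/no-output probabilities stand in for the model predictions of step S401 (the toy numbers are hypothetical):

```python
def forward_backward(p_out, p_no):
    """alpha[i][j]: probability sum of all forward paths reaching state
    (i, j); beta[i][j]: probability sum of all backward paths from (i, j)
    to the end node (all outputs produced).  p_out[i][j] is the
    probability of emitting output j+1 in state (i, j); p_no[i][j] the
    probability of reading the next input instead."""
    n = len(p_out)           # number of input units
    m = len(p_out[0]) - 1    # number of output units

    alpha = [[0.0] * (m + 1) for _ in range(n)]
    alpha[0][0] = 1.0
    for i in range(n):
        for j in range(m + 1):
            if i > 0 and j < m:                  # arrived by reading an input
                alpha[i][j] += alpha[i - 1][j] * p_no[i - 1][j]
            if j > 0:                            # arrived by emitting an output
                alpha[i][j] += alpha[i][j - 1] * p_out[i][j - 1]

    beta = [[0.0] * (m + 1) for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(m, -1, -1):
            if j == m:
                beta[i][j] = 1.0                 # all outputs produced: complete
                continue
            beta[i][j] += p_out[i][j] * beta[i][j + 1]
            if i < n - 1:
                beta[i][j] += p_no[i][j] * beta[i + 1][j]
    return alpha, beta

# toy example: 3 inputs, 2 outputs, every decision 50/50
p_out = [[0.5, 0.5, 0.5] for _ in range(3)]
p_no = [[0.5, 0.5, 0.5] for _ in range(3)]
alpha, beta = forward_backward(p_out, p_no)
print(beta[0][0])                          # 0.6875  (= Pr(y|x))
print(sum(alpha[i][2] for i in range(3)))  # 0.6875  (same total)
```

Here alpha[i][j] * beta[i][j] mirrors the α(i, j)·β(i, j) product in the text: the probability sum of complete paths passing through node (i, j).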
  • the probability corresponding to each node passed by all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result can be obtained through the above step a1.
  • Step a2 Determine the sum of the probabilities of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation results according to the probabilities corresponding to all the nodes passed by all the possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation results.
  • Pr(y|x) = Σ_{i=1}^{|x|} α(i, m)·β(i, m)
  • the above formula expresses the probability sum of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result as the sum of the probabilities corresponding to the nodes, where 1 ≤ m ≤ |y|.
  • Step S4022 according to the probability sum of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result, determine the prediction loss of the simultaneous translation model in the dimension of translation quality.
  • the prediction loss of the simultaneous interpretation model in the dimension of translation quality is the negative logarithmic likelihood of the marginal distribution of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result.
  • Step S4023: According to the ideal output position of each data unit in the translation result corresponding to the training data unit sequence and the actual output position of each data unit in each simultaneous interpretation path, determine the expected delay of all possible simultaneous interpretation paths, as the prediction loss of the simultaneous interpretation translation model in the dimension of translation delay.
  • the ideal output position of a data unit in the translation result corresponding to the training data unit sequence is determined according to the length of the training data unit sequence, the length of the translation result corresponding to the training data unit sequence, and the position of the data unit in that translation result.
  • step S4023 may include:
  • Step b1: For each node passed by all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result: according to the ideal output position of each data unit in the translation result corresponding to the training data unit sequence and the actual output position of each data unit on the simultaneous interpretation paths passing through the node, determine the delay expectation corresponding to the node.
  • the process of determining the delay expectation corresponding to a node can include:
  • Step b11-a for each forward path passing through the node: according to the actual output position and the ideal output position of the data unit output on the forward path, determine the delay loss corresponding to the forward path.
  • Step b12-a according to the respective delay losses corresponding to all forward paths passing through the node, determine the expected delay of all forward paths passing through the node.
  • For each forward path passing through the node: determine the target probability corresponding to the forward path, and multiply it by the delay loss corresponding to that forward path to obtain the multiplication result corresponding to the forward path, where the target probability corresponding to a forward path is the ratio of the probability of that forward path to the probability sum of all forward paths passing through the node; then sum the multiplication results corresponding to all forward paths passing through the node to obtain the expected delay of all forward paths passing through this node.
  • Step b11-b for each backward path passing through the node: according to the actual output position and the ideal output position of the data unit output on the backward path, determine the delay loss corresponding to the backward path.
  • Both step b11-a and this step need to determine the delay loss corresponding to a path (forward path or backward path) according to the ideal output positions and actual output positions of the data units output on the path; the process is given next.
  • Let d(i, j) denote the delay of outputting the j-th data unit at the position of the i-th input data unit. It can be computed as: d(i, j) = max(0, i − j·|x|/|y|), where i is the actual output position and j·|x|/|y| is the ideal output position of the j-th output data unit (|x| and |y| being the lengths of the training data unit sequence and of its corresponding translation result). The delay loss corresponding to a path is obtained from the d(i, j) values of the data units output on that path.
  • the lower limit of 0 is set in the calculation formula of d(i, j) to avoid an overly aggressive strategy, that is, to avoid harming translation quality through an excessively fast output speed.
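The ideal-position and delay-loss definitions can be sketched as follows. The uniform interpolation j·|x|/|y| is an illustrative choice consistent with the text (lengths and position determine the ideal position), not necessarily the patent's exact formula:

```python
def ideal_position(j, len_x, len_y):
    """Ideal output position of the j-th output unit (1-based): outputs
    spread uniformly over the input, so the j-th output ideally appears
    after about j * |x| / |y| input units (illustrative interpolation)."""
    return j * len_x / len_y

def delay_loss(i, j, len_x, len_y):
    """d(i, j): delay of emitting the j-th output at the i-th input,
    floored at 0 so that outputting *earlier* than the ideal position is
    not rewarded (an overly aggressive policy would hurt quality)."""
    return max(0.0, i - ideal_position(j, len_x, len_y))

print(delay_loss(4, 2, 6, 6))  # 2.0: y2 emitted at x4, ideally at x2
print(delay_loss(1, 2, 6, 6))  # 0.0: early output is not rewarded
```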
  • Step b12-b Determine the expected delays of all the backward paths passing through the node according to the respective delay losses corresponding to all the backward paths passing through the node.
  • For each backward path passing through the node: determine the target probability corresponding to the backward path, and multiply it by the delay loss corresponding to that backward path to obtain the multiplication result corresponding to the backward path, where the target probability corresponding to a backward path is the ratio of the probability of that backward path to the probability sum of all backward paths passing through the node; then sum the multiplication results corresponding to all backward paths passing through the node to obtain the expected delay of all backward paths passing through this node.
  • Step b13-b Determine the expected delay of all paths passing through the node according to the expected delays of all forward paths passing through the node and the expected delays of all backward paths passing through the node.
  • the expected delay of all forward paths passing through node (i, j) is expressed as α_lat(i, j), and the expected delay of all backward paths passing through node (i, j) is expressed as β_lat(i, j); α_lat(i, j) and β_lat(i, j) can be summed to obtain the expected delay c(i, j) of all paths passing through node (i, j), namely: c(i, j) = α_lat(i, j) + β_lat(i, j).
  • The delay expectation corresponding to each node passed by all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result can be obtained through step b1.
  • Step b2: According to the delay expectation and probability corresponding to each node passed by all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result, and the probability sum of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result, determine the expected delay of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result.
  • In step b2, the probability corresponding to any node passed by all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result is determined according to the probability sum of the forward paths passing through that node and the probability sum of the backward paths passing through that node.
  • the delay expectation of all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result can be calculated by the following formula: ℒ_latency = Σ_{i=1}^{|x|} (α(i, m)·β(i, m) / Pr(y|x)) · c(i, m), where 1 ≤ m ≤ |y|.
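Whatever the lattice bookkeeping, the quantity being computed is a probability-weighted average of per-path delay losses. A brute-force sketch with hypothetical toy numbers (three paths with made-up probabilities and summed delay losses) illustrates the definition that the forward-backward computation evaluates efficiently:

```python
def expected_delay(paths):
    """Probability-weighted average delay over all simultaneous
    interpretation paths; `paths` is a list of (probability, delay_loss)
    pairs, one per path from x to y.  This is the quantity that steps
    b1/b2 compute without enumerating paths explicitly."""
    total = sum(p for p, _ in paths)            # = Pr(y|x)
    return sum(p * d for p, d in paths) / total

# three hypothetical paths: (path probability, summed delay loss)
print(expected_delay([(0.5, 0.0), (0.3, 1.0), (0.2, 3.0)]))  # ≈ 0.9
```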
  • Step S403 Update the parameters of the simultaneous translation model according to the prediction loss of the simultaneous translation model in the dimension of translation quality and the prediction loss of the dimension of translation delay.
  • the total prediction loss ℒ of the simultaneous interpretation translation model can be expressed as: ℒ = ℒ_quality + λ_latency·ℒ_latency
  • where ℒ_quality is the prediction loss of the simultaneous interpretation translation model in the dimension of translation quality, ℒ_latency is the prediction loss in the dimension of translation delay, and λ_latency is the weight of the prediction loss in the dimension of translation delay.
  • λ_latency can be set according to the specific application scenario; different settings of λ_latency result in different translation delays of the trained simultaneous interpretation translation model.
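Assuming the quality term is the negative log-likelihood of the path-probability sum (step S4022) and the delay term is the expected delay, the combined loss might look like the following sketch:

```python
import math

def total_loss(prob_sum, exp_delay, lambda_latency=1.0):
    """Total training loss sketch: negative log-likelihood of the
    path-probability sum Pr(y|x) (translation-quality term) plus the
    expected delay weighted by lambda_latency (translation-delay term)."""
    return -math.log(prob_sum) + lambda_latency * exp_delay

# lambda_latency trades quality against delay: larger values penalize
# delay more, pushing the model toward earlier (lower-delay) output.
print(total_loss(1.0, 2.0, lambda_latency=0.5))  # 1.0
```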
  • the trained simultaneous interpretation translation model can not only translate and output with an appropriate delay, but also output translation results of better quality.
  • the present application preferably makes output decisions according to the preset decision step size D.
  • the computational complexity can be reduced from O(|x|·|y|) to roughly O(|x|·|y|/D).
  • the output decision is made according to the preset decision step size D (that is, multi-step decision-making).
  • If the simultaneous interpretation translation model makes output decisions according to the decision step size D in the training phase, then in the actual application stage after training, the model also makes output decisions according to the decision step size D.
  • Both the weight λ_latency and the decision step size D are adjustable parameters. In practical applications, λ_latency and D can be adjusted according to the specific application scenario to match its requirements.
  • The impact of the weight λ_latency and the decision step size D on translation delay and translation quality is as follows: if λ_latency increases (or D decreases), the simultaneous interpretation delay decreases and the corresponding translation quality decreases; conversely, if λ_latency decreases (or D increases), the simultaneous interpretation delay increases and the corresponding translation quality increases.
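The multi-step decision rule simply thins the decision points. A trivial sketch (1-based input positions; the modulo rule is an illustrative choice):

```python
def should_decide(i, D):
    """Multi-step decision: make an output decision only at every D-th
    input position (i is 1-based); D = 1 recovers per-unit decisions."""
    return i % D == 0

# With D = 3, decisions are made at one third of the input positions,
# so fewer decision points (longer delay) but less computation.
positions = [i for i in range(1, 13) if should_decide(i, 3)]
print(positions)  # [3, 6, 9, 12]
```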
  • the simultaneous interpretation translation model in the embodiment of the present application may be, but not limited to, an RNN-based simultaneous interpretation translation model or a Transformer-based simultaneous interpretation translation model.
  • FIG. 5 shows an example of an RNN-based simultaneous interpretation translation model.
  • Figure 6 shows an example of a Transformer-based simultaneous interpretation translation model. Whether RNN-based or Transformer-based, the model generally consists of an encoding module, an attention module, a vector prediction module, and an output position and output data prediction module.
  • The right part of Figure 6 is used to process the obtained output data units, i.e., the vector prediction module, which predicts the vector used to determine the next output data unit according to the obtained output data units;
  • the middle part of Figure 6 is used for output prediction, i.e., the output position and output data prediction module;
  • the left part of Figure 6 is used to process input data, i.e., the encoding module and the attention module.
  • the embodiment of the present application also provides a simultaneous interpretation translation device.
  • the simultaneous interpretation translation device provided in the embodiment of the present application is described below.
  • the simultaneous interpretation translation device described below and the simultaneous interpretation translation method described above can be referred to in correspondence.
  • FIG. 7 shows a schematic structural diagram of a simultaneous interpretation device provided by an embodiment of the present application, which may include: a data processing module 701 and a data prediction module 702 .
  • the data processing module 701 is configured to process the current input data unit and the currently obtained output data unit to obtain a processing result.
  • the data prediction module 702 is configured to predict whether data output will be performed at the position of the current input data unit according to the processing result, and determine and output the output data unit when the data output is predicted to be performed.
  • the prediction of the data output position and the determination of the output data at the data output position are carried out in the direction of jointly optimizing translation quality and translation delay.
  • the data processing module 701 may include: an input data processing module and a historical output data processing module.
  • the input data processing module is used to encode the current input data unit to obtain the encoding result of the current input data unit, and when an output decision needs to be made at the position of the current input data unit, according to the encoding result of the current input data unit and the encoding result of the historical input data unit to determine the context vector corresponding to the current input data unit;
  • the historical output data processing module is configured to determine a vector for predicting a next output data unit as an output data prediction vector according to the currently obtained output data unit.
  • the data prediction module 702 is specifically configured to determine whether data output is performed at the position of the current input data unit according to the context vector corresponding to the current input data unit and the output data prediction vector, and determine the output data unit when determining to perform data output and output.
  • the input data processing module determines the context vector corresponding to the current input data unit according to the encoding result of the current input data unit and the encoding result of the historical input data unit, it is specifically used for:
  • the context vector corresponding to the current input data unit is determined according to the encoding result of the current input data unit and the encoding result of the historical input data unit.
  • the data processing module 701 and the data prediction module 702 are realized by a simultaneous interpretation model.
  • the simultaneous interpretation translation model is obtained through training with a training data unit sequence, and the training objective of the simultaneous interpretation translation model is to jointly optimize the translation quality and translation delay of the simultaneous interpretation translation model on the training data unit sequence.
  • the simultaneous interpretation translation model includes: an encoding module, an attention module, a vector prediction module, and an output position and output data prediction module;
  • the encoding module is configured to encode the current input data unit each time an input data unit is obtained, so as to obtain an encoding result of the current input data unit;
  • the attention module is used to determine the weights respectively corresponding to the current input data unit and the historical input data units, and to determine the context vector corresponding to the current input data unit according to the determined weights, the encoding result of the current input data unit, and the encoding results of the historical input data units;
  • the vector prediction module is used to determine a vector for predicting the next output data unit as the output data prediction vector according to the currently obtained output data unit;
  • the output position and output data prediction module is used to determine whether data output is performed at the position of the current input data unit according to the context vector corresponding to the current input data unit and the output data prediction vector, and when determining to perform data output, Determine the output data unit and output it.
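The interaction of the four modules can be sketched structurally as below. This is an illustrative skeleton, not the reference implementation: the module names, signatures, and the "emit until no-output is predicted" loop are hypothetical simplifications of the behavior described above:

```python
class SimultaneousTranslator:
    """Structural sketch: encoder, attention, vector predictor, and
    output position/data predictor wired together per input unit."""

    def __init__(self, encoder, attention, vector_predictor, output_predictor):
        self.encoder = encoder                    # input unit -> encoding
        self.attention = attention                # encodings -> context vector
        self.vector_predictor = vector_predictor  # past outputs -> prediction vector
        self.output_predictor = output_predictor  # (context, vector) -> unit or None

    def step(self, x_i, encodings, outputs, decide):
        """Process one input unit; if a decision is due at this position,
        emit output units until 'no output' (None) is predicted."""
        encodings.append(self.encoder(x_i))
        emitted = []
        if decide:
            context = self.attention(encodings)
            while True:
                pred_vec = self.vector_predictor(outputs + emitted)
                y = self.output_predictor(context, pred_vec)
                if y is None:          # "no output" predicted: wait for input
                    break
                emitted.append(y)
        return emitted

# toy stand-in modules (numeric encodings, emit exactly one unit)
demo = SimultaneousTranslator(
    encoder=lambda x: x,
    attention=lambda enc: sum(enc),
    vector_predictor=lambda outs: len(outs),
    output_predictor=lambda ctx, vec: f"y{vec}" if vec < 1 else None,
)
print(demo.step(5, [], [], decide=True))  # ['y0']
```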
  • the simultaneous interpretation device may also include: a model training module.
  • the model training module includes: a data acquisition module, a prediction loss determination module and a model parameter update module.
  • the data acquisition module is configured to input the data units in the training data unit sequence into the simultaneous interpretation translation model one by one, so as to obtain the prediction results corresponding to the data units in the training data unit sequence and the translation result corresponding to the training data unit sequence.
  • the prediction loss determination module is used to determine, according to the prediction results corresponding to the data units in the training data unit sequence and all possible simultaneous interpretation paths from the training data unit sequence to the corresponding translation result, the prediction loss of the simultaneous interpretation translation model in the dimension of translation quality and the prediction loss in the dimension of translation delay.
  • the model parameter update module is configured to update the parameters of the simultaneous translation model according to the prediction loss of the simultaneous translation model in the dimension of translation quality and the prediction loss in the dimension of translation delay.
  • the prediction loss determination module includes: a first prediction loss determination module and a second prediction loss determination module.
  • the first prediction loss determination module is configured to determine the probability sum of all possible simultaneous interpretation paths according to the prediction results corresponding to the data units in the training data unit sequence, and to determine the prediction loss of the simultaneous interpretation translation model in the dimension of translation quality according to that probability sum.
  • the second prediction loss determination module is used to determine the expected delay of all possible simultaneous interpretation paths according to the ideal output position of each data unit in the translation result corresponding to the training data unit sequence and the actual output position of each data unit in each simultaneous interpretation path, as the prediction loss of the simultaneous interpretation translation model in the dimension of translation delay.
  • the simultaneous interpretation device provided by the present application may further include: an ideal output position determination module.
  • An ideal output position determination module configured to determine the ideal output position of each data unit in the translation result corresponding to the training data unit sequence
  • the ideal output position determination module determines the ideal output position of a data unit in the translation result corresponding to the training data unit sequence, it is specifically configured to: according to the length of the training data unit sequence, the training data unit sequence The length of the corresponding translation result and the position of the data unit in the translation result corresponding to the training data unit sequence are determined.
  • When the first prediction loss determination module determines the probability sum of all possible simultaneous interpretation paths according to the prediction results corresponding to the data units in the training data unit sequence, it is specifically used to:
  • determine, for each node passed by all the possible simultaneous interpretation paths, the probability of the paths passing through the node as the probability corresponding to the node;
  • and determine the probability sum of all the possible simultaneous interpretation paths according to the probabilities respectively corresponding to all the nodes passed by all the possible simultaneous interpretation paths.
  • When the second prediction loss determination module determines the expected delay of all possible simultaneous interpretation paths according to the ideal output position of each data unit in the translation result corresponding to the training data unit sequence and the actual output position of each data unit in each simultaneous interpretation path, it is specifically used for:
  • determining the delay expectations of all the possible simultaneous interpretation paths according to the delay expectation and probability corresponding to each node passed by all the possible simultaneous interpretation paths and the probability sum of all the possible simultaneous interpretation paths, wherein the probability corresponding to a node is determined according to the probability sum of the forward paths passing through the node and the probability sum of the backward paths passing through the node.
  • When the second prediction loss determination module determines the expected delay of all simultaneous interpretation paths passing through a node according to the ideal output position of each data unit in the translation result corresponding to the training data unit sequence and the actual output position of each data unit on the simultaneous interpretation paths passing through that node, it is specifically used for:
  • Delay expectations of all paths passing through the node are determined according to delay expectations of all forward paths passing through the node and delay expectations of all backward paths passing through the node.
  • the second prediction loss determination module determines the delay loss corresponding to the path according to the ideal output position and the actual output position of the data unit output on the path, it is specifically used to:
  • the deviation of the actual output position of the data unit output at the node relative to the corresponding ideal output position is taken as the delay loss corresponding to the node;
  • The simultaneous interpretation translation device provided by the embodiments of the present application aims to jointly optimize translation quality and translation delay: by processing the current input data unit and the currently obtained output data units, it predicts whether data output is performed at the position of the current input data unit, and, when data output is predicted, determines and outputs the output data unit. It can be seen that the device realizes dynamic prediction of the translation delay; moreover, because it predicts the data output position and the output data in the direction of jointly optimizing translation quality and translation delay, it can predict a more appropriate translation delay and translation results of better quality.
  • the embodiment of the present application also provides a simultaneous interpretation device. Please refer to FIG. 8 , which shows a schematic structural diagram of the simultaneous interpretation device.
  • the simultaneous interpretation device may include: at least one processor 801 and at least one communication interface 802 , at least one memory 803 and at least one communication bus 804;
  • the numbers of processors 801, communication interfaces 802, memories 803, and communication buses 804 are each at least one, and the processor 801, the communication interface 802, and the memory 803 communicate with one another through the communication bus 804;
  • The processor 801 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application;
  • the memory 803 may include a high-speed RAM memory, and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory;
  • the memory stores a program
  • the processor can call the program stored in the memory, and the program is used for:
  • the prediction of the data output position and the determination of the output data at the data output position are carried out in the direction of jointly optimizing translation quality and translation delay.
  • the embodiment of the present application also provides a readable storage medium, which can store a program suitable for execution by a processor, and the program is used for:
  • the prediction of the data output position and the determination of the output data at the data output position are carried out in the direction of jointly optimizing translation quality and translation delay.


Abstract

A simultaneous interpretation method, apparatus, device, and storage medium. The method includes: processing the current input data unit and the currently obtained output data units to obtain a processing result; predicting, according to the processing result, whether data output is performed at the position of the current input data unit, and, when data output is predicted, determining and outputting the output data unit; wherein the prediction of the output position and the determination of the output data at the output position are carried out in the direction of jointly optimizing translation quality and translation delay. The simultaneous interpretation method can dynamically predict the translation delay, and, because it predicts the data output position and the output data in the direction of jointly optimizing translation quality and translation delay, it can predict a more appropriate translation delay and translation results of better quality.

Description

A simultaneous interpretation method, apparatus, device, and storage medium

This application claims priority to Chinese patent application No. CN202110881817.4, filed with the China National Intellectual Property Administration on August 2, 2021 and entitled "一种同传翻译方法、装置、设备及存储介质" ("Simultaneous interpretation method, apparatus, device, and storage medium"), the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the technical field of translation, and in particular to a simultaneous interpretation method, apparatus, device, and storage medium.

Background

Machine translation, also known as automatic translation, is the process of using a computer to convert one natural language (the source language) into another natural language (the target language). Simultaneous interpretation (simultaneous translation) means that generation of the target language begins before the source-language sentence has ended.

Currently, text-to-text simultaneous interpretation and speech-to-text simultaneous interpretation are the most studied. Text-to-text simultaneous interpretation usually works as a sub-module of a more complex speech interpretation system, together with real-time speech recognition, to complete the speech interpretation task, while speech-to-text simultaneous interpretation completes the real-time translation from source-language speech to target-language text directly, end to end. Speech-to-text simultaneous interpretation directly reduces the time cost of cross-language communication and plays an important role in a range of multilingual communication scenarios, such as international conferences and cross-language real-time subtitle generation.

The current simultaneous interpretation scheme is mainly the wait-k-based scheme. Its general idea is to start translating from the k-th input data unit (e.g., a character): when the k-th data unit is input, one data unit is output; when the (k+1)-th data unit is input, the second data unit is output; and so on, i.e., the input is fixedly delayed by k steps relative to the output. To obtain a good simultaneous interpretation effect, a low delay usually needs to be set; however, too low a delay causes the translation result to be unfaithful to the source text, i.e., poor translation quality. To obtain good translation quality, a relatively high delay must be used in practice, while an excessively high delay is unnecessary for most content, leading to wasted delay.
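The wait-k policy described above can be sketched as follows; `translate_step` is a hypothetical one-step translation function supplied by the caller (the echo function in the example is a toy stand-in):

```python
def wait_k_schedule(k, source, translate_step):
    """wait-k policy sketch: once the first k source units have arrived,
    emit one target unit per additional source unit, i.e., the output
    lags the input by a fixed k steps."""
    target = []
    for i in range(1, len(source) + 1):
        if i >= k:
            target.append(translate_step(source[:i], target))
    return target

# toy translate_step that just echoes the aligned source unit
out = wait_k_schedule(3, list("abcde"), lambda src, tgt: src[len(tgt) + 2])
print(out)  # ['c', 'd', 'e']
```

The fixed lag is exactly the limitation the present application addresses: k is the same for every sentence and every position, whereas the method above predicts the output positions dynamically.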
发明内容
有鉴于此,本申请提供了一种同传翻译方法、装置、设备及存储介质,用以解决基于wait-k的同传翻译方案所存在的问题,其技术方案如下:
一种同传翻译方法,包括:
对当前输入数据单元和当前已获得的输出数据单元进行处理，以获得处理结果；
根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出;
其中,数据输出位置的预测以及所述数据输出位置处输出数据的确定以使翻译质量和翻译延迟共同优化为方向进行。
可选的,所述对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果,包括:
对当前输入数据单元进行编码,以获得当前输入数据单元的编码结果;
若需要在当前输入数据单元的位置处进行输出决策,则根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量;
根据当前已获得的输出数据单元确定用于预测下一输出数据单元的向量,作为输出数据预测向量;
所述根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出,包括:
根据所述当前输入数据单元对应的上下文向量和所述输出数据预测向量,预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出。
可选的,所述若需要在当前输入数据单元的位置处进行输出决策,则根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量,包括:
若根据预设的决策步长确定需要在当前输入数据单元的位置处进行输出决策,则根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量。
可选的,所述对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果,根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出,包括:
利用预先建立的同传翻译模型，对当前输入数据单元和当前已获得的输出数据单元进行处理，并根据处理结果预测当前输入数据单元的位置处是否进行数据输出，以及在预测出进行数据输出时，确定输出数据单元并输出；
其中,所述同传翻译模型采用训练数据单元序列训练得到,所述同传翻译模型的训练目标为,联合优化所述同传翻译模型在所述训练数据单元序列上的翻译质量和翻译延迟。
可选的,所述同传翻译模型包括:编码模块、注意力模块、向量预测模块和输出位置及输出数据预测模块;
所述编码模块,用于对当前输入数据单元进行编码,以获得当前输入数据单元的编码结果;
所述注意力模块,用于确定当前输入数据单元和历史输入数据单元分别对应的权重,并根据确定出的权重、当前输入数据单元的编码结果以及历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量;
所述向量预测模块,用于根据当前已获得的输出数据单元确定用于预测下一输出数据单元的向量,作为输出数据预测向量;
所述输出位置及输出数据预测模块,用于根据当前输入数据单元对应的上下文向量和所述输出数据预测向量,预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出。
可选的,建立所述同传翻译模型的过程包括:
将所述训练数据单元序列中的数据单元依次逐个输入同传翻译模型,以得到所述训练数据单元序列中数据单元对应的预测结果,以及所述训练数据单元序列对应的翻译结果;其中,所述训练数据单元序列中一数据单元对应的预测结果包括:在该数据单元的位置处输出设定的各数据单元的概率以及不进行输出的概率;
根据所述训练数据单元序列中数据单元对应的预测结果,以及所述训练数据单元序列到对应翻译结果的所有可能的同传路径,确定同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失;
根据所述同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失,更新同传翻译模型的参数。
可选的，所述根据所述训练数据单元序列中数据单元对应的预测结果，以及所述训练数据单元序列到对应翻译结果的所有可能的同传路径，确定同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失，包括：
根据所述训练数据单元序列中数据单元对应的预测结果,确定所述所有可能的同传路径的概率和;
根据所述所有可能的同传路径的概率和,确定所述同传翻译模型在翻译质量这一维度上的预测损失;
根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在每条同传路径中的实际输出位置,确定所述所有可能的同传路径的延迟期望,作为同传翻译模型在翻译延迟这一维度上的预测损失。
可选的,确定所述训练数据单元序列对应的翻译结果中一数据单元的理想输出位置的过程包括:
根据所述训练数据单元序列的长度、所述训练数据单元序列对应的翻译结果的长度,以及该数据单元在所述训练数据单元序列对应的翻译结果中的位置,确定该数据单元的理想输出位置。
可选的,所述根据所述训练数据单元序列中数据单元对应的预测结果,确定所述所有可能的同传路径的概率和,包括:
针对所述所有可能的同传路径所经过的每个节点:
根据所述训练数据单元序列中数据单元对应的预测结果,确定经过该节点的所有前向路径的概率和以及经过该节点的所有后向路径的概率和;
根据经过该节点的所有前向路径的概率和以及经过该节点的所有后向路径的概率和,确定经过该节点的路径的概率,作为该节点对应的概率;
根据所述所有可能的同传路径所经过的所有节点分别对应的概率,确定所述所有可能的同传路径的概率和。
可选的,所述根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在每条同传路径中的实际输出位置,确定所述所有可能的同传路径的延迟期望,包括:
针对所述所有可能的同传路径所经过的每个节点:
根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在经过该节点的同传路径上的实际输出位置,确定经过该节点的所有同传路径的延迟期望,作为该节点对应的延迟期望;
根据所述所有可能的同传路径所经过的每个节点对应的延迟期望和概率,以及所述所有可能的同传路径的概率和,确定所述所有可能的同传路径的延迟期望,其中,一个节点对应的概率根据经过该节点的前向路径的概率和以及经过该节点的后向路径的概率和确定。
可选的,所述根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在经过该节点的同传路径上的实际输出位置,确定经过该节点的所有同传路径的延迟期望,包括:
针对经过该节点的每条前向路径:根据在该前向路径上输出的数据单元的实际输出位置和理想输出位置,确定该前向路径对应的延迟损失;
根据经过该节点的所有前向路径分别对应的延迟损失,确定经过该节点的所有前向路径的延迟期望;
针对经过该节点的每条后向路径:根据在该后向路径上输出的数据单元的实际输出位置和理想输出位置,确定该后向路径对应的延迟损失;
根据经过该节点的所有后向路径分别对应的延迟损失,确定经过该节点的所有后向路径的延迟期望;
根据经过该节点的所有前向路径的延迟期望和经过该节点的所有后向路径的延迟期望,确定经过该节点的所有路径的延迟期望。
可选的,根据一路径上输出的数据单元的理想输出位置和实际输出位置,确定该路径对应的延迟损失,包括:
针对该路径所经过的每个节点:
若该节点处有数据单元输出,则将该节点处输出的数据单元的实际输出位置相对于对应的理想输出位置的偏差作为该节点对应的延迟损失;
若该节点处无数据单元输出,则确定该节点对应的延迟损失为0。
一种同传翻译装置,包括:数据处理模块和数据预测模块;
所述数据处理模块,用于对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果;
所述数据预测模块,用于根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出;
其中，数据输出位置的预测以及所述数据输出位置处输出数据的确定以使翻译质量和翻译延迟共同优化为方向进行。
可选的,所述数据处理模块和所述数据预测模块由同传翻译模型实现;
其中,所述同传翻译模型采用训练数据单元序列训练得到,所述同传翻译模型的训练目标为,联合优化所述同传翻译模型在所述训练数据单元序列上的翻译质量和翻译延迟。
一种同传翻译设备,包括:存储器和处理器;
所述存储器,用于存储程序;
所述处理器,用于执行所述程序,实现上述任一项所述的同传翻译方法的各个步骤。
一种可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时,实现上述任一项所述的同传翻译方法的各个步骤。
经由上述方案可知,本申请提供的同传翻译方法、装置、设备及存储介质,可对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果,并可根据处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出,本申请中输出位置的预测以及输出位置处输出数据的确定以使翻译质量和翻译延迟共同优化为方向进行。本申请提供的同传翻译方法可实现翻译延迟的动态预测,并且,由于本申请提供的同传翻译方法以使翻译质量和翻译延迟共同优化为方向对数据输出位置和输出数据进行预测,因此,其能够预测出较为合适的翻译延迟以及质量较佳的翻译结果。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。
图1为本申请实施例提供的同传翻译方法的流程示意图;
图2为本申请实施例提供的同传翻译模型的结构示意图;
图3为本申请实施例提供的由一数据单元序列到其对应的翻译结果的两条同传路径的示意图;
图4为本申请实施例提供的建立同传翻译模型的流程示意图;
图5为本申请实施例提供的基于RNN的同传翻译模型的一示例;
图6为本申请实施例提供的基于Transformer的同传翻译模型的一示例;
图7为本申请实施例提供的同传翻译模型的结构示意图;
图8为本申请实施例提供的同传翻译设备的结构示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
鉴于现有的基于wait-k的同传翻译方案为基于固定策略（在何处输出数据是固定的）的同传翻译方案，而基于固定策略的同传翻译方案很容易存在延迟不足（延迟不足导致翻译质量不佳）或延迟过大（存在延迟浪费）的问题，本案发明人想到可采用基于动态策略的同传翻译方案，并在该想法的基础上进行了深入研究，最终提出了一种效果较好的同传翻译方法，该方法可动态预测输出位置并在预测的输出位置处输出数据。
本申请提供的同传翻译方法可应用于具有数据处理能力的终端,终端按本申请提供的同传翻译方法对输入数据进行同传翻译,该终端可以包括处理组件、存储器、输入/输出接口和电源组件,可选的,该终端还可以包括多媒体组件、音频组件、传感器组件和通信组件等。其中:
处理组件用于进行数据处理，其可以进行本案的同传翻译处理，处理组件可以包括一个或多个处理器，处理组件还可以包括一个或多个模块，便于与其它组件之间的交互。
存储器被配置为存储各种类型的数据，存储器可以由任何类型的易失性或非易失性存储设备或者它们的组合实现，如静态随机存取存储器(SRAM)、电可擦除可编程只读存储器(EEPROM)、可擦除可编程只读存储器(EPROM)、可编程只读存储器(PROM)、只读存储器(ROM)、磁存储器、快闪存储器、磁盘、光盘等中的一种或多种的组合。
电源组件为终端的各种组件提供电力,电源组件可以包括电源管理系统、一个或多个电源等。
多媒体组件可以包括屏幕，优选的，屏幕可以为触摸显示屏，触摸显示屏可接收来自用户的输入信号。多媒体组件还可以包括前置摄像头和/或后置摄像头。
音频组件被配置为输出和/或输入音频信号，如音频组件可以包括麦克风，麦克风被配置为接收外部音频信号，音频组件还可以包括扬声器，扬声器被配置为输出音频信号，终端翻译得到的语音可通过扬声器输出。
输入/输出接口为处理组件与外围接口模块之间的接口,外围接口模块可以为键盘、按钮等,其中,按钮可包括但不限定于主页按钮、音量按钮、启动按钮、锁定按钮等。
传感器组件可以包括一个或多个传感器,用于为终端提供各个方面的状态评估,例如,传感器组件可以检测终端的打开/关闭状态、用户与终端是否接触、装置的方位、速度、温度等。传感器组件可以包括但不限定于图像传感器、加速度传感器、陀螺仪传感器、压力传感器、温度传感器等中的一种或多种的组合。
通信组件被配置为便于终端和其它设备进行有线或无线通信。终端可接入基于通信标准的无线网络,如WiFi、2G、3G、4G、5G中的一种或多种的组合。
可选的，终端可被一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现，用于执行本申请提供的同传翻译方法。
本申请提供的同传翻译方法还可应用于服务器，服务器按本申请提供的同传翻译方法对输入数据进行同传翻译。服务器可以包括一个或一个以上的中央处理器和存储器，其中，存储器被配置为存储各种类型的数据，存储器可以由任何类型的易失性或非易失性存储设备或者它们的组合实现，如静态随机存取存储器(SRAM)、电可擦除可编程只读存储器(EEPROM)、可擦除可编程只读存储器(EPROM)、可编程只读存储器(PROM)、只读存储器(ROM)、磁存储器、快闪存储器、磁盘、光盘等中的一种或多种的组合。服务器还可以包括一个或一个以上电源、一个或一个以上有线网络接口和/或一个或一个以上无线网络接口、一个或一个以上操作系统。
接下来通过下述实施例对本申请提供的同传翻译方法进行介绍。
第一实施例
请参阅图1,示出了本申请提供的同传翻译方法的流程示意图,可以包括:
步骤S101:对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果。
本申请提供的同传翻译方法可用于实现文本-文本的同传翻译，还可用于实现语音-文本的同传翻译，也可实现语音-语音的同传翻译。若本申请提供的同传翻译方法实现的是文本-文本的同传翻译，则输入数据单元可以为一个字符，即本申请处理的是字符序列；若本申请提供的同传翻译方法实现的是语音-文本的同传翻译或者语音-语音的同传翻译，则输入数据单元可以为一帧语音的声学特征，即本申请处理的是声学特征序列。
具体的,对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果的过程可以包括:
步骤S1011、在每获得一输入数据单元时,对当前输入数据单元进行编码,以获得当前输入数据单元的编码结果。
步骤S1012、若需要在当前输入数据单元的位置处进行输出决策,则根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量。
在一种可能的实现方式中,可在每获得一个输入数据单元时,在获得的输入数据单元的位置处进行输出决策。在另一种可能的实现方式中,可预设决策步长D(D大于等于2),根据预设的决策步长D确定在何位置处进行输出决策,若根据预设的决策步长确定需要在当前输入数据单元的位置处进行输出决策,则根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量。
上述提及的决策步长可根据具体的应用场景设定,比如,若本申请提供的同传翻译方法实现的是语音-文本的同传翻译,则可将决策步长D设置为32,即每隔32帧进行一次决策,若本申请提供的同传翻译方法实现的是文本-文本的同传翻译,则可将决策步长D设置为4。需要说明的是,本实施例并不限定决策步长为4、32,4、32仅为示例。
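按预设决策步长D确定决策位置的做法可示意如下（“在每第D个输入单元的位置处决策”仅为一种示例假设的实现方式）：

```python
def decision_positions(num_inputs, D):
    """返回需要进行输出决策的输入位置（1-based）：每隔D个输入单元决策一次，
    其余位置仅对输入数据单元进行编码、不做输出决策。"""
    return [i for i in range(1, num_inputs + 1) if i % D == 0]
```

例如语音-文本同传中取D=32，则在第32、64、96…帧处进行输出决策；文本-文本同传中取D=4，则在第4、8、12…个字符处进行输出决策。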
另外,需要说明的是,若当前输入数据单元的位置处不进行输出决策,则只对当前输入数据单元进行编码即可。
步骤S1013、根据当前已获得的输出数据单元确定用于预测下一输出数据 单元的向量,作为输出数据预测向量。
需要说明的是,步骤S1011~步骤S1012处理的是输入数据单元,而步骤S1013处理的是输出数据单元,步骤S1011~步骤S1012与步骤S1013是两个独立的数据处理过程,为此,本实施例并不对输入数据单元的处理过程与输出数据单元的输出过程进行执行顺序的限定。
步骤S102:根据处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出。
具体的，根据处理结果预测当前输入数据单元的位置处是否进行数据输出，以及在预测出进行数据输出时，确定输出数据单元并输出的过程可以包括：根据当前输入数据单元对应的上下文向量和输出数据预测向量，确定当前输入数据单元的位置处是否进行数据输出，以及在确定进行数据输出时，确定输出数据单元并输出。
进一步的,可根据当前输入数据对应的上下文向量和输出数据预测向量确定在当前输入数据的位置处输出设定的各数据单元的概率以及不进行输出的概率,根据确定出的概率确定在当前输入数据的位置处是否需要进行数据输出,若需要进行输出,根据确定出的概率确定输出数据单元并输出。
在一种可能的实现方式中，可预先构建词典，词典中可以包括多个词条，每个词条为一个数据单元。本实施例根据当前输入数据对应的上下文向量和输出数据预测向量，确定在当前输入数据的位置处输出词典中各词条的概率，以及输出“blank”（“blank”表示输出为空，即不进行输出）的概率。假设词典中包括N个词条，则最终会获得N+1个概率，进而可根据这N+1个概率确定在当前输入数据的位置处是否需要进行数据输出：若输出“blank”的概率大于输出词典中各词条的概率，则确定当前输入数据的位置处不输出数据，接着对下一待决策的输入数据单元进行输出决策；若词典中部分或所有词条的输出概率大于“blank”的输出概率，则确定当前输入数据的位置处输出数据，将输出概率最大的词条输出。
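上述“N+1个概率→是否输出”的判定逻辑可示意如下（词典规模、blank的编号方式均为示例假设）：

```python
def output_decision(probs, blank_id):
    """probs为N+1个概率（N个词条 + blank）。若blank的概率最大，
    返回None表示当前位置不输出、等待下一决策位置；
    否则返回概率最大的词条编号，表示在当前位置输出该词条。"""
    best = max(range(len(probs)), key=probs.__getitem__)
    return None if best == blank_id else best
```

例如 probs=[0.1, 0.2, 0.6, 0.1]（最后一项为blank）时返回词条2并输出；probs=[0.1, 0.1, 0.2, 0.6] 时返回 None，即当前位置不输出数据。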
需要说明的是，为了能够按合适的翻译延迟进行翻译，同时能够获得质量较佳的翻译结果，本实施例提供的同传翻译方法中，数据输出位置的预测以及数据输出位置处输出数据的确定以使翻译质量和翻译延迟共同优化为方向进行。可以理解的是，翻译延迟与翻译质量是一对矛盾体，翻译延迟减小，则翻译质量下降，翻译延迟增大，翻译质量提升，本实施例以使翻译质量和翻译延迟共同优化为方向进行数据输出位置以及数据输出位置处输出数据的确定，能够使得翻译延迟和翻译质量达到相对最优。
在一种可能的实现方式,上述步骤S101和步骤S102,即对当前输入数据单元和当前已获得的输出数据单元进行处理,根据处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出的过程可基于预先建立的同传翻译模型实现。该同传翻译模型可针对输入数据单元动态预测是否进行数据输出,对于翻译过程,该同传翻译模型对输入数据单元和已获得的输出数据单元分别进行处理,进而根据输入数据单元的处理结果和输出数据单元的处理结果确定输出数据单元,在对输入数据单元进行处理时,引入注意力机制,使得在确定输出数据单元时能够关注到较为重要的数据,从而能够获得质量较佳的翻译结果。
请参阅图2,示出了同传翻译模型的结构示意图,其可以包括:编码模块201、注意力模块202、向量预测模块203和输出位置及输出数据预测模块204。其中:
编码模块201，用于对当前输入数据单元进行编码，以获得当前输入数据单元的编码结果。其中，当前输入数据单元的编码结果为能够表征当前输入数据单元的向量。图2中的x_i表示第i个输入数据单元，其它x以此类推，h_i表示第i个输入数据单元x_i的编码结果，其它h以此类推。
注意力模块202,用于在需要在当前输入数据单元的位置处进行输出决策时,根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量。具体的,确定当前输入数据单元和历史输入数据单元分别对应的权重,并根据确定出的权重、当前输入数据单元的编码结果以及历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量,更为具体的,将当前输入数据单元的编码结果以及历史输入数据单元的编码结果按对应的权重加权求和,以得到当前输入数据单元对应的上下文向量。需要说明的是,本申请中的注意力模块202的引入使得同传翻译模型可以解决调序问题,从而能够提升翻译质量,比如,输入的数据单元为A1、A2,假设A1的翻译结果为B1,A2的翻译结果为B2,则注意力模块的引入使得可以先输出B2再输出B1,而不一定先输出B1再输出B2。
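注意力模块的“确定权重并对编码结果加权求和”可示意如下（采用点积打分与softmax归一化仅为示例假设，本申请并未限定具体的打分方式）：

```python
import math

def context_vector(enc_states, query):
    """对当前及历史输入数据单元的编码结果enc_states，按与query的
    点积打分经softmax得到各自的权重，再按权重加权求和，
    得到当前输入数据单元对应的上下文向量。"""
    scores = [sum(q * h for q, h in zip(query, state)) for state in enc_states]
    m = max(scores)                                  # 数值稳定的softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]                  # 各编码结果对应的权重
    dim = len(enc_states[0])
    return [sum(w * state[d] for w, state in zip(weights, enc_states))
            for d in range(dim)]
```

由于权重是动态计算的，关注点可以落在任意历史位置上，这正是正文所述“可解决调序问题”（先输出B2再输出B1）的基础。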
向量预测模块203，用于根据当前已获得的输出数据单元确定用于预测下一输出数据单元的向量，作为输出数据预测向量。其中，输出数据预测向量中含有用于预测下一输出数据的有用信息。图2中示出了用于预测第j个输出数据单元的向量，其根据已有的输出数据单元y_1~y_{j-1}确定。
输出位置及输出数据预测模块204,用于根据当前输入数据单元对应的上下文向量和输出数据预测向量,确定当前输入数据单元的位置处是否进行数据输出,以及在确定进行数据输出时,确定输出数据单元并输出。图中的Pr表示输出位置及输出数据预测模块204根据当前输入数据对应的上下文向量和输出数据预测向量确定在当前输入数据的位置处输出设定的各数据单元以及不进行输出的概率。
本实施例中的同传翻译模型采用训练数据单元序列训练得到,该同传翻译模型的训练目标为,联合优化同传翻译模型在训练数据单元序列上的翻译质量和翻译延迟。优选的,在采用训练数据单元序列训练同传翻译模型时,可联合优化由训练数据单元序列到对应翻译结果的所有可能的同传路径的翻译质量和翻译延迟。在训练阶段,由于同传翻译模型起初并不能判别出哪条同传路径为最优路径,因此,对训练数据单元序列到对应翻译结果的所有可能的同传路径的翻译质量和翻译延迟进行优化,需要说明的是,随着训练过程的不断进行,最优路径被同传翻译模型选择的概率将会越来越大,而非最优路径被同传翻译模型选择的概率将会越来越小,最终,同传翻译模型能够具备从所有可能的同传路径中选出最优路径的能力。需要说明的是,同传翻译模型选出最优路径的能力指的是在合适的输出位置(对应翻译延迟)输出合适数据(对应翻译质量)的能力。
本实施例结合图3对同传路径进行说明：一数据单元序列{x_1,x_2,x_3,x_4,x_5,x_6}到其对应的翻译结果（翻译结果也为一序列）{y_1,y_2,y_3,y_4,y_5,y_6}存在多条同传路径，图3示出了其中的两条同传路径的示意图，每条同传路径能够表征翻译结果中的每个输出数据单元在何位置输出，比如，对于路径1，第1个输出数据单元y_1在x_2处输出，第2个输出数据单元y_2也在x_2处输出，第3个输出数据单元y_3在x_4处输出，…，对于路径2，第1个输出数据单元y_1在x_1处输出，第2个输出数据单元在x_2处输出，…。需要说明的是，不同同传路径中至少部分输出数据单元输出的位置不同。
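一条同传路径可等价地记为读入("R")/输出("W")的动作序列，据此可直接读出翻译结果中每个输出数据单元的输出位置（以下仅为说明路径含义的示意代码）：

```python
def output_positions(path):
    """由同传路径的动作序列（"R"读入/"W"输出）得到每个输出数据单元的
    输出位置，即该单元输出时已读入的输入数据单元个数。"""
    positions, read = [], 0
    for action in path:
        if action == "R":
            read += 1
        else:
            positions.append(read)
    return positions
```

例如 output_positions(["R","R","W","W","R","R","W"]) 返回 [2, 2, 4]，对应图3路径1中“y_1、y_2在x_2处输出，y_3在x_4处输出”的前三个输出。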
由于本申请实施例提供的同传翻译方法可基于同传翻译模型实现,而同传翻译模型在训练的过程中,以同时对训练数据单元序列到对应翻译结果的所有可能的同传路径的翻译质量和翻译延迟进行优化为目标,因此,训练得到的同传翻译模型具备预测出合适的翻译延迟以及输出质量较佳的翻译结果的能力,进而,基于训练得到的同传翻译模型对输入数据进行同传翻译,既能够获得合适的延迟,又能获得质量较好的翻译结果,即基于训练得到的同传翻译模型总体上可实现延迟与翻译质量的权衡。
本申请实施例提供的同传翻译方法,以使翻译质量和翻译延迟共同优化为方向,通过对当前输入数据单元和当前已获得的输出数据单元进行处理,来预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出。可见,本申请实施例提供的同传翻译方法可实现翻译延迟的动态预测,并且,由于本申请提供的同传翻译方法以使翻译质量和翻译延迟共同优化为方向对数据输出位置和输出数据进行预测,因此,其能够预测出较为合适的翻译延迟以及质量较佳的翻译结果。
第二实施例
上述实施例提到，步骤S101~步骤S102的同传翻译过程可基于预先建立的同传翻译模型实现，本实施例重点对建立同传翻译模型的过程进行介绍。
请参阅图4,示出了建立同传翻译模型的流程示意图,可以包括:
步骤S401:将训练数据单元序列中的数据单元依次逐个输入同传翻译模型,以得到训练数据单元序列中数据单元对应的预测结果,以及训练数据单元序列对应的翻译结果。
其中,训练数据单元序列中一数据单元对应的预测结果包括:在该数据单元的位置处输出设定的各数据单元的概率以及不进行输出的概率。
针对训练数据单元序列x中输入同传翻译模型的每个数据单元x_i，同传翻译模型对x_i进行编码，若x_i的位置处需要进行输出决策，则根据x_i的编码结果和x_1~x_{i-1}的编码结果，确定x_i对应的上下文向量，根据当前已获得的输出数据单元确定用于预测下一输出数据单元的向量作为输出数据预测向量，根据x_i对应的上下文向量和输出数据预测向量预测x_i处输出设定的各数据单元的概率以及不进行输出的概率，作为x_i对应的预测结果。在获得x_i对应的预测结果后，可根据x_i对应的预测结果确定x_i处是否需要输出数据单元，若是，则进一步根据x_i处输出设定的各数据单元的概率确定需要输出的数据单元并输出。需要说明的是，步骤S401中“训练数据单元序列对应的翻译结果”为由同传翻译模型输出的所有输出数据单元组成的翻译结果。
另外,需要说明的是,同传翻译模型对训练数据单元序列进行翻译的过程与第一实施例提供的翻译过程类似,具体可相互参见,本实施例在此不做赘述。
步骤S402:根据训练数据单元序列中数据单元对应的预测结果,以及训练数据单元序列到对应翻译结果的所有可能的同传路径,确定同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失。
为了能够对同传翻译模型的翻译质量和翻译延迟共同优化,本实施例确定同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失。
进一步的,步骤S402的具体实现过程可以包括:
步骤S4021、根据训练数据单元序列中数据单元对应的预测结果,确定训练数据单元序列到对应翻译结果的所有可能的同传路径的概率和。
若训练数据单元序列表示为x，训练数据单元序列x对应的翻译结果表示为y，训练数据单元序列x到对应翻译结果y的所有可能的同传路径的集合用H(x,y)表示，其中一条同传路径用ȳ表示，即ȳ∈H(x,y)，则训练数据单元序列x到对应翻译结果y的所有可能的同传路径的概率和可表示为∑_{ȳ∈H(x,y)}Pr(ȳ|x)。
具体的,步骤S4021的实现过程可以包括:
步骤a1、针对训练数据单元序列到对应翻译结果的所有可能的同传路径所经过的每个节点:
步骤a1-a、根据训练数据单元序列中数据单元对应的预测结果,确定经过该节点的所有前向路径的概率和以及经过该节点的所有后向路径的概率和。
其中,经过该节点的前向路径指的是,从开始节点到该节点的路径,类似的,经过该节点的后向路径指的是,从该节点到结束节点的路径。
步骤a1-b、根据经过该节点的所有前向路径的概率和以及经过该节点的所有后向路径的概率和,确定经过该节点的路径的概率,作为该节点对应的概率。
若将该节点表示为(i,j)，将训练数据单元序列到对应翻译结果的所有可能的同传路径中，经过节点(i,j)的所有前向路径的概率和表示为α(i,j)，将经过节点(i,j)的所有后向路径的概率和表示为β(i,j)，则可通过将α(i,j)与β(i,j)相乘，来获得经过该节点的路径的概率。
需要说明的是,本申请提供的同传翻译模型的结构(输入数据和历史输出数据分开处理,二者不存在耦合)使得不同路径的汇聚节点处可获得相同的语义表示,从而使得经过汇聚节点的路径可合并,即,对于节点(i,j),可将经过节点(i,j)的所有前向路径进行合并计算,将经过节点(i,j)的所有后向路径进行合并计算。
经由上述步骤a1可获得训练数据单元序列到对应翻译结果的所有可能的同传路径所经过的每个节点对应的概率。
步骤a2、根据训练数据单元序列到对应翻译结果的所有可能的同传路径所经过的所有节点分别对应的概率,确定训练数据单元序列到对应翻译结果的所有可能的同传路径的概率和。
若将训练数据单元序列到对应翻译结果的所有可能的同传路径的概率和∑_{ȳ∈H(x,y)}Pr(ȳ|x)表示为Pr(y|x)，则Pr(y|x)可表示为：
Pr(y|x)=∑_{(i,j):i+j=m}α(i,j)·β(i,j)      (1)
上式表示,将训练数据单元序列到对应翻译结果的所有可能的同传路径所经过的所有节点分别对应的概率求和,其中,1≤m≤|x|+|y|,|x|表示训练数据单元序列x的长度,即x所包含的数据单元的个数,|y|表示训练数据单元序列x对应的翻译结果y的长度,即y所包含的数据单元的个数。
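前向概率α的递推可用如下极简示意说明：格点上每一步要么在当前输入位置输出一个数据单元，要么不输出（“blank”）并读入下一个输入单元；格点结构与概率的参数化方式均为示例假设：

```python
def forward_prob(p_emit, p_blank, X, Y):
    """前向算法示意：alpha[i][j]为“已读入前i个输入单元且已输出前j个
    输出单元”的所有前向路径的概率和。p_emit[i][j]为在读入i个输入后
    输出第j+1个数据单元的概率，p_blank[i][j]为不输出并读入第i+1个
    输入单元的概率。返回所有可能同传路径的概率和Pr(y|x)。"""
    alpha = [[0.0] * (Y + 1) for _ in range(X + 1)]
    alpha[0][0] = 1.0
    for i in range(X + 1):
        for j in range(Y + 1):
            if i == 0 and j == 0:
                continue
            a = 0.0
            if i > 0:                                  # 上一步为“读入”（不输出）
                a += alpha[i - 1][j] * p_blank[i - 1][j]
            if j > 0:                                  # 上一步为“输出第j个单元”
                a += alpha[i][j - 1] * p_emit[i][j - 1]
            alpha[i][j] = a
    return alpha[X][Y]
```

例如所有概率均取0.5、|x|=|y|=2时，6条格点路径各自的概率为0.5^4，概率和为0.375；后向概率β可按对称方式从终点递推，二者在任一节点相乘即得式(1)中该节点对应的概率。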
步骤S4022、根据训练数据单元序列到对应翻译结果的所有可能的同传路径的概率和,确定同传翻译模型在翻译质量这一维度上的预测损失。
若将同传翻译模型在翻译质量这一维度上的预测损失表示为L_quality，则可通过下式确定L_quality：
L_quality=−ln Pr(y|x)      (2)
即,同传翻译模型在翻译质量这一维度上的预测损失为训练数据单元序列到对应翻译结果的所有可能的同传路径边缘分布的负对数似然度。
步骤S4023、根据训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在每条同传路径中的实际输出位置，确定所有可能的同传路径的延迟期望，作为同传翻译模型在翻译延迟这一维度上的预测损失。
其中，训练数据单元序列对应的翻译结果中一数据单元的理想输出位置根据训练数据单元序列的长度、训练数据单元序列对应的翻译结果的长度，以及该数据单元在训练数据单元序列对应的翻译结果中的位置确定。
进一步的，步骤S4023的实现过程可以包括：
步骤b1、针对训练数据单元序列到对应翻译结果的所有可能的同传路径所经过的每个节点:根据训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在经过该节点的同传路径上的实际输出位置,确定经过该节点的所有同传路径的延迟期望,作为该节点对应的延迟期望。
具体的,根据训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在经过该节点的同传路径上的实际输出位置,确定经过该节点的所有同传路径的延迟期望的过程可以包括:
步骤b11-a、针对经过该节点的每条前向路径:根据在该前向路径上输出的数据单元的实际输出位置和理想输出位置,确定该前向路径对应的延迟损失。
步骤b12-a、根据经过该节点的所有前向路径分别对应的延迟损失,确定经过该节点的所有前向路径的延迟期望。
具体的,针对经过该节点的每条前向路径:确定该条前向路径对应的目标概率,将该条前向路径对应的目标概率与该前向路径对应的延迟损失相乘,得到该条前向路径对应的相乘结果,其中,该条前向路径对应的目标概率为该条前向路径的概率与经过该节点的所有前向路径的概率和的比值;将经过该节点的所有前向路径分别对应的相乘结果求和,得到经过该节点的所有前向路径的延迟期望。
步骤b11-b、针对经过该节点的每条后向路径:根据在该条后向路径上输出的数据单元的实际输出位置和理想输出位置,确定该条后向路径对应的延迟损失。
步骤b11-a和本步骤都需要根据一路径(前向路径或后向路径)上输出的数据单元的理想输出位置和实际输出位置,确定该路径对应的延迟损失,接下来给出这一过程的具体实现方式:
针对该路径所经过的每个节点:若该节点处有数据单元输出,则将该节点处输出的数据单元的实际输出位置相对于对应的理想输出位置的偏差作为该节点对应的延迟损失;若该节点处无数据单元输出,则确定该节点对应的延迟损失为0。
上述确定一路径ȳ对应的延迟损失loss_lat(ȳ)的过程可通过下式表征：
loss_lat(ȳ)=∑_k d_k(ȳ)      (3)
d_k(ȳ)=d(i,j)，若在路径ȳ的第k个节点处（即第i个输入数据单元的位置处）输出第j个数据单元      (4)
d(i,j)=max(0, i−j·|x|/|y|)      (5)
其中，d_k(ȳ)表示路径ȳ经过的第k个节点的延迟损失，loss_lat(ȳ)为路径ȳ经过的所有节点的延迟损失的和，d(i,j)表示在第i个输入数据单元的位置处输出第j个数据单元的延迟，若路径ȳ的第k个节点处无数据单元输出，则d_k(ȳ)=0；d(i,j)中的i表示实际输出位置，j·|x|/|y|表示理想输出位置，d(i,j)的计算式中设置下限0是为了避免过于激进的策略，即避免出现输出速度过快而影响翻译质量。
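一条路径的延迟损失计算可示意如下（理想输出位置按该输出单元在输入序列上的等比例位置折算，该折算方式为根据上下文的示例假设）：

```python
def ideal_position(j, src_len, tgt_len):
    """第j个输出数据单元的理想输出位置：按输入长度/输出长度的比例折算。"""
    return j * src_len / tgt_len

def path_delay_loss(outputs, src_len, tgt_len):
    """一条同传路径的延迟损失：对路径上每个实际输出(i, j)（第j个输出单元
    在第i个输入处输出），取实际位置相对理想位置的偏差并以0为下限，
    以免鼓励输出过快的激进策略；无输出的节点延迟损失为0。"""
    return sum(max(0.0, i - ideal_position(j, src_len, tgt_len))
               for i, j in outputs)
```

例如|x|=|y|=4时，路径上输出序列[(2,1),(2,2),(4,3),(4,4)]的延迟损失为1+0+1+0=2。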
步骤b12-b、根据经过该节点的所有后向路径分别对应的延迟损失,确定经过该节点的所有后向路径的延迟期望。
具体的，针对经过该节点的每条后向路径：确定该条后向路径对应的目标概率，将该条后向路径对应的目标概率与该条后向路径对应的延迟损失相乘，得到该条后向路径对应的相乘结果，其中，该条后向路径对应的目标概率为该条后向路径的概率与经过该节点的所有后向路径的概率和的比值；将经过该节点的所有后向路径分别对应的相乘结果求和，得到经过该节点的所有后向路径的延迟期望。
步骤b13、根据经过该节点的所有前向路径的延迟期望和经过该节点的所有后向路径的延迟期望，确定经过该节点的所有路径的延迟期望。
若将该节点表示为(i,j)，将经过节点(i,j)的所有前向路径的延迟期望表示为α_lat(i,j)，将经过节点(i,j)的所有后向路径的延迟期望表示为β_lat(i,j)，则可将α_lat(i,j)与β_lat(i,j)求和，以得到经过节点(i,j)的所有路径的延迟期望c(i,j)，即：
c(i,j)=α_lat(i,j)+β_lat(i,j)      (6)
经由步骤b1可获得训练数据单元序列到对应翻译结果的所有可能的同传路径所经过的每个节点对应的延迟期望。
步骤b2、根据训练数据单元序列到对应翻译结果的所有可能的同传路径所经过的每个节点对应的延迟期望和概率,以及训练数据单元序列到对应翻译结果的所有可能的同传路径的概率和,确定训练数据单元序列到对应翻译结果的所有可能的同传路径的延迟期望。
需要说明的是,步骤b2中训练数据单元序列到对应翻译结果的所有可能的同传路径所经过的任一节点对应的概率,即为根据经过该节点的前向路径的概率和以及经过该节点的后向路径的概率和确定的概率。
若将同传翻译模型在翻译延迟这一维度上的预测损失表示为L_latency，训练数据单元序列到对应翻译结果的所有可能的同传路径的延迟期望表示为E_lat，即：
L_latency=E_lat      (7)
具体的，训练数据单元序列到对应翻译结果的所有可能的同传路径的延迟期望E_lat可根据每个节点对应的延迟期望c(i,j)和概率α(i,j)·β(i,j)，以及所有可能的同传路径的概率和Pr(y|x)计算得到：
E_lat=∑_{(i,j):i+j=m}α(i,j)·β(i,j)·c(i,j)/Pr(y|x)      (8)
步骤S403:根据同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失,更新同传翻译模型的参数。
具体的，根据同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失，确定同传翻译模型的总预测损失，并根据总预测损失更新同传翻译模型的参数。其中，同传翻译模型的总预测损失L_total可表示为：
L_total=L_quality+λ_latency·L_latency      (9)
其中，λ_latency为同传翻译模型在翻译延迟这一维度上的预测损失的权重，λ_latency可根据具体的应用场景设定，λ_latency设置的不同，训练得到的同传翻译模型的翻译延迟不同。
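翻译质量损失与翻译延迟损失的加权组合可示意如下（λ_latency的默认取值仅为示例假设）：

```python
def total_loss(loss_quality, loss_latency, lambda_latency=0.1):
    """总预测损失 = 质量损失 + λ_latency × 延迟损失。
    λ_latency越大，训练出的模型翻译延迟越小、翻译质量相应下降。"""
    return loss_quality + lambda_latency * loss_latency
```

可以看到，λ_latency起到在质量与延迟之间调节权衡的作用：在相同的两项损失下，λ_latency取值越大，延迟项在总损失中的占比越高。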
在根据同传翻译模型的总预测损失L_total更新同传翻译模型的参数时，需要进行梯度计算，在进行梯度计算时，可分别针对L_quality和L_latency计算梯度，进而根据计算出的梯度更新同传翻译模型的参数。
按上述过程进行多次迭代训练,直至满足训练结果条件,训练结束后得到的模型即为建立的同传翻译模型。
由于在训练同传翻译模型的过程中,对所有可能的同传路径的翻译质量和翻译延迟共同进行优化,因此,训练得到的同传翻译模型既能够按合适的延迟进行翻译输出,又能输出质量较好的翻译结果。
另外，在训练过程中，本申请优选按预设的决策步长D进行输出决策，相比于在每个输入数据单元的位置处进行输出决策，运算复杂度能够从O(|x|·|y|)降低为O(|x|·|y|/D)。另外，除效率上的优势外，按预设的决策步长D进行输出决策（即多步决策），通过降低决策次数，降低了模型在不恰当位置决策的风险，从而提升了模型的翻译质量。需要说明的是，若训练阶段同传翻译模型按决策步长D进行输出决策，则在训练结束后的实际应用阶段，同传翻译模型也按决策步长D进行输出决策。
需要说明的是,上述的权重λ latency和决策步长D均为可调节参数,在实际应用中,可根据具体的应用场景调节λ latency和决策步长D使其与具体场景的应用需求匹配。权重λ latency和决策步长D对应翻译延迟和翻译质量的影响是,λ latency增大(或D减小),则同传翻译延迟减小,对应的翻译质量下降,反之,λ latency减小(或D增大),则同传翻译延迟增大,对应的翻译质量上升。
本申请实施例中的同传翻译模型可以但不限于基于RNN的同传翻译模型、基于Transformer的同传翻译模型，请参阅图5，示出了基于RNN的同传翻译模型的一示例，请参阅图6，示出了基于Transformer的同传翻译模型的一示例，不管是基于RNN的同传翻译模型还是基于Transformer的同传翻译模型，总体上均由编码模块、注意力模块、向量预测模块和输出位置及输出数据预测模块组成。需要说明的是，图6中右边的部分为用于处理已获得的输出数据单元的部分，即向量预测模块，其根据已获得的输出数据单元预测用于确定下一输出数据单元的向量，图6中间部分为用于进行输出预测的部分，即输出位置及输出数据预测模块，图6左侧部分为用于处理输入数据的部分，即编码模块和注意力模块。
第三实施例
本申请实施例还提供了一种同传翻译装置,下面对本申请实施例提供的同传翻译装置进行描述,下文描述的同传翻译装置与上文描述的同传翻译方法可相互对应参照。
请参阅图7,示出了本申请实施例提供的同传翻译装置的结构示意图,可以包括:数据处理模块701和数据预测模块702。
数据处理模块701,用于对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果。
数据预测模块702,用于根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出。
其中，数据输出位置的预测以及所述数据输出位置处输出数据的确定以使翻译质量和翻译延迟共同优化为方向进行。
可选的,数据处理模块701可以包括:输入数据处理模块和历史输出数据处理模块。
输入数据处理模块,用于对当前输入数据单元进行编码,以获得当前输入数据单元的编码结果,以及,在需要在当前输入数据单元的位置处进行输出决策时,根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量;
历史输出数据处理模块,用于根据当前已获得的输出数据单元确定用于预测下一输出数据单元的向量,作为输出数据预测向量。
数据预测模块702,具体用于根据当前输入数据单元对应的上下文向量和所述输出数据预测向量,确定当前输入数据单元的位置处是否进行数据输出,以及在确定进行数据输出时,确定输出数据单元并输出。
可选的,输入数据处理模块在根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量时,具体用于:
若根据预设的决策步长确定需要在当前输入数据单元的位置处进行输出决策,则根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量。
可选的,数据处理模块701和数据预测模块702由同传翻译模型实现。
其中,所述同传翻译模型采用训练数据单元序列训练得到,所述同传翻译模型的训练目标为,联合优化所述同传翻译模型在所述训练数据单元序列上的翻译质量和翻译延迟。
可选的,所述同传翻译模型包括:编码模块、注意力模块、向量预测模块和输出位置及输出数据预测模块;
所述编码模块,用于在每获得一输入数据单元时,对当前输入数据单元进行编码,以获得当前输入数据单元的编码结果;
所述注意力模块,用于确定当前输入数据单元和历史输入数据单元分别对应的权重,并根据确定出的权重、当前输入数据单元的编码结果以及历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量;
所述向量预测模块,用于根据当前已获得的输出数据单元确定用于预测下一输出数据单元的向量,作为输出数据预测向量;
所述输出位置及输出数据预测模块,用于根据当前输入数据单元对应的上下文向量和所述输出数据预测向量,确定当前输入数据单元的位置处是否进行数据输出,以及在确定进行数据输出时,确定输出数据单元并输出。
可选的,本申请提供的同传翻译装置还可以包括:模型训练模块。
所述模型训练模块包括:数据获取模块、预测损失确定模块和模型参数更新模块。
所述数据获取模块,用于将所述训练数据单元序列中的数据单元依次逐个输入同传翻译模型,以得到所述训练数据单元序列中数据单元对应的预测结果,以及所述训练数据单元序列对应的翻译结果;其中,所述训练数据单元序列中一数据单元对应的预测结果包括:在该数据单元的位置处输出设定的各数据单元的概率以及不进行输出的概率。
所述预测损失确定模块,用于根据所述训练数据单元序列中数据单元对应的预测结果,以及所述训练数据单元序列到对应翻译结果的所有可能的同传路径,确定同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失。
所述模型参数更新模块,用于根据所述同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失,更新同传翻译模型的参数。
可选的,所述预测损失确定模块包括:第一预测损失确定模块和第二预测损失确定模块。
所述第一预测损失确定模块,用于根据所述训练数据单元序列中数据单元对应的预测结果,确定所述所有可能的同传路径的概率和,并根据所述所有可能的同传路径的概率和,确定所述同传翻译模型在翻译质量这一维度上的预测损失。
所述第二预测损失确定模块，用于根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在每条同传路径中的实际输出位置，确定所述所有可能的同传路径的延迟期望，作为同传翻译模型在翻译延迟这一维度上的预测损失。
可选的,本申请提供的同传翻译装置还可以包括:理想输出位置确定模块。
理想输出位置确定模块,用于确定所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置;
可选的,理想输出位置确定模块在确定所述训练数据单元序列对应的翻译结果中一数据单元的理想输出位置时,具体用于根据所述训练数据单元序列的长度、所述训练数据单元序列对应的翻译结果的长度,以及该数据单元在所述训练数据单元序列对应的翻译结果中的位置确定。
可选的,所述第一预测损失确定模块在根据所述训练数据单元序列中数据单元对应的预测结果,确定所述所有可能的同传路径的概率和时,具体用于:
针对所述所有可能的同传路径所经过的每个节点:
根据所述训练数据单元序列中数据单元对应的预测结果,确定经过该节点的所有前向路径的概率和以及经过该节点的所有后向路径的概率和;
根据经过该节点的所有前向路径的概率和以及经过该节点的所有后向路径的概率和,确定经过该节点的路径的概率,作为该节点对应的概率;
根据所述所有可能的同传路径所经过的所有节点分别对应的概率,确定所述所有可能的同传路径的概率和。
可选的,所述第二预测损失确定模块在根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在每条同传路径中的实际输出位置,确定所述所有可能的同传路径的延迟期望时,具体用于:
针对所述所有可能的同传路径所经过的每个节点:
根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在经过该节点的同传路径上的实际输出位置,确定经过该节点的所有同传路径的延迟期望,作为该节点对应的延迟期望;
根据所述所有可能的同传路径所经过的每个节点对应的延迟期望和概率,以及所述所有可能的同传路径的概率和,确定所述所有可能的同传路径的延迟期望,其中,一个节点对应的概率根据经过该节点的前向路径的概率和以及经过该节点的后向路径的概率和确定。
可选的,所述第二预测损失确定模块在根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在经过该节点的同传路径上的实际输出位置,确定经过该节点的所有同传路径的延迟期望时,具体用于:
针对经过该节点的每条前向路径:根据在该前向路径上输出的数据单元的实际输出位置和理想输出位置,确定该前向路径对应的延迟损失;
根据经过该节点的所有前向路径分别对应的延迟损失,确定经过该节点的所有前向路径的延迟期望;
针对经过该节点的每条后向路径：根据在该后向路径上输出的数据单元的实际输出位置和理想输出位置，确定该后向路径对应的延迟损失；
根据经过该节点的所有后向路径分别对应的延迟损失,确定经过该节点的所有后向路径的延迟期望;
根据经过该节点的所有前向路径的延迟期望和经过该节点的所有后向路径的延迟期望,确定经过该节点的所有路径的延迟期望。
可选的,所述第二预测损失确定模块在根据一路径上输出的数据单元的理想输出位置和实际输出位置,确定该路径对应的延迟损失时,具体用于:
针对该路径所经过的每个节点:
若该节点处有数据单元输出,则将该节点处输出的数据单元的实际输出位置相对于对应的理想输出位置的偏差作为该节点对应的延迟损失;
若该节点处无数据单元输出,则确定该节点对应的延迟损失为0。
本申请实施例提供的同传翻译装置，以使翻译质量和翻译延迟共同优化为方向，通过对当前输入数据单元和当前已获得的输出数据单元进行处理，来预测当前输入数据单元的位置处是否进行数据输出，以及在预测出进行数据输出时，确定输出数据单元并输出。可见，本申请实施例提供的同传翻译装置可实现翻译延迟的动态预测，并且，由于本申请提供的同传翻译装置以使翻译质量和翻译延迟共同优化为方向对数据输出位置和输出数据进行预测，因此，其能够预测出较为合适的翻译延迟以及质量较佳的翻译结果。
第四实施例
本申请实施例还提供了一种同传翻译设备，请参阅图8，示出了该同传翻译设备的结构示意图，该同传翻译设备可以包括：至少一个处理器801，至少一个通信接口802，至少一个存储器803和至少一个通信总线804；
在本申请实施例中,处理器801、通信接口802、存储器803、通信总线804的数量为至少一个,且处理器801、通信接口802、存储器803通过通信总线804完成相互间的通信;
处理器801可能是一个中央处理器CPU,或者是特定集成电路ASIC(Application Specific Integrated Circuit),或者是被配置成实施本发明实施例的一个或多个集成电路等;
存储器803可能包含高速RAM存储器,也可能还包括非易失性存储器(non-volatile memory)等,例如至少一个磁盘存储器;
其中,存储器存储有程序,处理器可调用存储器存储的程序,所述程序用于:
对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果;
根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出;
其中,数据输出位置的预测以及所述数据输出位置处输出数据的确定以使翻译质量和翻译延迟共同优化为方向进行。
可选的,所述程序的细化功能和扩展功能可参照上文描述。
第五实施例
本申请实施例还提供一种可读存储介质,该可读存储介质可存储有适于处理器执行的程序,所述程序用于:
对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果;
根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出;
其中,数据输出位置的预测以及所述数据输出位置处输出数据的确定以使翻译质量和翻译延迟共同优化为方向进行。
可选的,所述程序的细化功能和扩展功能可参照上文描述。
最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。
对所公开的实施例的上述说明,使本领域专业技术人员能够实现或使用本发明。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本发明的精神或范围的情况下,在其它实施例中实现。因此,本发明将不会被限制于本文所示的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。

Claims (15)

  1. 一种同传翻译方法,其特征在于,包括:
    对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果;
    根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出;
    其中,数据输出位置的预测以及所述数据输出位置处输出数据的确定以使翻译质量和翻译延迟共同优化为方向进行。
  2. 根据权利要求1所述的同传翻译方法,其特征在于,所述对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果,包括:
    对当前输入数据单元进行编码,以获得当前输入数据单元的编码结果;
    若需要在当前输入数据单元的位置处进行输出决策,则根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量;
    根据当前已获得的输出数据单元确定用于预测下一输出数据单元的向量,作为输出数据预测向量;
    所述根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出,包括:
    根据所述当前输入数据单元对应的上下文向量和所述输出数据预测向量,预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出。
  3. 根据权利要求2所述的同传翻译方法,其特征在于,所述若需要在当前输入数据单元的位置处进行输出决策,则根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量,包括:
    若根据预设的决策步长确定需要在当前输入数据单元的位置处进行输出决策,则根据当前输入数据单元的编码结果和历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量。
  4. 根据权利要求1所述的同传翻译方法，其特征在于，所述对当前输入数据单元和当前已获得的输出数据单元进行处理，以获得处理结果，根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出，以及在预测出进行数据输出时，确定输出数据单元并输出，包括：
    利用预先建立的同传翻译模型,对当前输入数据单元和当前已获得的输出数据单元进行处理,并根据处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出;
    其中,所述同传翻译模型采用训练数据单元序列训练得到,所述同传翻译模型的训练目标为,联合优化所述同传翻译模型在所述训练数据单元序列上的翻译质量和翻译延迟。
  5. 根据权利要求4所述的同传翻译方法,其特征在于,所述同传翻译模型包括:编码模块、注意力模块、向量预测模块和输出位置及输出数据预测模块;
    所述编码模块,用于对当前输入数据单元进行编码,以获得当前输入数据单元的编码结果;
    所述注意力模块,用于确定当前输入数据单元和历史输入数据单元分别对应的权重,并根据确定出的权重、当前输入数据单元的编码结果以及历史输入数据单元的编码结果,确定当前输入数据单元对应的上下文向量;
    所述向量预测模块,用于根据当前已获得的输出数据单元确定用于预测下一输出数据单元的向量,作为输出数据预测向量;
    所述输出位置及输出数据预测模块,用于根据当前输入数据单元对应的上下文向量和所述输出数据预测向量,预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出。
  6. 根据权利要求4所述的同传翻译方法,其特征在于,建立所述同传翻译模型的过程包括:
    将所述训练数据单元序列中的数据单元依次逐个输入同传翻译模型,以得到所述训练数据单元序列中数据单元对应的预测结果,以及所述训练数据单元序列对应的翻译结果;其中,所述训练数据单元序列中一数据单元对应的预测结果包括:在该数据单元的位置处输出设定的各数据单元的概率以及不进行输出的概率;
    根据所述训练数据单元序列中数据单元对应的预测结果，以及所述训练数据单元序列到对应翻译结果的所有可能的同传路径，确定同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失；
    根据所述同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失,更新同传翻译模型的参数。
  7. 根据权利要求6所述的同传翻译方法，其特征在于，所述根据所述训练数据单元序列中数据单元对应的预测结果，以及所述训练数据单元序列到对应翻译结果的所有可能的同传路径，确定同传翻译模型在翻译质量这一维度上的预测损失以及在翻译延迟这一维度上的预测损失，包括：
    根据所述训练数据单元序列中数据单元对应的预测结果,确定所述所有可能的同传路径的概率和;
    根据所述所有可能的同传路径的概率和,确定所述同传翻译模型在翻译质量这一维度上的预测损失;
    根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在每条同传路径中的实际输出位置,确定所述所有可能的同传路径的延迟期望,作为同传翻译模型在翻译延迟这一维度上的预测损失。
  8. 根据权利要求7所述的同传翻译方法,其特征在于,确定所述训练数据单元序列对应的翻译结果中一数据单元的理想输出位置的过程包括:
    根据所述训练数据单元序列的长度、所述训练数据单元序列对应的翻译结果的长度,以及该数据单元在所述训练数据单元序列对应的翻译结果中的位置,确定该数据单元的理想输出位置。
  9. 根据权利要求7所述的同传翻译方法,其特征在于,所述根据所述训练数据单元序列中数据单元对应的预测结果,确定所述所有可能的同传路径的概率和,包括:
    针对所述所有可能的同传路径所经过的每个节点:
    根据所述训练数据单元序列中数据单元对应的预测结果,确定经过该节点的所有前向路径的概率和以及经过该节点的所有后向路径的概率和;
    根据经过该节点的所有前向路径的概率和以及经过该节点的所有后向路径的概率和,确定经过该节点的路径的概率,作为该节点对应的概率;
    根据所述所有可能的同传路径所经过的所有节点分别对应的概率,确定所述所有可能的同传路径的概率和。
  10. 根据权利要求7所述的同传翻译方法,其特征在于,所述根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在每条同传路径中的实际输出位置,确定所述所有可能的同传路径的延迟期望,包括:
    针对所述所有可能的同传路径所经过的每个节点:
    根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在经过该节点的同传路径上的实际输出位置,确定经过该节点的所有同传路径的延迟期望,作为该节点对应的延迟期望;
    根据所述所有可能的同传路径所经过的每个节点对应的延迟期望和概率,以及所述所有可能的同传路径的概率和,确定所述所有可能的同传路径的延迟期望,其中,一个节点对应的概率根据经过该节点的前向路径的概率和以及经过该节点的后向路径的概率和确定。
  11. 根据权利要求10所述的同传翻译方法,其特征在于,所述根据所述训练数据单元序列对应的翻译结果中每个数据单元的理想输出位置以及每个数据单元在经过该节点的同传路径上的实际输出位置,确定经过该节点的所有同传路径的延迟期望,包括:
    针对经过该节点的每条前向路径:根据在该前向路径上输出的数据单元的实际输出位置和理想输出位置,确定该前向路径对应的延迟损失;
    根据经过该节点的所有前向路径分别对应的延迟损失,确定经过该节点的所有前向路径的延迟期望;
    针对经过该节点的每条后向路径:根据在该后向路径上输出的数据单元的实际输出位置和理想输出位置,确定该后向路径对应的延迟损失;
    根据经过该节点的所有后向路径分别对应的延迟损失,确定经过该节点的所有后向路径的延迟期望;
    根据经过该节点的所有前向路径的延迟期望和经过该节点的所有后向路径的延迟期望,确定经过该节点的所有路径的延迟期望。
  12. 根据权利要求11所述的同传翻译方法,其特征在于,根据一路径上输出的数据单元的理想输出位置和实际输出位置,确定该路径对应的延迟损失,包括:
    针对该路径所经过的每个节点:
    若该节点处有数据单元输出,则将该节点处输出的数据单元的实际输出位置相对于对应的理想输出位置的偏差作为该节点对应的延迟损失;
    若该节点处无数据单元输出,则确定该节点对应的延迟损失为0。
  13. 一种同传翻译装置,其特征在于,包括:数据处理模块和数据预测模块;
    所述数据处理模块,用于对当前输入数据单元和当前已获得的输出数据单元进行处理,以获得处理结果;
    所述数据预测模块,用于根据所述处理结果预测当前输入数据单元的位置处是否进行数据输出,以及在预测出进行数据输出时,确定输出数据单元并输出;
    其中,数据输出位置的预测以及所述数据输出位置处输出数据的确定以使翻译质量和翻译延迟共同优化为方向进行。
  14. 根据权利要求13所述的同传翻译装置,其特征在于,所述数据处理模块和所述数据预测模块由同传翻译模型实现;
    其中,所述同传翻译模型采用训练数据单元序列训练得到,所述同传翻译模型的训练目标为,联合优化所述同传翻译模型在所述训练数据单元序列上的翻译质量和翻译延迟。
  15. 一种同传翻译设备,其特征在于,包括:存储器和处理器;
    所述存储器,用于存储程序;
    所述处理器,用于执行所述程序,实现如权利要求1~12中任一项所述的同传翻译方法的各个步骤。
PCT/CN2022/105363 2021-08-02 2022-07-13 一种同传翻译方法、装置、设备及存储介质 WO2023011125A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110881817.4 2021-08-02
CN202110881817.4A CN113486681A (zh) 2021-08-02 2021-08-02 一种同传翻译方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023011125A1 true WO2023011125A1 (zh) 2023-02-09

Family

ID=77944080

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105363 WO2023011125A1 (zh) 2021-08-02 2022-07-13 一种同传翻译方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN113486681A (zh)
WO (1) WO2023011125A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486681A (zh) * 2021-08-02 2021-10-08 科大讯飞股份有限公司 一种同传翻译方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211570A (zh) * 2019-05-20 2019-09-06 北京百度网讯科技有限公司 同声传译处理方法、装置及设备
CN110298046A (zh) * 2019-07-03 2019-10-01 科大讯飞股份有限公司 一种翻译模型训练方法、文本翻译方法及相关装置
CN110969028A (zh) * 2018-09-28 2020-04-07 百度(美国)有限责任公司 用于同步翻译的系统和方法
US20210182504A1 (en) * 2018-11-28 2021-06-17 Tencent Technology (Shenzhen) Company Limited Text translation method and apparatus, and storage medium
CN113486681A (zh) * 2021-08-02 2021-10-08 科大讯飞股份有限公司 一种同传翻译方法、装置、设备及存储介质

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735417B (zh) * 2020-12-29 2024-04-26 中国科学技术大学 语音翻译方法、电子设备、计算机可读存储介质


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dan Liu, Mengge Du, Xiaoxi Li, Yuchen Hu, Lirong Dai, "The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021", arXiv, Cornell University Library, 1 July 2021, XP091006249 *

Also Published As

Publication number Publication date
CN113486681A (zh) 2021-10-08

Similar Documents

Publication Publication Date Title
US11676606B2 (en) Context-sensitive dynamic update of voice to text model in a voice-enabled electronic device
JP7278477B2 (ja) 復号化ネットワーク構築方法、音声認識方法、装置、設備及び記憶媒体
JP7407968B2 (ja) 音声認識方法、装置、設備及び記憶媒体
JP7431833B2 (ja) 言語シーケンスラベリング方法、装置、プログラム及びコンピューティング機器
WO2021136029A1 (zh) 重打分模型训练方法及装置、语音识别方法及装置
TW201935273A (zh) 語句的使用者意圖識別方法和裝置
WO2019154411A1 (zh) 词向量更新方法和装置
WO2021196954A1 (zh) 序列化数据处理方法和装置、文本处理方法和装置
WO2020228175A1 (zh) 多音字预测方法、装置、设备及计算机可读存储介质
Alumäe et al. Efficient estimation of maximum entropy language models with n-gram features: an SRILM extension.
CN110379411B (zh) 针对目标说话人的语音合成方法和装置
CN110717345B (zh) 一种译文重对齐的循环神经网络跨语言机器翻译方法
WO2023011125A1 (zh) 一种同传翻译方法、装置、设备及存储介质
CN111144124A (zh) 机器学习模型的训练方法、意图识别方法及相关装置、设备
JP2023503717A (ja) エンド・ツー・エンド音声認識における固有名詞認識
WO2021184769A1 (zh) 神经网络文本翻译模型的运行方法、装置、设备、及介质
US20220310073A1 (en) Mixture Model Attention for Flexible Streaming and Non-Streaming Automatic Speech Recognition
WO2020155769A1 (zh) 关键词生成模型的建模方法和装置
JP7278309B2 (ja) 文章レベルテキストの翻訳方法及び装置
KR20240070689A (ko) 콘포머에 대한 추론 성능의 최적화
WO2022257454A1 (zh) 一种合成语音的方法、装置、终端及存储介质
JP2024515199A (ja) 要素テキスト処理方法、装置、電子機器及び記憶媒体
WO2024037348A1 (zh) 音频处理方法、模型训练方法、装置、设备、介质及产品
JP2023537480A (ja) 予測情報を生成するための方法、装置、電子機器及び媒体
WO2021057926A1 (zh) 一种神经网络模型训练方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22851839

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE