WO2023115770A1 - Translation method and related device - Google Patents

Translation method and related device

Info

Publication number
WO2023115770A1
Authority
WO
WIPO (PCT)
Prior art keywords
translation
text
length
source text
main source
Prior art date
Application number
PCT/CN2022/088961
Other languages
English (en)
French (fr)
Inventor
林超
刘微微
刘聪
Original Assignee
科大讯飞股份有限公司
Priority date
Filing date
Publication date
Application filed by 科大讯飞股份有限公司
Publication of WO2023115770A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the technical field of data processing, in particular to a translation method and related equipment.
  • The simultaneous speech interpretation scenario is a translation scenario that lacks contextual information, operates on local segments in real time, and needs to consider the characteristics of at least two languages.
  • The real-time requirements of simultaneous speech interpretation are usually high, but for the same semantic content the target text easily contains more words than the source text, so the actual translation speed often cannot meet the real-time translation speed requirement. This causes delays to accumulate, which degrades the translation effect.
  • the main purpose of the embodiments of the present application is to provide a translation method and related equipment, which can improve the translation effect.
  • An embodiment of the present application provides a translation method. The method includes: obtaining the source text to be processed; extracting the main source text from the source text to be processed; and determining the condensed translation to be used according to the main source text, the translation length description data, and a pre-constructed compressed translation model, wherein the compressed translation model is used to perform compressed translation on the main source text with reference to the translation length description data.
  • The embodiment of the present application also provides a translation device, including:
  • a text acquisition unit, used to acquire the source text to be processed;
  • a trunk extraction unit, used to extract the main source text from the source text to be processed;
  • a compression translation unit, used to determine the condensed translation to be used according to the main source text, the translation length description data, and the pre-built compressed translation model, wherein the compressed translation model is used to perform compressed translation on the main source text with reference to the translation length description data.
  • The embodiment of the present application also provides a device, characterized in that the device includes: a processor, a memory, and a system bus; the processor and the memory are connected through the system bus; the memory is used to store one or more programs, the one or more programs include instructions, and the instructions, when executed by the processor, cause the processor to execute any implementation of the translation method provided in the embodiments of the present application.
  • The embodiment of the present application also provides a computer-readable storage medium, characterized in that instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute any implementation of the translation method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a computer program product, which, when running on a terminal device, enables the terminal device to execute any implementation manner of the translation method provided in the embodiment of the present application.
  • In the technical solution provided by the present application, after the source text to be processed is obtained (for example, the speech recognition text of the current speech segment in a simultaneous interpretation speech stream), the main source text is first extracted from the source text to be processed, so that the main source text represents the core backbone information of the source text to be processed; then the condensed translation to be used is determined according to the main source text, the translation length description data, and the pre-built compressed translation model, so that the condensed translation to be used can express the semantic information carried by the source text to be processed with fewer target-language characters. This effectively avoids the phenomenon that the target text contains more words than the source text, so that the length of the translated text can be shortened without losing the core meaning, which effectively reduces the translation delay, improves the real-time performance of translation, and improves the translation effect.
  • the main source text is obtained by extracting the main source text from the source text to be processed
  • the text length of the main source text is smaller than the text length of the source text to be processed, thus achieving the purpose of simplifying the text data in the source language
  • The condensed translation to be used is obtained by performing compressed translation on the main source text with the compressed translation model, so that the condensed translation to be used can represent the semantic information carried by the source text to be processed with fewer target-language characters, thereby simplifying the text data on the target-language side.
  • In this way, the embodiment of the present application realizes compressed translation of the source text to be processed by simplifying both ends (that is, the source-language side and the target-language side), ensuring that the compressed translation result of the source text to be processed represents the semantic information carried by the source text to be processed with as few target-language characters as possible. This effectively avoids the phenomenon that the translation contains more words than the source text and shortens the length of the translated text without losing the core meaning, which in turn effectively reduces the translation delay, improves the real-time performance of translation, and helps improve the translation effect.
  • Because the condensed translation to be used is obtained by the compressed translation model performing compressed translation with reference to the translation length description data, the text length of the condensed translation to be used is close to, or even equal to, the expected translation length represented by the translation length description data. The text length of the condensed translation to be used is therefore controllable, which effectively ensures that the condensed translation to be used expresses the semantic information carried by the source text to be processed with a reasonable number of words, thereby avoiding the adverse effects caused by an uncontrollable number of words in the translation and helping to improve the translation effect.
  • Fig. 1 is a flowchart of a translation method provided by the embodiment of the present application.
  • FIG. 2 is a schematic diagram of a simultaneous voice stream provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a dependency syntax tree provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a compression translation model provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a compression translation process provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a translation device provided by an embodiment of the present application.
  • Because the number of words in the translated text is likely to exceed the number of words in the speech recognition text, the actual translation speed cannot meet the real-time translation speed requirement, resulting in accumulated delay.
  • To this end, an embodiment of the present application provides a translation method. The method includes: after obtaining the source text to be processed (for example, the speech recognition text of the current speech segment in a simultaneous interpretation speech stream), first extracting the main source text from the source text to be processed, so that the main source text represents the core backbone information of the source text to be processed; then determining the condensed translation to be used according to the main source text, the translation length description data, and the pre-built compressed translation model, so that the condensed translation to be used can express the semantic information carried by the source text to be processed with fewer target-language characters. This effectively avoids the phenomenon that the translation contains more words than the source text, so that the length of the translated text can be shortened without losing the core meaning, which effectively reduces the translation delay, improves the real-time performance of translation, and improves the translation effect.
  • the main source text is obtained by extracting the main source text from the source text to be processed
  • the text length of the main source text is smaller than the text length of the source text to be processed, thus achieving the purpose of simplifying the text data in the source language
  • The condensed translation to be used is obtained by performing compressed translation on the main source text with the compressed translation model, so that the condensed translation to be used can represent the semantic information carried by the source text to be processed with fewer target-language characters, thereby simplifying the text data on the target-language side.
  • In this way, the embodiment of the present application realizes compressed translation of the source text to be processed by simplifying both ends (that is, the source-language side and the target-language side), ensuring that the compressed translation result of the source text to be processed represents the semantic information carried by the source text to be processed with as few target-language characters as possible. This effectively avoids the phenomenon that the translation contains more words than the source text and shortens the length of the translated text without losing the core meaning, which in turn effectively reduces the translation delay, improves the real-time performance of translation, and helps improve the translation effect.
  • Because the condensed translation to be used is obtained by the compressed translation model performing compressed translation with reference to the translation length description data, the text length of the condensed translation to be used is close to, or even equal to, the expected translation length represented by the translation length description data. The text length of the condensed translation to be used is therefore controllable, which effectively ensures that the condensed translation to be used expresses the semantic information carried by the source text to be processed with a reasonable number of words, thereby avoiding the adverse effects caused by an uncontrollable number of words in the translation. In this way, the length of the translated text can be shortened without losing the core meaning, the translation delay can be effectively reduced, and the real-time performance of translation can be improved, which helps improve the translation effect.
  • the embodiment of the present application does not limit the subject of execution of the translation method.
  • the translation method provided in the embodiment of the present application can be applied to data processing devices such as terminal devices or servers.
  • the terminal device may be a smart phone, a computer, a personal digital assistant (Personal Digital Assistant, PDA), or a tablet computer.
  • the server can be an independent server, a cluster server or a cloud server.
  • Referring to FIG. 1, this figure is a flow chart of a translation method provided by an embodiment of the present application.
  • the translation method provided by the embodiment of this application includes S1-S3:
  • source text to be processed refers to text data in the source language; and the “source text to be processed” needs to be translated into text content in the target language.
  • For example, when the source language is Chinese and the target language is English, the source text to be processed refers to Chinese text data, and the source text to be processed needs to be translated into English text data.
  • the embodiment of the present application does not limit the above-mentioned "source text to be processed".
  • the "source text to be processed” may include a sentence.
  • this embodiment of the present application does not limit the implementation manner of S1, for example, it may specifically be: the current text collected in real time from the text stream.
  • S1 may specifically include: after acquiring the current speech segment, performing speech recognition processing on the current speech segment to obtain the source text to be processed. (It should be noted that the simultaneous interpretation scene will be used as an example below)
  • The aforementioned "current speech segment" represents a speech segment collected in real time from a speech stream (for example, a speech stream in a simultaneous interpretation scenario). For example, as shown in Figure 2, when the "third speech segment" is collected from the speech stream shown in Figure 2, the third speech segment can be determined as the current speech segment, so that the subsequent compressed translation process for the current speech segment can determine the translation result of the third speech segment in time.
  • the embodiment of the present application does not limit the collection frequency of the above-mentioned "speech segment", for example, it can be set according to the application scenario.
  • the collection frequency of the "speech segment” can be set according to the sentence length in the source language, so that the above-mentioned "current speech segment” includes one sentence.
  • The extracted main source text is used to represent the core backbone information of the source text to be processed.
  • the extracted main source text can be "the development of artificial intelligence has brought opportunities to all countries”.
  • The embodiment of the present application does not limit the implementation of S2; for example, any existing or future method that can perform backbone extraction on text data (for example, a sentence simplification method) can be used for implementation.
  • any possible implementation manner of S2 shown in the second method embodiment can be used for implementation.
  • S3 Determine the simplified translation to be used according to the main source text, the translation length description data, and the pre-built compressed translation model.
  • The translation length description data is used to describe the text length of the translation result of the source text to be processed; the embodiment of the present application does not limit the "translation length description data", which may include, for example, at least one of an expected translation length and an expected translation source length ratio.
  • The text length is used to describe the length of a piece of text data; the embodiment of the present application does not limit how the "text length" is expressed, for example, it can be expressed by the number of semantic units (for example, the number of words and/or the number of characters).
  • The above "semantic unit" refers to the unit of semantic representation in a language; the embodiment of the present application does not limit the "semantic unit", which may be, for example, a word or a character (for instance, a semantic unit in Chinese can be a Chinese character or a word).
  • The expected translation length refers to the text length that the user expects the translation result of the source text to be processed to have; the embodiment of this application does not limit the "expected translation length". For example, in order to avoid as far as possible the phenomenon that the translation contains more words than the source text, the "expected translation length" can be close to the text length of the source text to be processed: if the text length of the source text to be processed is 6 words, the "expected translation length" may be 6 words.
  • this embodiment of the present application does not limit the acquisition method of the above-mentioned "expected translation length", for example, it may be preset. For another example, it may be determined according to the setting operation triggered by the user for the "expected translation length”. As another example, length statistical analysis can be performed on a large number of sentences in the target language to obtain the "expected translation length".
  • The expected translation source length ratio refers to the ratio between the text length that the user expects the translation result of the source text to be processed to have and the text length of the source text to be processed; the embodiment of this application does not limit the "expected translation source length ratio". For example, in order to avoid the phenomenon that the translation contains more words than the source text, the "expected translation source length ratio" can take a value close to 1 (for example, 1 or 0.8).
  • the embodiment of the present application does not limit the manner of obtaining the above-mentioned "expected translation source length ratio", for example, it may be preset. For another example, it may be determined according to the setting operation triggered by the user on the "expected translation source length ratio”. As another example, the length ratio statistical analysis can be performed on a large number of sentence pairs in the target language and the source language to obtain the "expected translation source length ratio".
  • the above-mentioned “compressed translation model” is used to perform compression translation processing with controllable translation length for the input data of the compressed translation model.
  • For example, the above-mentioned "compressed translation model" can perform compressed translation on the main source text with reference to the translation length description data, so that the compressed translation result for the main source text reaches, as far as possible, the expected translation length represented by the translation length description data. In this way, length-controllable compressed translation is achieved.
  • the embodiment of the present application does not limit the above-mentioned "compressed translation model", for example, it may be a machine learning model.
  • the compressed translation model shown in the third method embodiment can be used for implementation.
  • In the embodiment of the present application, after the source text to be processed is obtained (for example, the speech recognition text of the current speech segment in a simultaneous interpretation speech stream), the main source text is first extracted from the source text to be processed, so that the main source text represents the core backbone information of the source text to be processed; then the condensed translation to be used is determined according to the main source text, the translation length description data, and the pre-built compressed translation model, so that the condensed translation to be used can express the semantic information carried by the source text to be processed with fewer target-language characters. This effectively avoids the phenomenon that the translation contains more words than the source text, so that the length of the translated text can be shortened without losing the core meaning, which effectively reduces the translation delay, improves the real-time performance of translation, and improves the translation effect.
  • the main source text is obtained by extracting the main source text from the source text to be processed
  • the text length of the main source text is smaller than the text length of the source text to be processed, thus achieving the purpose of simplifying the text data in the source language
  • The condensed translation to be used is obtained by performing compressed translation on the main source text with the compressed translation model, so that the condensed translation to be used can represent the semantic information carried by the source text to be processed with fewer target-language characters, thereby simplifying the text data on the target-language side.
  • In this way, the embodiment of the present application realizes compressed translation of the source text to be processed by simplifying both ends (that is, the source-language side and the target-language side), ensuring that the compressed translation result of the source text to be processed represents the semantic information carried by the source text to be processed with as few target-language characters as possible. This effectively avoids the phenomenon that the translation contains more words than the source text and shortens the length of the translated text without losing the core meaning, which in turn effectively reduces the translation delay, improves the real-time performance of translation, and helps improve the translation effect.
  • Because the condensed translation to be used is obtained by the compressed translation model performing compressed translation with reference to the translation length description data, the text length of the condensed translation to be used is close to, or even equal to, the expected translation length represented by the translation length description data. The text length of the condensed translation to be used is therefore controllable, which effectively ensures that the condensed translation to be used expresses the semantic information carried by the source text to be processed with a reasonable number of words, thereby avoiding the adverse effects caused by an uncontrollable number of words in the translation. In this way, the length of the translated text can be shortened without losing the core meaning, the translation delay can be effectively reduced, and the real-time performance of translation can be improved, which helps improve the translation effect.
  • the text data can be filtered with the help of dependency syntax analysis technology and part-of-speech tagging technology.
  • S2 may specifically include S21-S24:
  • The dependency syntax analysis processing is used to identify the directed dependency relationships between different words in a piece of text data; the embodiment of the present application does not limit the implementation of the "dependency syntax analysis processing", and any existing or future dependency parsing technique can be used for implementation.
  • The dependency syntax analysis processing is based on dependency grammar theory; specifically, it holds that there is a head-dependent relationship between words, which is a binary, unequal relationship.
  • the modifier is called a dependent
  • the modified word is called a head
  • the grammatical relationship between the two is called a dependency relation.
  • Figure 3 shows the directed dependencies.
  • dependency syntax analysis result is used to indicate the directed dependency relationship between different words in the source text to be processed; and the embodiment of the present application does not limit the representation of the "dependency syntax analysis result".
  • For example, a dependency syntax tree (as shown in Figure 3) can be used for representation.
  • the dependency syntax tree is a multi-fork tree.
  • the root node of the dependency syntax tree is the core word "lay” in the sentence
  • each child node of the dependency syntax tree is a word or component dominated by the parent node.
  • "Root" is used to mark that "lay” is the root node.
  • S22 Perform part-of-speech tagging on the source text to be processed to obtain a part-of-speech tagging result.
  • The part-of-speech tagging processing is used to identify and tag the part of speech of each word in a piece of text data; the embodiment of the present application does not limit the "part-of-speech tagging processing", which can be implemented by any existing or future part-of-speech tagging technique.
  • the above "part-of-speech tagging result" is used to indicate the part-of-speech of each vocabulary in the source text to be processed.
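  • The dependency syntax analysis processing and the part-of-speech tagging processing described above can be sketched together as follows. The choice of spaCy and its small English pipeline is only an assumption for illustration; the patent does not prescribe a particular parser, tagger, language, or library, and the example sentence is the one used above.
    # Sketch of dependency parsing + part-of-speech tagging with spaCy (assumed library).
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumed English pipeline

    def parse_source_text(text: str):
        """Return the dependency syntax analysis result and the part-of-speech tagging result."""
        doc = nlp(text)
        # Directed dependency relations: (dependent word, relation, head word).
        dependencies = [(tok.text, tok.dep_, tok.head.text) for tok in doc]
        # Part of speech of each word in the source text to be processed.
        pos_tags = [(tok.text, tok.pos_) for tok in doc]
        return dependencies, pos_tags

    deps, tags = parse_source_text(
        "The development of artificial intelligence has brought opportunities to all countries")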
  • S23 Determine the vocabulary importance representation data according to the dependency syntax analysis result and the part-of-speech tagging result. The vocabulary importance representation data is used to represent the importance of each word in the source text to be processed; this embodiment of the present application does not limit the "vocabulary importance representation data", which may include, for example, a multi-fork tree to be used.
  • The multi-fork tree to be used has a multi-fork tree structure; the "multi-fork tree to be used" can not only indicate the importance of each word in the source text to be processed, but can also indicate the directed dependency relationships between different words in the source text to be processed (and the part of speech of each word in the source text to be processed).
  • This embodiment of the present application does not limit how the above "multi-fork tree to be used" expresses importance. For example, when the root node of the "multi-fork tree to be used" is placed higher than all non-root nodes, and each parent node is placed higher than the child nodes under it, the "multi-fork tree to be used" includes multiple layers of nodes, and its importance can be expressed as follows: for any two nodes located in different layers, the node in the higher layer is more important than the node in the lower layer; and for any two nodes located in the same layer, the node further to the left is more important than the node further to the right.
  • The embodiment of the present application does not limit the determination process of the above "multi-fork tree to be used". It may include: first, referring to preset importance analysis rules and the part-of-speech tagging result, performing importance analysis on each subtree in the dependency syntax tree to obtain the importance of each subtree; then, according to the importance of each subtree, adjusting the distribution positions of the subtrees in the dependency syntax tree, so that all subtrees sharing the same parent node in the adjusted dependency syntax tree are arranged from left to right in descending order of the importance of their top-level nodes.
  • The above "importance analysis rules" can be preset, and the embodiment of the present application does not limit them. For example, they may include: (1) a parent node is more important than each child node under it; (2) among multiple child nodes under one parent node, the first part of speech (for example, noun) is more important than the second part of speech (for example, verb), the second part of speech is more important than the third part of speech (for example, adjective), and so on. A minimal sketch of such a reordering is given below.
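  • In the sketch below, the three-level part-of-speech priority (noun over verb over adjective) follows the example rules above, while the node structure and any further priorities are illustrative assumptions.
    # Reorder sibling subtrees so that, from left to right, importance descends.
    from dataclasses import dataclass, field
    from typing import List

    # Assumed priority ranks: lower rank means more important (noun > verb > adjective > rest).
    POS_PRIORITY = {"NOUN": 0, "VERB": 1, "ADJ": 2}

    @dataclass
    class Node:
        word: str
        pos: str                           # part of speech of this node's word
        index: int                         # position of the word in the source text
        children: List["Node"] = field(default_factory=list)

    def reorder_by_importance(node: Node) -> Node:
        node.children.sort(key=lambda c: POS_PRIORITY.get(c.pos, len(POS_PRIORITY)))
        for child in node.children:
            reorder_by_importance(child)
        return node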
  • S24 Determine the main source text according to the lexical importance characterization data and the source text to be processed.
  • S24 may specifically include S241-S247:
  • S241 Determine the node to be deleted according to the multi-fork tree to be used.
  • node to be deleted refers to a node that needs to be judged whether to be deleted from the multi-tree to be used.
  • the "node to be deleted” may include a leaf node in the multi-fork tree to be used, or may include a subtree in the multi-fork tree to be used ( That is, a parent node and all nodes below that parent node). It can be seen that the above “node to be deleted” may include at least one node in the multi-tree to be used. It should be noted that the above-mentioned “leaf node” refers to a node without a fork; the above-mentioned “parent node” refers to a node with a fork.
  • The embodiment of the present application does not limit the determination process of the above "node to be deleted". For example, the node with the lowest importance in the multi-fork tree to be used can be identified first: if the "node with the lowest importance" is a leaf node, it is determined as the node to be deleted; if the "node with the lowest importance" is a parent node, that node and all nodes below it are determined as the nodes to be deleted.
  • As another example, the multi-fork tree to be used can be traversed in a bottom-up, right-to-left manner, and the currently traversed node (and all nodes below it) can be determined as the node to be deleted, so that it can later be judged whether to delete the currently traversed node (and all nodes below it) from the multi-fork tree to be used.
  • the "currently traversed node” refers to a node (for example, a leaf node or a parent node) to be traversed in the multi-fork tree to be used in the current round.
  • S242 Determine the deletion identification result of the node to be deleted according to the length of the deleted text corresponding to the node to be deleted and the text length of the source text to be processed.
  • length of the deleted text corresponding to the node to be deleted refers to the text length of the vocabulary represented by all the remaining nodes after the node to be deleted is deleted from the multi-tree to be used.
  • deletion recognition result of a node to be deleted is used to indicate whether to delete the node to be deleted from the source text to be processed.
  • the embodiment of the present application does not limit the determination process of the above-mentioned "deletion identification result of the node to be deleted", for example, it may specifically include Step 11-Step 15:
  • Step 11 pre-delete the node to be deleted from the multi-tree to be used, and obtain the multi-tree after pre-deletion.
  • The pre-deleted multi-fork tree represents the multi-fork tree to be used without the node to be deleted, that is, the "pre-deleted multi-fork tree" includes all nodes that remain after the node to be deleted is removed from the multi-fork tree to be used. It should be noted that the above "pre-deletion" is a trial deletion action; the "pre-deletion" does not change the multi-fork tree to be used.
  • Step 12 According to the pre-deleted multi-tree and the source text to be processed, determine the length of the deleted text corresponding to the node to be deleted.
  • For example, the pre-deleted text can be determined according to the pre-deleted multi-fork tree and the source text to be processed, so that the pre-deleted text only includes the semantic units represented by the nodes in the pre-deleted multi-fork tree; then the text length of the pre-deleted text is determined as the length of the deleted text corresponding to the node to be deleted.
  • Step 13 Determine the length ratio between the length of the deleted text and the text length of the source text to be processed.
  • Step 14 Comparing the length ratio with a preset ratio threshold to obtain a comparison result to be used.
  • The preset ratio threshold can be preset, or it can be mined from the length ratios of a large number of sentence pairs in the target language and the source language.
  • the above-mentioned “comparison result to be used” is used to indicate the relative size between the above-mentioned “length ratio between the length of the deleted text and the text length of the source text to be processed” and the above-mentioned “preset ratio threshold”.
  • Step 15 Determine the deletion identification result of the node to be deleted according to the comparison result to be used.
  • For example, if the comparison result to be used indicates that the above "length ratio between the length of the deleted text and the text length of the source text to be processed" is higher than the above "preset ratio threshold", it can be determined that deleting the node to be deleted from the multi-fork tree to be used would not remove too much character information, so the node to be deleted can be deleted from the multi-fork tree to be used, and a deletion mark (for example, "1") is determined as the deletion identification result of the node to be deleted. If, however, the comparison result to be used indicates that the "length ratio between the length of the deleted text and the text length of the source text to be processed" is not higher than the "preset ratio threshold", it can be determined that deleting the node to be deleted from the multi-fork tree to be used might remove too much character information, so the node to be deleted should be kept, and the deletion identification result indicates that the node to be deleted cannot be deleted.
  • In this way, the deletion identification result of the node to be deleted can be determined according to the ratio between the length of the deleted text corresponding to the node to be deleted and the text length of the source text to be processed, so that the deletion identification result can indicate whether the node to be deleted should be deleted from the source text to be processed. A minimal sketch of this decision is given below.
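  • In the sketch below, text length is counted in semantic units (words), and the concrete threshold value is an illustrative assumption.
    # Decide whether a candidate node (and everything below it) may be deleted:
    # delete only if enough of the source text would remain after the pre-deletion.
    def deletion_identification(remaining_words, source_words, ratio_threshold=0.6):
        length_ratio = len(remaining_words) / len(source_words)  # Step 13
        return length_ratio > ratio_threshold                    # Steps 14-15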
  • S243 Determine whether the deletion identification result of the node to be deleted meets the preset deletion condition, if yes, execute S244-S245; if not, execute S245.
  • the above-mentioned "preset deletion condition" can be preset, for example, it can specifically include: the above-mentioned “deletion identification result of the node to be deleted” indicates that the node to be deleted can be deleted (for example, the above-mentioned "deletion identification result of the node to be deleted” including deleted markers).
  • If the deletion identification result indicates that the node to be deleted can be deleted, it can be determined that the deletion identification result satisfies the preset deletion condition, so the node to be deleted can be deleted directly from the multi-fork tree to be used; if the deletion identification result indicates that the node to be deleted cannot be deleted, it can be determined that the deletion identification result does not satisfy the preset deletion condition, so the node to be deleted is kept in the multi-fork tree to be used.
  • After it is determined that the deletion identification result of the node to be deleted satisfies the preset deletion condition, the node to be deleted can be deleted from the multi-fork tree to be used to obtain the updated multi-fork tree to be used; this realizes the update process of the multi-fork tree to be used, and subsequent operations (for example, the next round of traversal) can be performed based on the updated multi-fork tree to be used.
  • S245 Determine whether the preset stop condition is met, if yes, execute S246; if not, return to execute S241.
  • preset stop condition may be preset, for example, it may specifically be: all nodes except the root node in the multi-fork tree to be used have been traversed.
  • S246 Determine the main source text according to the multi-tree to be used and the source text to be processed.
  • For example, the main source text can be extracted from the source text to be processed according to the multi-fork tree to be used, so that the main source text only includes the semantic units represented by the nodes remaining in the multi-fork tree to be used; in this way, the main source text can express the semantic information carried by the source text to be processed with a small number of characters, as sketched below.
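  • The sketch below assumes that each remaining tree node records the position (index) of its word in the source text to be processed.
    # Rebuild the main source text from the words whose nodes survived the pruning,
    # keeping their original order in the source text to be processed.
    def extract_main_source_text(source_words, remaining_indices):
        kept = [w for i, w in enumerate(source_words) if i in remaining_indices]
        return " ".join(kept)  # for languages written without spaces, use "".join(kept)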
  • In this way, the main backbone information of the source text to be processed can be extracted with the help of dependency syntax analysis technology and part-of-speech tagging technology to obtain the main source text, so that the main source text can express the semantic information carried by the source text to be processed with as few words as possible. This improves the effect of backbone information extraction, helps shorten the length of the translated text without losing the core meaning, effectively reduces the translation delay, and improves the real-time performance of translation, thereby helping to improve the translation effect.
  • In addition, this application can comprehensively integrate the expected length information at the source end of the translation model, the target end of the translation model, and the decoding stage of the translation model, realizing more natural and complete compressed translation while keeping the length controllable.
  • the embodiment of the present application provides a possible implementation of the above “compressed translation model”.
  • The "compressed translation model" may include an encoder and a decoder (for example, the compressed translation model shown in FIG. 4).
  • When the above-mentioned "compressed translation model" includes an encoder and a decoder, the determination process of the above "condensed translation to be used" may specifically include steps 21-23:
  • Step 21 According to the main source text and the translation length description data, determine the features to be encoded.
  • feature to be encoded refers to a feature that needs to be encoded.
  • The embodiment of the present application does not limit the implementation of step 21; for example, it may specifically include steps 211-213:
  • Step 211 Determine the text features to be used according to the main source text.
  • the above "text feature to be used” can be used to represent the character information carried by the main source text.
  • The embodiment of the present application does not limit the implementation of step 211; for example, any existing or future text feature extraction method (e.g., Word2Vec) can be used for implementation.
  • In another possible implementation, step 211 may specifically include: determining the text features to be used according to the main source text and the translation length description data, so that the "text features to be used" can not only represent the character information carried by the main source text, but also represent the translation length information carried by the translation length description data.
  • In this case, step 211 may specifically include steps 2111-2112:
  • Step 2111 According to the translation length description data, determine the length ratio interval to be used.
  • For example, step 2111 may specifically include: first determining the ratio between the expected translation length and the text length of the source text to be processed as the expected translation source length ratio; then searching the at least one candidate length ratio interval for a candidate length ratio interval that includes the expected translation source length ratio, and determining it as the length ratio interval to be used, so that the length ratio interval to be used includes the expected translation source length ratio.
  • The above "at least one candidate length ratio interval" refers to the ratio intervals that need to be learned during the construction of the above "compressed translation model"; for the relevant content of the "at least one candidate length ratio interval", please refer to the description of step 53 below.
  • As another example, step 2111 may specifically include: searching the at least one candidate length ratio interval for a candidate length ratio interval that includes the expected translation source length ratio, and determining it as the length ratio interval to be used, so that the length ratio interval to be used includes the expected translation source length ratio. A minimal sketch of this interval selection is given below.
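  • In the sketch below, the concrete candidate intervals and two of the identifiers are assumptions; only the pairing of the identifier [0.8] with the interval 0.7-1.1 is taken from the example given later in this text.
    # Map the expected translation source length ratio to the candidate interval
    # that contains it, together with that interval's identifier.
    CANDIDATE_INTERVALS = [
        ((0.0, 0.7), "[0.5]"),   # assumed
        ((0.7, 1.1), "[0.8]"),   # identifier/interval pairing from the example below
        ((1.1, 2.0), "[1.5]"),   # assumed
    ]

    def select_length_ratio_interval(expected_translation_length, source_text_length):
        ratio = expected_translation_length / source_text_length
        for interval, identifier in CANDIDATE_INTERVALS:
            low, high = interval
            if low <= ratio < high:
                return interval, identifier
        return CANDIDATE_INTERVALS[-1]   # fall back to the last interval

    interval_to_use, identifier_to_use = select_length_ratio_interval(6, 6)  # ratio 1.0 -> 0.7-1.1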
  • Step 2112 Determine the text features to be used according to the length ratio interval to be used and the main source text.
  • In order to facilitate understanding of step 2112, four possible implementations are described below.
  • In the first possible implementation, step 2112 may specifically include steps 31-32:
  • Step 31 Splicing the to-be-used length ratio interval and the main source text to obtain the first text, so that the first text includes the to-be-used length ratio interval and the main source text.
  • The embodiment of the present application does not limit the implementation of step 31. For example, it may specifically include: adding the length ratio interval to be used at the head position of the main source text to obtain the first text, so that the first text can be expressed as {length ratio interval to be used, main source text}.
  • Step 32 Perform vectorization processing on the first text to obtain text features to be used.
  • Based on this first possible implementation of step 2112, it can be seen that after the length ratio interval to be used and the main source text are obtained, the two pieces of data can first be spliced and then vectorized to obtain the text features to be used, for example as sketched below.
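  • In the sketch below, the hash-based token embedding is only a stand-in for whichever text feature extraction method (for example, Word2Vec) is actually used, and the interval is written as its identifier token.
    # Splice the length ratio interval with the main source text, then vectorize
    # the spliced first text token by token.
    import zlib
    import numpy as np

    def build_first_text(interval_token: str, main_source_text: str) -> str:
        return f"{interval_token} {main_source_text}"   # {interval, main source text}

    def vectorize(text: str, dim: int = 8) -> np.ndarray:
        rows = []
        for token in text.split():
            seed = zlib.crc32(token.encode("utf-8"))     # deterministic per token
            rows.append(np.random.default_rng(seed).standard_normal(dim))
        return np.stack(rows)                            # one feature vector per token

    first_text = build_first_text("[0.8]", "development of AI brings opportunities")
    text_features_to_use = vectorize(first_text)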
  • In the second possible implementation, step 2112 may specifically include steps 41-43:
  • Step 41 Find the interval identifier corresponding to the length ratio interval to be used from the preset mapping relationship, and obtain the interval identifier to be used.
  • the preset mapping relationship includes the corresponding relationship between the to-be-used length ratio interval and the to-be-used interval identifier.
  • The preset mapping relationship is used to record the interval identifier corresponding to each candidate length ratio interval; the embodiment of the present application does not limit the "preset mapping relationship", which may specifically include, for example: the correspondence between the first candidate length ratio interval and the first interval identifier, the correspondence between the second candidate length ratio interval and the second interval identifier, ... (and so on), and the correspondence between the Qth candidate length ratio interval and the Qth interval identifier.
  • Q is a positive integer
  • Q represents the number of candidate length proportional intervals in the aforementioned "at least one candidate length proportional interval”.
  • the above "qth interval identifier” refers to the interval identifier corresponding to the qth candidate length ratio interval, so that the "qth interval identifier" is used to represent the qth candidate length ratio interval; and the implementation of the present application
  • the example does not limit the relationship between the qth candidate length ratio interval and the qth interval identifier, for example, the qth interval identifier (for example, [0.8]) is based on the qth candidate length ratio interval (for example, 0.7 -1.1) determined by a proportional value.
  • q is a positive integer, and q ⁇ Q.
  • For example, the length ratio interval to be used can be matched against each candidate length ratio interval in the preset mapping relationship, and the interval identifier corresponding to the successfully matched candidate length ratio interval is then determined as the interval identifier to be used, so that the interval identifier to be used can represent the length ratio interval to be used.
  • Step 42 Splice the interval identifier to be used and the main source text to obtain the second text.
  • For example, step 42 may specifically include: adding the interval identifier to be used at the head position of the main source text to obtain the second text, so that the second text can be expressed as {interval identifier to be used, main source text}.
  • Step 43 Perform vectorization processing on the second text to obtain text features to be used.
  • Based on this second possible implementation, it can be seen that the interval identifier corresponding to the length ratio interval to be used can be determined first; then the interval identifier to be used and the main source text are concatenated and vectorized in turn to obtain the text features to be used.
  • In the third possible implementation, step 2112 may specifically include: first performing vectorization processing on the main source text to obtain a text representation vector, so that the text representation vector can represent the character information carried by the main source text; then splicing the length ratio interval to be used and the text representation vector to obtain the text features to be used.
  • The embodiment of the present application does not limit the implementation of the above-mentioned "splicing".
  • For example, the length ratio interval to be used can be added at the head position of the text representation vector to obtain the text features to be used, so that the first feature value of the text features to be used is the left boundary point of the above "length ratio interval to be used", the second feature value is the right boundary point of the above "length ratio interval to be used", and the other feature values come from the above "text representation vector".
  • Based on this third possible implementation, it can be seen that the main source text can be vectorized first, and then the length ratio interval to be used and the result of the vectorization are spliced to obtain the text features to be used.
  • In the fourth possible implementation, step 2112 may specifically include: first vectorizing the main source text to obtain the text representation vector, and searching the preset mapping relationship for the interval identifier corresponding to the length ratio interval to be used to obtain the interval identifier to be used; and then splicing the interval identifier to be used and the text representation vector to obtain the text features to be used.
  • the embodiment of the present application does not limit the implementation of the above-mentioned "splicing".
  • For example, the interval identifier to be used can be added at the head position of the text representation vector to obtain the text features to be used, so that the first feature value of the text features to be used is the above "interval identifier to be used" and the other feature values all come from the above "text representation vector", as sketched below.
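  • The third and fourth implementations can be sketched together as follows; the one-dimensional text representation vector is an illustrative simplification (a per-token matrix would be handled analogously), and the numeric identifier value is assumed.
    # Prepend the length ratio information to an already-computed text representation vector.
    import numpy as np

    def prepend_interval(text_repr, interval=(0.7, 1.1)):
        low, high = interval
        # Third implementation: the first two feature values are the interval boundaries.
        return np.concatenate(([low, high], text_repr))

    def prepend_identifier(text_repr, identifier=0.8):
        # Fourth implementation: the first feature value is the interval identifier.
        return np.concatenate(([identifier], text_repr))

    text_features_to_use = prepend_interval(np.zeros(8))   # placeholder text representation vector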
  • In this way, the text features to be used can be determined according to the main source text (and the translation length description data), so that the text features to be used can represent the character information carried by the main source text and the translation length information carried by the translation length description data, and the subsequent encoding process can be performed on the text features to be used.
  • Step 212 Determine the location feature to be used according to the text feature to be used and the translation length description data.
  • position feature to be used is used to represent the character position information carried in the main source text and the translation length information carried in the translation length description data.
  • The embodiment of the present application does not limit the implementation of step 212; for example, any existing or future position feature extraction method may be used for implementation.
  • In addition, in order to further prevent the translation length description data from being forgotten during the entire model processing process, the embodiment of the present application also provides another possible implementation of step 212, which is described below with examples.
  • In this implementation, step 212 may specifically include steps 2121-2122:
  • Step 2121 According to the position index of the nth feature value in the text feature to be used, the translation length description data, and the dimension index of the nth feature value, determine the position encoding result of the nth feature value.
  • n is a positive integer
  • n ⁇ N and N is a positive integer.
  • position index of the nth feature value is used to indicate the position of the nth feature value in the above “text feature to be used”.
  • the above “dimension index of the nth feature value” is used to indicate the location of the position encoding result of the nth feature value in the above "position feature to be used”.
  • the embodiment of the present application does not limit the implementation manner of step 2121.
  • two possible implementation manners are taken as examples below for description.
  • In the first implementation, step 2121 may specifically include: determining the position encoding result of the nth feature value according to the difference between the expected translation length and the position index of the nth feature value in the text features to be used, and the dimension index of the nth feature value (as shown in formulas (1)-(2)).
  • the above-mentioned "expected translation length” can be determined according to the translation length description data.
  • the above-mentioned “translation length description data” may include the expected length of the translation.
  • the expected length of the translation is determined according to the product of the text length of the source text to be processed and the expected length ratio of the translation source.
  • In the second implementation, step 2121 may specifically include: determining the position encoding result of the nth feature value according to the ratio between the position index of the nth feature value in the text features to be used and the expected translation length, and the dimension index of the nth feature value (as shown in formulas (3)-(4)).
  • In this way, the position index of the nth feature value, the translation length description data, and the dimension index of the nth feature value can be used to determine the position encoding result of the nth feature value, so that the position encoding result can express not only the text position information of the semantic unit represented by the nth feature value, but also the translation length information carried by the translation length description data.
  • n is a positive integer
  • n ⁇ N and N is a positive integer.
  • Step 2122 According to the position encoding results of the first feature value to the position encoding results of the Nth feature value, determine the position feature to be used.
  • For example, the position encoding results of the N feature values can be collected to obtain the position features to be used, so that the first-dimension feature of the position features to be used is the above "position encoding result of the first feature value", the second-dimension feature is the above "position encoding result of the second feature value", ..., and the Nth-dimension feature is the above "position encoding result of the Nth feature value". In this way, the dimension of the position features to be used is consistent with the dimension of the above "text features to be used", which makes it convenient to sum the position features to be used and the "text features to be used".
  • In this way, the position features to be used can be determined according to these two pieces of data, so that the position features to be used can represent not only the character position information carried by the main source text but also the translation length information carried by the translation length description data.
  • Step 213 Obtain the features to be encoded according to the text features to be used and the location features to be used.
  • For example, these two features can be summed (or concatenated) to obtain the features to be encoded, so that the features to be encoded can better represent the semantic information carried by the main source text and the translation length information carried by the translation length description data.
  • In this way, the features to be encoded can be extracted from these two pieces of data, so that the features to be encoded can represent the semantic information carried by the main source text and the translation length information carried by the translation length description data. A sketch of the length-aware position encoding and the final summation is given below.
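  • Formulas (1)-(4) are not reproduced in this text, so the sketch below only illustrates the two variants in a form modeled on the standard Transformer sinusoidal position encoding, with the position term replaced by a length difference or a length ratio as described above; the exact functional form used by the model is an assumption.
    # Length-aware position encodings (one row per feature value, one column per dimension),
    # followed by the Step 213 summation with the text features to be used.
    import numpy as np

    def length_difference_pe(num_positions, dim, expected_len):
        # Variant 1: encode (expected translation length - position index).
        pe = np.zeros((num_positions, dim))
        for pos in range(num_positions):
            for i in range(0, dim, 2):
                angle = (expected_len - pos) / (10000 ** (i / dim))
                pe[pos, i] = np.sin(angle)
                if i + 1 < dim:
                    pe[pos, i + 1] = np.cos(angle)
        return pe

    def length_ratio_pe(num_positions, dim, expected_len):
        # Variant 2: encode (position index / expected translation length).
        pe = np.zeros((num_positions, dim))
        for pos in range(num_positions):
            for i in range(0, dim, 2):
                angle = (pos / expected_len) / (10000 ** (i / dim))
                pe[pos, i] = np.sin(angle)
                if i + 1 < dim:
                    pe[pos, i + 1] = np.cos(angle)
        return pe

    text_features = np.zeros((7, 8))                        # placeholder text features to be used
    position_features = length_difference_pe(7, 8, expected_len=6)
    features_to_encode = text_features + position_features  # Step 213 (summation)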
  • Step 22 Input the feature to be encoded into the encoder, and obtain the feature encoding result output by the encoder.
  • the above-mentioned “encoder” is used to perform encoding processing on the input data of the encoder; and the embodiment of the present application does not limit the “encoder”, and any existing or future encoding network can be used for implementation.
  • When the above "compressed translation model" is implemented using a Transformer structure, the above "encoder" may include multiple encoding layers (that is, the Encoder network in a Transformer).
  • Step 23 Determine the condensed translation to be used according to the feature encoding result and the decoder.
  • the embodiment of the present application does not limit the implementation manner of the above-mentioned “decoder”, and any existing or future decoding network can be used for implementation.
  • the above “decoder” may include multiple decoding layers (that is, the Decoder network in Transformer).
  • the above "translation length description data” can be integrated into the decoder as a vector to encourage the decoder to perform decoding processing according to the expected length of the translation as much as possible .
  • the embodiment of the present application also provides another possible implementation of step 23, which specifically may include: determining the condensed translation to be used according to the feature encoding result, the translation length description data, and the decoder.
  • the embodiment of this application also provides a possible implementation of the above "decoder".
  • the " A "decoder” may comprise at least one first decoding layer.
  • The first decoding layer is used to perform decoding processing on the input data of the first decoding layer with reference to the expected translation length (for example, the decoding processing shown in "decoding network 0" in FIG. 4).
  • first decoding layer does not limit the above "first decoding layer", for example, it may include: a first decoding module, an information fusion module and a first normalization module; and the input data of the first normalization module includes The output data of the first decoding module and the output data of the information fusion module ("decoding network 0" shown in FIG. 4).
  • first decoding module is used to perform decoding processing on the input data of the first decoding module; and the “first decoding module” in the embodiment of the present application, for example, as shown in Figure 4, may include a self-attention layer (Self-Attention), two summation and normalization layers (Add&Normalize), codec attention layer (Encoder-Dncoder Attention), and feedforward neural network layer (Feed Forward).
  • (In Figure 4: Self-Attention denotes the self-attention layer; Add&Normalize denotes the summation and normalization layers; Encoder-Decoder Attention denotes the encoder-decoder attention layer; Feed Forward denotes the feedforward neural network layer.)
  • The above-mentioned "information fusion module" is used to multiply the input data of the information fusion module by the expected length of the translation; and the embodiment of the present application does not limit the above-mentioned "input data of the information fusion module". For example, the input data of the information fusion module may be the input data of the above-mentioned "first decoding layer" (for example, the input data of the self-attention layer in decoding network 0 shown in FIG. 4).
  • The first normalization module is used to add and normalize the input data of the first normalization module; and the embodiment of the present application does not limit the implementation of the "first normalization module". For example, as shown in Figure 4, it can be implemented with summation and normalization layers.
  • The embodiment of the present application does not limit the working principle of the above "first normalization module". For example, it may specifically include: when the first decoding layer performs the first-frame decoding operation (that is, decodes the first character represented by the above "feature encoding result"), the "first normalization module" can perform summation and normalization processing on the output data of the above-mentioned "information fusion module" and the output data of the above-mentioned "first decoding module" (as shown in formula (5)); when the first decoding layer performs a non-first-frame decoding operation (that is, decodes a non-first character represented by the above "feature encoding result"), the "first normalization module" only performs summation and normalization processing on the output data of the above-mentioned "first decoding module".
  • Based on the above, formula (5) can be written as layer_i^1 = LayerNorm(DM_i(x) + x ⊗ len). Wherein, layer_i^1 represents the first-frame decoding operation result of the first decoding layer; x represents the input data of the first decoding layer (for example, the input data of the self-attention layer in decoding network 0 shown in Figure 4); len indicates the expected length of the translation; x ⊗ len indicates the output result of the information fusion module in the first decoding layer; DM_i(x) indicates the output result of the first decoding module in the first decoding layer; and LayerNorm(·) indicates the computation function of the first normalization module in the first decoding layer.
  • the embodiment of the present application also provides another possible implementation of the above "decoder".
  • the "decoder” not only includes at least one first decoding layer, but may also include at least one second decoding layer. decoding layer.
  • The second decoding layer is used to perform decoding processing on the input data of the second decoding layer; and the embodiment of the present application does not limit the "second decoding layer". For example, it can be implemented using any existing or future decoding network (for example, the Decoder network in the Transformer). The following description is given in combination with examples.
  • the above-mentioned “second decoding layer” may include a second decoding module and a second normalization module; and the input data of the second normalization module includes the output data of the second decoding module (as shown in Figure 4 "Decoding Network 1").
  • The second decoding module is similar to the above-mentioned "first decoding module"; and the above-mentioned "second normalization module" can be implemented by using summation and normalization layers.
  • The difference between the above-mentioned "second decoding layer" and the above-mentioned "first decoding layer" is that the "second decoding layer" does not need to refer to the expected length of the translation when decoding ("decoding network 1" shown in Figure 4), whereas the "first decoding layer" needs to refer to the expected length of the translation when decoding ("decoding network 0" shown in Figure 4).
  • the above “decoder” may include one first decoding layer and J second decoding layers.
  • the input data of the 1st second decoding layer includes the output data of the first decoding layer;
  • the input data of the j-th second decoding layer includes the output data of the (j-1)-th second decoding layer, where j is a positive integer, 2 ≤ j ≤ J; and J is a positive integer.
  • That is, the input data of the J-th second decoding layer in the decoder includes the output data of the (J-1)-th second decoding layer, the input data of the (J-1)-th second decoding layer includes the output data of the (J-2)-th second decoding layer, ... (and so on), the input data of the 3rd second decoding layer includes the output data of the 2nd second decoding layer, the input data of the 2nd second decoding layer includes the output data of the 1st second decoding layer, and the input data of the 1st second decoding layer includes the output data of the first decoding layer.
  • In this way, the decoder can add the expected length of the translation as a constraint at the initial layer of the decoder, so that the expected length of the translation can be propagated layer by layer in the decoder. The specific process can be: when the decoder performs the initial-layer operation, the expected length information can be multiplied as a vector with the initial-layer input of the decoder to obtain an expected-length information unit, so that this expected-length information unit is propagated layer by layer in the decoder, is attenuated layer by layer through the forward propagation operations and nonlinear mapping transformations in each layer, and finally motivates the translation model to generate a translated text sequence closer to the expected length.
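  • As a rough illustration of the first decoding layer and the layer-by-layer propagation described above, the sketch below multiplies the layer input by the expected translation length in an information fusion step, adds it into the first-frame normalization in line with the legend of formula (5), and then stacks J ordinary decoding layers on top; the module decomposition, PyTorch layer choices, and hyperparameters are illustrative assumptions rather than the original implementation.

```python
import torch
import torch.nn as nn

class FirstDecodingLayer(nn.Module):
    """Sketch of "decoding network 0": a first decoding module plus an information fusion module."""
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        # First decoding module: self-attention, add&norm, encoder-decoder attention, add&norm, feed-forward.
        self.decoding_module = nn.TransformerDecoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)  # first normalization module

    def forward(self, x: torch.Tensor, memory: torch.Tensor,
                expected_len: float, first_frame: bool) -> torch.Tensor:
        decoded = self.decoding_module(x, memory)   # output of the first decoding module, DM(x)
        if first_frame:
            fused = x * expected_len                # information fusion module: layer input multiplied by len
            return self.norm(decoded + fused)       # first-frame case described for formula (5)
        return self.norm(decoded)                   # non-first-frame case: normalize DM(x) only


class CompressionDecoder(nn.Module):
    """Sketch: one first decoding layer followed by J second decoding layers."""
    def __init__(self, dim: int = 512, n_heads: int = 8, num_second_layers: int = 5):
        super().__init__()
        self.first_layer = FirstDecodingLayer(dim, n_heads)
        self.second_layers = nn.ModuleList(
            [nn.TransformerDecoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
             for _ in range(num_second_layers)]
        )

    def forward(self, x, memory, expected_len, first_frame=True):
        h = self.first_layer(x, memory, expected_len, first_frame)  # length constraint enters at the initial layer
        for layer in self.second_layers:                            # length information attenuates layer by layer
            h = layer(h, memory)
        return h
```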
  • the above-mentioned "decoder” is implemented by a heuristic fusion decoding method; and the heuristic fusion decoding method can integrate the expected length of the translation into the decoder as a vector, so as to encourage the compression translation model including the decoder to be able to follow
  • the expected length of the translation is rewritten in a sentence pattern, so that unimportant information can be truncated, and longer expressions can be converted into more concise expressions while expressing the same semantics, which is beneficial to improve the compression translation effect.
  • The nonlinear activation function in each network layer acts like a gate that can filter out some information from specific units in each network layer.
  • The information attenuation occurs layer by layer in the nonlinear activation functions, so that different expected lengths lead to different degrees of information attenuation. In this way, the compressed translation model can learn, through its own length-information attenuation driven by the expected length information, the probability of generating the end-of-sentence symbol EOS, so that the compressed translation model can generate natural and complete compressed translation results under the given expected-length constraint of the translation.
  • To sum up, the translation length description data can first be introduced into the encoder, so that the encoder can refer to the translation length description data when encoding the main source text to obtain the feature encoding result; the translation length description data is then introduced into the decoder, so that the decoder can refer to the translation length description data when decoding the feature encoding result to obtain the simplified translation to be used. In this way, short rewriting of the expression can be realized on the premise of deleting as little information as possible, so as to realize simplified compression translation based on end-to-end controllable-length processing, which in turn makes the translation result for the source text to be processed more refined.
  • the embodiment of the present application does not limit the implementation of the "linear layer” in FIG. 4 , and any existing or future linear layer (Linear) can be used for implementation.
  • the embodiment of the present application does not limit the implementation of the "decision-making layer” in FIG. 4 , and any existing or future decision-making layer (eg, Softmax) can be used for implementation.
  • the embodiment of the present application does not limit the implementation of the "codec attention layer" in Figure 4, and any existing or future method of performing attention processing based on the output data of the encoder (for example, Multi-Head Attention in the Transformer) can be used for implementation.
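  • For completeness, the linear layer and decision-making layer at the top of Figure 4 can be pictured as an output projection followed by Softmax over the candidate semantic units; the vocabulary size and dimensions below are illustrative assumptions.

```python
import torch.nn as nn

class DecisionHead(nn.Module):
    """Sketch of the output head in Figure 4: a linear layer followed by a Softmax decision-making layer."""
    def __init__(self, dim: int = 512, vocab_size: int = 32000):
        super().__init__()
        self.linear = nn.Linear(dim, vocab_size)   # linear layer
        self.softmax = nn.Softmax(dim=-1)          # decision-making layer

    def forward(self, decoder_output):
        return self.softmax(self.linear(decoder_output))   # distribution over candidate semantic units
```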
  • the embodiment of the present application also provides a possible implementation manner of constructing the above-mentioned "compressed translation model", which may specifically include steps 51-53:
  • Step 51 Obtain at least one sample original text and the actual translation corresponding to the at least one sample original text.
  • The sample original text refers to text data in the source language that needs to be used when constructing the compressed translation model; and the embodiment of the present application does not limit the number of the above-mentioned "sample original texts". For example, the number can be D, wherein D is a positive integer.
  • The "actual translation corresponding to the d-th sample original text" refers to the actual translation result of the d-th sample original text in the target language; and the embodiment of the present application does not limit the "actual translation corresponding to the d-th sample original text". For example, in order to avoid the phenomenon that the number of words in the target text exceeds that in the source text, the text length of the "actual translation corresponding to the d-th sample original text" can be relatively close to (or even smaller than) the text length of the d-th sample original text. Wherein, d is a positive integer, d ≤ D.
  • Step 52 According to the text length of the actual translation corresponding to each sample original text, determine the translation length description data corresponding to each sample original text.
  • The translation length description data corresponding to the d-th sample original text is used to describe the text length of the translation result of the d-th sample original text in the target language; and the embodiment of the present application does not limit the "translation length description data corresponding to the d-th sample original text". For example, it may include at least one of the following: the ratio between the text length of the actual translation corresponding to the d-th sample original text and the text length of the d-th sample original text, and the text length of the actual translation corresponding to the d-th sample original text.
  • Wherein, d is a positive integer, d ≤ D.
  • Step 53 Construct a compressed translation model according to at least one sample original text, the translation length description data corresponding to the at least one sample original text, and the actual translation corresponding to the at least one sample original text.
  • step 53 may specifically include step 531-step 538:
  • Step 531 According to the translation length description data corresponding to at least one sample original text, determine at least one candidate length ratio interval and a preset mapping relationship.
  • step 531 may specifically include step 5311-step 5316:
  • The translation-source length ratio corresponding to the d-th sample original text refers to the ratio between the text length of the actual translation corresponding to the d-th sample original text and the text length of the d-th sample original text.
  • After the D translation-source length ratios are obtained, the maximum value of the D translation-source length ratios can be determined as the upper limit of the ratio range to be used, and the minimum value of the D translation-source length ratios can be determined as the lower limit of the ratio range to be used.
  • the ratio range to be used may be evenly divided into Q candidate length ratio intervals.
  • Q represents the number of candidate length proportional intervals in the above "at least one candidate length proportional interval”.
  • For each candidate length ratio interval, an interval identifier corresponding to that candidate length ratio interval is determined respectively.
  • the "interval identifier corresponding to the qth candidate length ratio interval" is used to represent the qth candidate length ratio interval.
  • q is a positive integer, and q ≤ Q.
  • The embodiment of the present application does not limit the determination process of the above-mentioned "interval identifier corresponding to the q-th candidate length ratio interval". For example, the interval identifier corresponding to the q-th candidate length ratio interval (for example, [0.8]) may be determined based on a ratio value in the q-th candidate length ratio interval (for example, 0.7-1.1).
  • q is a positive integer, and q ≤ Q.
  • At least one candidate length ratio interval and a preset mapping relationship can be determined according to the translation length description data corresponding to these sample original texts.
  • Step 532 According to the translation length description data corresponding to the d-th sample original text, at least one candidate length ratio interval, and a preset mapping relationship, determine the length ratio interval identifier corresponding to the d-th sample original text.
  • d is a positive integer, d ≤ D.
  • Step 534 Determine the text extraction feature corresponding to the d-th sample original text according to the d-th sample original text and the length ratio interval identifier corresponding to the d-th sample original text.
  • d is a positive integer, d ≤ D.
  • Step 535 Input the text extraction features corresponding to the d-th sample original text into the compressed translation model, and obtain the model-predicted translation result corresponding to the d-th sample original text output by the compressed translation model.
  • d is a positive integer, d ≤ D.
  • Step 536 Determine whether the preset end condition is met, if yes, execute step 538; if not, execute step 537.
  • The above "preset end condition" may be preset. For example, it may include at least one of the following: the model loss value of the compressed translation model is lower than a preset loss threshold; the change rate of the model loss value of the compressed translation model is lower than a preset change rate threshold (that is, the model reaches convergence); and the number of updates of the compressed translation model reaches a preset number-of-updates threshold.
  • The model loss value of the compressed translation model is used to represent the compressed translation performance of the compressed translation model; and the embodiment of the present application does not limit the determination process of the "model loss value of the compressed translation model". For example, it can be implemented using any existing or future method for determining a model loss value.
  • Step 537 Update the compressed translation model according to the model-predicted translation result corresponding to at least one sample original text and the actual translation corresponding to the at least one sample original text, and return to step 535.
  • Step 538 Save the compressed translation model.
  • the compressed translation model can be saved, so that it can later be used to participate in the simultaneous interpretation process.
  • That is, for each sentence pair, the length ratio of the target text to the source text is calculated; these length ratios are then discretized into multiple intervals, and a scale marker is used as a proxy for the ratios falling in each interval.
  • The sentence pairs marked with different ratios are sampled so that the data volumes of the sentence pairs marked with different ratios are kept in a relatively balanced state. In this way, the encoder in the compressed translation model can integrate the length-interval marker information into the hidden-layer vector representation of each word in the sentence, so that text vectors with the same scale marker are projected onto the vector cluster corresponding to that scale information in the encoder's semantic representation vector space, and the semantic representation vector space of the entire encoder forms clusters corresponding to the multiple scale markers. It can be seen that the mapping between source text vectors with different scale markers and target text vectors of different lengths can be learned through the overall training of the model.
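  • A minimal sketch of this data preparation is shown below, under the assumption that the interval identifier is a representative ratio value wrapped in brackets and prepended to the sample source text (matching the "[0.8]" example above); whitespace tokenization and the choice of Q are illustrative.

```python
import numpy as np

def build_length_tagged_pairs(pairs, num_intervals: int = 5):
    """Sketch of the data preparation for steps 51-53: compute target/source length ratios,
    discretize them into evenly divided candidate length ratio intervals, and splice an
    interval identifier (a representative ratio value) to the head of each sample source text."""
    ratios = [len(tgt.split()) / len(src.split()) for src, tgt in pairs]   # translation-source length ratios
    lo, hi = min(ratios), max(ratios)                                      # ratio range to be used
    edges = np.linspace(lo, hi, num_intervals + 1)                         # Q evenly divided intervals
    centers = (edges[:-1] + edges[1:]) / 2                                 # one representative value per interval

    tagged = []
    for (src, tgt), r in zip(pairs, ratios):
        q = min(int(np.searchsorted(edges, r, side="right")) - 1, num_intervals - 1)
        tag = f"[{centers[q]:.1f}]"                                        # interval identifier, e.g. "[0.8]"
        tagged.append((f"{tag} {src}", tgt))                               # tag spliced to the head of the source
    return tagged

# Toy example with two sentence pairs.
sample = [("人工智能 的 发展 带来 机遇", "AI development brings opportunities"),
          ("这 是 一个 例子", "this is an example")]
print(build_length_tagged_pairs(sample))
```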
  • In addition, a text interception process can be performed on the compressed translation result of each speech segment according to the expected translation length corresponding to each speech segment, so that the on-screen translation of each speech segment strictly meets the expected translation length corresponding to that speech segment.
  • the embodiment of this application also provides another possible implementation of the above "translation method".
  • the translation method not only includes S1-S2, but also includes S4-S6:
  • S4 Determine the condensed translation to be used according to the main source text, the translation length description data, the compressed translation model, and at least one historical semantic unit.
  • the above-mentioned "at least one historical semantic unit” refers to the semantic unit that was not used (for example, not uploaded to the screen or not sent to the user) in the previous compressed translation result because it exceeded the expected length of the translation.
  • the previous compressed translation result is "Artificial intelligence is loved by all countries”
  • the expected length of the translation corresponding to the previous compressed translation result is 5 words
  • the characters to be used corresponding to the previous compressed translation result are "Artificial intelligence is loved by”
  • the unused characters corresponding to the previous compressed translation result are "all countries”
  • “at least one historical semantic unit” corresponding to the current compressed translation process is "all countries”.
  • characters to be used refer to translated characters sent to the user.
  • previous compression translation result refers to the result of compression translation for the previous speech segment of the current speech segment. It can be seen that the above “at least one semantic unit left over from history” can be determined according to the previous speech segment of the current speech segment.
  • the collection time of the "previous speech segment of the current speech segment” is adjacent to the collection time of the "current speech segment”; and the collection time of the "previous speech segment of the current speech segment” is earlier than the "current speech segment” collection time. For example, as shown in FIG. 2, if the current speech segment is the "third speech segment”, then the "previous speech segment of the current speech segment” is the "second speech segment”.
  • this embodiment of the present application does not limit the implementation of S4.
  • the number of semantic units in the above-mentioned "simplified translation to be used" is G, and G ≥ K.
  • the determination process of the gth semantic unit in the above-mentioned “condensed translation to be used” includes steps 61-62:
  • Step 61 If g ≤ K, then determine the g-th semantic unit in the simplified translation to be used according to the main source text, the translation length description data, the compressed translation model, and the g-th historical semantic unit left over.
  • g is a positive integer, g ≤ K.
  • step 61 may specifically include step 611-step 614:
  • Step 611 According to the main source text, the translation length description data, and the compressed translation model, determine the model prediction probability in the gth state.
  • The model prediction probability in the g-th state refers to the distribution probability of the g-th semantic unit obtained by the compressed translation model for the main source text (for example, the prediction probability of the g-th semantic unit output by the "decision-making layer" shown in Figure 4), so that the "model prediction probability in the g-th state" is used to indicate the possibility that the g-th semantic unit in the compressed translation result of the main source text is each candidate semantic unit (for example, each candidate word).
  • The embodiment of the present application does not limit the implementation of step 611. For example, step 611 can be implemented using the working principle by which the above "compressed translation model" predicts the g-th semantic unit in the compressed translation result of the main source text; and any implementation manner of the "compressed translation model" shown above can be applied to step 611.
  • Step 612 Determine the penalty factor value (as shown in formula (6)) according to the model prediction probability in the gth state and the object prediction probability of the gth historical semantic unit.
  • In formula (6), punish_g represents the penalty factor value; y_g represents the model prediction probability in the g-th state; y′_g represents the object prediction probability of the g-th historical legacy semantic unit; and π(y_g, y′_g) represents the simulated annealing distribution of the model prediction probability in the g-th state and the object prediction probability of the g-th historical legacy semantic unit.
  • The object prediction probability of the g-th historical legacy semantic unit refers to the probability distribution of the (K-g+1)-th-from-last semantic unit in the previous compressed translation result, so that the "object prediction probability of the g-th historical legacy semantic unit" is used to represent the possibility that the (K-g+1)-th-from-last semantic unit in the previous compressed translation result is each candidate semantic unit (e.g., each candidate word).
  • The embodiment of the present application does not limit the determination of the above-mentioned "object prediction probability of the g-th historical legacy semantic unit". For example, if there is no penalty factor value corresponding to the above "(K-g+1)-th-from-last semantic unit", it can be determined that the model prediction probability of that semantic unit has not been corrected, so the model prediction probability of the "(K-g+1)-th-from-last semantic unit" can be directly determined as the above "object prediction probability of the g-th historical legacy semantic unit"; however, if there is a penalty factor value corresponding to the "(K-g+1)-th-from-last semantic unit", it can be determined that the model prediction probability of that semantic unit has been corrected, so the prediction correction probability of the "(K-g+1)-th-from-last semantic unit" can be determined as the above "object prediction probability of the g-th historical legacy semantic unit".
  • Step 613 Perform weighted summation of the model prediction probability in the g-th state and the penalty factor value to obtain the prediction correction probability in the g-th state (as shown in formula (7)).
  • In formula (7), p_g represents the prediction correction probability in the g-th state; y_g represents the model prediction probability in the g-th state; y′_g represents the object prediction probability of the g-th historical legacy semantic unit; and π(y_g, y′_g) represents the simulated annealing distribution of the model prediction probability in the g-th state and the object prediction probability of the g-th historical legacy semantic unit.
  • The adjustment strategy for the blending ratio is specifically as follows: if the translation result needs to be more complete and fit the preceding text more naturally, the blending ratio should be increased; conversely, if a shorter translation result is required, the blending ratio should be reduced. It can be seen that, by setting the blending ratio, a smoother compression result can be obtained while the length of the translation result is controlled more precisely.
  • Step 614 Determine the g-th semantic unit according to the predicted correction probability in the g-th state.
  • the g-th semantic unit can be determined according to the prediction correction probability in the g-th state (for example, the candidate semantic unit with the highest probability value in the prediction correction probability in the g-th state is directly determined as the g-th semantic unit).
  • In this way, the K historical legacy semantic units can be referred to when determining the first K semantic units of the translation result of the current speech segment, so that these K semantic units express, as far as possible, the semantic information carried by the K historical legacy semantic units. This effectively avoids the information omission caused by the mandatory interception processing of the previous compressed translation result, makes the real-time translation of the voice stream more natural and smooth, and is beneficial to improving the compressed translation effect.
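  • Because the bodies of formulas (6) and (7) are not reproduced here, the sketch below only illustrates the general idea of steps 611-614 under explicit assumptions: the penalty factor is taken to be a temperature-smoothed blend of the two distributions, and the prediction correction probability a weighted sum of the model prediction and that penalty term with a blending ratio lam; the exact formulas, function names, and numeric values are assumptions, not the patent's equations.

```python
import numpy as np

def correct_with_leftover(model_probs: np.ndarray, leftover_probs: np.ndarray,
                          lam: float = 0.7, temperature: float = 2.0) -> np.ndarray:
    """Illustrative version of steps 612-613 (NOT the patent's exact formulas (6)/(7)):
    blend the g-th state model prediction with the object prediction of the g-th
    historical legacy semantic unit via an annealed penalty term."""
    # Assumed penalty factor: a temperature-smoothed ("simulated annealing" style) mix of the two distributions.
    mixed = (model_probs * leftover_probs) ** (1.0 / temperature)
    penalty = mixed / mixed.sum()
    # Assumed weighted summation: blending ratio lam controls completeness vs. brevity.
    corrected = lam * model_probs + (1.0 - lam) * penalty
    return corrected / corrected.sum()

# Step 614: pick the candidate semantic unit with the highest corrected probability.
model_probs = np.array([0.5, 0.3, 0.2])      # model prediction probability in the g-th state
leftover_probs = np.array([0.1, 0.8, 0.1])   # object prediction probability of the leftover unit
g_th_unit = int(np.argmax(correct_with_leftover(model_probs, leftover_probs)))
```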
  • Step 62 If K < g ≤ G, then determine the g-th semantic unit in the simplified translation to be used according to the main source text, the translation length description data, and the compressed translation model.
  • Wherein, g is a positive integer, K < g ≤ G; G is a positive integer, G ≥ K; and G represents the number of semantic units in the above-mentioned "condensed translation to be used".
  • The embodiment of the present application does not limit the implementation of step 62. For example, it can be implemented using the working principle by which the above "compressed translation model" predicts the g-th semantic unit in the compressed translation result of the main source text. The implementation process can specifically be as follows: first, determine the model prediction probability in the g-th state according to the main source text, the translation length description data, and the compressed translation model; then, determine the g-th semantic unit according to the model prediction probability in the g-th state (for example, directly determine the candidate semantic unit with the highest probability value in the model prediction probability in the g-th state as the g-th semantic unit). Any implementation of the "compressed translation model" shown above can be applied to step 62 to determine the model prediction probability in the g-th state.
  • In this way, the first K semantic units in the translation result refer to these K historical legacy semantic units, while the (K+1)-th and subsequent semantic units in the translation result are generated according to the traditional model prediction method, so that the translation result can express not only the semantic information carried by the current speech segment but also the semantic information carried by the K historical legacy semantic units. This effectively avoids the information omission caused by the mandatory interception processing of the previous compressed translation result, thereby making the real-time translation of the voice stream more natural and smooth, which is beneficial to improving the compressed translation effect.
  • It can be seen that the compressed translation model can refer to the translation length description data and at least one historical legacy semantic unit to perform compressed translation processing on the main source text and obtain the simplified translation to be used, so that the simplified translation to be used can express not only the semantic information carried by the current speech segment but also the semantic information carried by the K historical legacy semantic units. This effectively avoids the information omission caused by the mandatory interception processing of the previous compressed translation result, makes the real-time translation of the voice stream more natural and smooth, and is conducive to improving the compressed translation effect.
  • translation to be used refers to the text information that needs to be sent to the user in the translation result (that is, “simplified translation to be used") of the current speech segment (or current text) (for example, similar to the above "Artificial intelligence is loved by”); and the text length of the “translation to be used” is the expected length of the translation. It can be seen that after the translation to be used is obtained, the translation to be used can be sent to the user (for example, displayed on a display screen), so that the user can know the translation result for the current speech segment.
  • translation to be discarded refers to the text information (for example, similar to “all countries” above) that does not need to be sent to the user in the translation result of the current speech segment (that is, "simplified translation to be used").
  • The translation to be discarded can be directly determined as an updated historical legacy semantic unit, so that the updated historical legacy semantic unit can be referred to during the compressed translation process for the next speech segment. This can effectively avoid the information omission caused by the mandatory interception processing of the translation result of the current speech segment, so that the real-time translation of the speech stream is more natural and smooth, which is conducive to improving the compressed translation effect.
  • To sum up, the compressed translation model can perform compressed translation processing on the main source text by referring to the translation length description data and at least one historical legacy semantic unit, to obtain the condensed translation to be used; then, according to the expected translation length represented by the translation length description data, cutting processing is performed on the condensed translation to be used to obtain the translation to be used, so that the text length of the translation to be used equals the expected translation length. In this way, the translation to be used strictly follows the translation length constraint, which is beneficial to improving the compressed translation effect.
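  • The on-screen/leftover split described above can be pictured with the toy sketch below; word-level splitting and the variable names are illustrative assumptions.

```python
def split_condensed_translation(condensed: str, expected_len: int):
    """Sketch: cut the condensed translation at the expected translation length.
    Returns (translation_to_use, translation_to_discard); the discarded part becomes
    the historical legacy semantic units for the next speech segment."""
    units = condensed.split()                    # semantic units (words, for illustration)
    to_use = " ".join(units[:expected_len])      # sent to the user / shown on screen
    to_discard = units[expected_len:]            # updated historical legacy semantic units
    return to_use, to_discard

# Example matching the "Artificial intelligence is loved by all countries" illustration above.
on_screen, leftover = split_condensed_translation("Artificial intelligence is loved by all countries", 5)
# on_screen -> "Artificial intelligence is loved by", leftover -> ["all", "countries"]
```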
  • the embodiment of the present application also provides a translation device, which will be explained and described below with reference to the accompanying drawings.
  • the device embodiment introduces the translation device, and for related content, please refer to the above method embodiment.
  • FIG. 6 this figure is a schematic structural diagram of a translation device provided by an embodiment of the present application.
  • the translation device 600 provided in the embodiment of this application includes:
  • a text acquisition unit 601, configured to acquire the source text to be processed
  • a trunk extracting unit 602 configured to extract a trunk source text from the source text to be processed
  • the compressed translation unit 603 is configured to determine the simplified translation to be used according to the main source text, the translation length description data, and the pre-built compressed translation model; wherein, the compressed translation model is used to refer to the translation length description data and perform compressed translation on the main source text.
  • the compressed translation model includes an encoder and a decoder; the compressed translation unit 603 is specifically configured to: determine the features to be encoded according to the main source text and the translation length description data ; Input the feature to be encoded into the encoder to obtain a feature encoding result output by the encoder; determine the condensed translation to be used according to the feature encoding result and the decoder.
  • the process of determining the features to be encoded includes: determining the text features to be used according to the main source text; determining the position features to be used according to the text features to be used and the translation length description data; and determining the features to be encoded according to the text features to be used and the position features to be used.
  • the process of determining the text features to be used includes: determining the text features to be used according to the main source text and the translation length description data.
  • the process of determining the text features to be used includes: determining the length ratio interval to be used according to the translation length description data; and determining the text features to be used according to the length ratio interval to be used and the main source text.
  • the process of determining the text features to be used includes: splicing the length ratio interval to be used and the main source text to obtain a first text; and performing vectorization processing on the first text to obtain the text features to be used.
  • the process of determining the text features to be used includes: searching the preset mapping relationship for the interval identifier corresponding to the length ratio interval to be used, to obtain the interval identifier to be used; splicing the interval identifier to be used and the main source text to obtain a second text; and performing vectorization processing on the second text to obtain the text features to be used; wherein, the preset mapping relationship includes the corresponding relationship between the length ratio interval to be used and the interval identifier to be used.
  • the process of determining the text features to be used includes: performing vectorization processing on the main source text to obtain a text representation vector; and splicing the length ratio interval to be used with the text representation vector to obtain the text features to be used.
  • the process of determining the text features to be used includes: performing vectorization processing on the main source text to obtain a text representation vector; searching the preset mapping relationship for the interval identifier corresponding to the length ratio interval to be used, to obtain the interval identifier to be used; and splicing the interval identifier to be used with the text representation vector to obtain the text features to be used; wherein, the preset mapping relationship includes the corresponding relationship between the length ratio interval to be used and the interval identifier to be used.
  • the text features to be used include N feature values, wherein N is a positive integer; the process of determining the position features to be used includes: determining the position encoding result of the n-th feature value according to the position index of the n-th feature value in the text features to be used, the translation length description data, and the dimension index of the n-th feature value, wherein n is a positive integer and n ≤ N; and determining the position features to be used according to the position encoding result of the first feature value to the position encoding result of the N-th feature value.
  • the process of determining the position encoding result of the n-th feature value includes: determining the position encoding result of the n-th feature value according to the distance between the expected translation length and the position index of the n-th feature value in the text features to be used, and the dimension index of the n-th feature value; wherein, the expected translation length is determined according to the translation length description data.
  • the process of determining the position encoding result of the n-th feature value includes: determining the position encoding result of the n-th feature value according to the ratio between the position index of the n-th feature value in the text features to be used and the expected translation length, and the dimension index of the n-th feature value; wherein, the expected translation length is determined according to the translation length description data.
  • the process of determining the condensed translation to be used includes: determining the condensed translation to be used according to the feature encoding result, the translation length description data, and the decoder; wherein , the decoder is configured to refer to the translation length description data and perform decoding processing on the feature encoding result.
  • the decoder includes at least one first decoding layer; wherein, the first decoding layer includes a first decoding module, an information fusion module, and a first normalization module; the first The input data of the normalization module includes the output data of the first decoding module and the output data of the information fusion module; the information fusion module is used to multiply the input data of the information fusion module by the expected length of the translation Processing; wherein, the expected length of the translation is determined according to the description data of the length of the translation.
  • the decoder further includes at least one second decoding layer; the second decoding layer includes a second decoding module and a second normalization module; wherein, the input data of the second normalization module comprises the output data of the second decoding module.
  • the decoder includes one first decoding layer and J second decoding layers; wherein, the input data of the 1st second decoding layer includes the output data of the first decoding layer; the input data of the j-th second decoding layer includes the output data of the (j-1)-th second decoding layer, j is a positive integer, and 2 ≤ j ≤ J.
  • the text acquiring unit 601 is specifically configured to: after acquiring the current speech segment, perform speech recognition processing on the current speech segment to obtain the source text to be processed.
  • the compressed translation unit 603 is specifically configured to: determine the condensed translation to be used according to the main source text, the translation length description data, the compressed translation model, and at least one historical legacy semantic unit; wherein, the at least one historical legacy semantic unit is determined according to a previous speech segment of the current speech segment.
  • the translation device 600 further includes:
  • a text division unit configured to divide the simplified translation to be used according to the expected translation length to obtain a translation to be used and a translation to be discarded; wherein, the text length of the translation to be used is the expected translation length, and the expected translation length is determined according to the translation length description data;
  • a history updating unit configured to update the at least one historical semantic unit according to the translation to be discarded.
  • the number of historical legacy semantic units is K, wherein K is a positive integer; the number of semantic units in the condensed translation to be used is greater than or equal to K;
  • the process of determining the k-th semantic unit in the simplified translation to be used includes: determining the model prediction probability in the k-th state according to the main source text, the translation length description data, and the compressed translation model, where k is a positive integer and k ≤ K; determining the penalty factor value according to the model prediction probability in the k-th state and the object prediction probability of the k-th historical legacy semantic unit; performing weighted summation of the model prediction probability in the k-th state and the penalty factor value to obtain the prediction correction probability in the k-th state; and determining the k-th semantic unit according to the prediction correction probability in the k-th state.
  • the trunk extraction unit 602 includes:
  • a syntactic analysis subunit configured to perform dependent syntactic analysis processing on the source text to be processed to obtain a dependent syntactic analysis result
  • a part-of-speech tagging subunit configured to perform part-of-speech tagging processing on the source text to be processed to obtain a part-of-speech tagging result
  • the importance characterization subunit is used to determine the lexical importance characterization data according to the dependency parsing result and the part-of-speech tagging result;
  • the text determination subunit is configured to determine the main source text according to the lexical importance characterization data and the source text to be processed.
  • the vocabulary importance characterization data includes a multi-fork tree to be used
  • the text determining subunit is specifically configured to: determine the node to be deleted according to the multi-fork tree to be used; determine the deletion identification result of the node to be deleted according to the deleted text length corresponding to the node to be deleted and the text length of the source text to be processed; if the deletion identification result of the node to be deleted meets the preset deletion condition, delete the node to be deleted from the multi-fork tree to be used and continue to perform the step of determining the node to be deleted according to the multi-fork tree to be used; if the deletion identification result of the node to be deleted does not meet the preset deletion condition, continue to perform the step of determining the node to be deleted according to the multi-fork tree to be used; and, when the preset stop condition is reached, determine the main source text according to the multi-fork tree to be used and the source text to be processed.
  • the process of determining the deletion identification result of the node to be deleted includes: pre-deleting the node to be deleted from the multi-fork tree to be used to obtain a pre-deleted multi-fork tree; determining the deleted text length corresponding to the node to be deleted according to the pre-deleted multi-fork tree and the source text to be processed; determining the length ratio between the deleted text length and the text length of the source text to be processed; comparing the length ratio with a preset ratio threshold to obtain a comparison result to be used; and determining the deletion identification result of the node to be deleted according to the comparison result to be used.
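  • The pruning loop described for the text determining subunit can be sketched as follows; the tree representation (each node stores its word, with children ordered left to right by importance) and the bottom-up, right-to-left traversal are assumptions consistent with the description rather than a verbatim implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str
    children: List["Node"] = field(default_factory=list)   # ordered left-to-right by importance

def words(node: Node) -> List[str]:
    """Collect the words represented by a node and everything below it."""
    out = [node.word]
    for child in node.children:
        out.extend(words(child))
    return out

def prune(root: Node, source_len: int, ratio_threshold: float = 0.6) -> Node:
    """Sketch: visit candidate nodes bottom-up and right-to-left (least important first),
    pre-delete each candidate subtree, and keep the deletion only if the remaining
    length divided by the source text length stays above the preset ratio threshold."""
    def visit(parent: Node) -> None:
        for child in list(reversed(parent.children)):             # right-to-left: less important first
            visit(child)                                           # bottom-up: descend before deciding
            remaining = len(words(root)) - len(words(child))       # length after pre-deleting this subtree
            if remaining / source_len > ratio_threshold:           # deletion identification result
                parent.children.remove(child)                      # actually delete the node (and its subtree)
    visit(root)                                                    # the root (core word) is never deleted
    return root

# Toy usage: a small dependency-style tree with 6 words in the source text.
tree = Node("lay", [Node("development", [Node("AI"), Node("rapid")]), Node("foundation", [Node("solid")])])
pruned = prune(tree, source_len=6)
```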
  • the translation device 600 further includes:
  • a model construction unit configured to obtain at least one sample original text and the actual translation corresponding to the at least one sample original text; determine the translation length description data corresponding to each sample original text according to the text length of the actual translation corresponding to each of the sample original texts;
  • the compressed translation model is constructed according to the at least one sample original text, the translation length description data corresponding to the at least one sample original text, and the actual translation corresponding to the at least one sample original text.
  • the embodiment of the present application also provides a device, including: a processor, a memory, and a system bus;
  • the processor and the memory are connected through the system bus;
  • the memory is used to store one or more programs, and the one or more programs include instructions, and the instructions, when executed by the processor, cause the processor to execute any implementation method of the translation method described above.
  • the embodiment of the present application also provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute any implementation of the translation method described above.
  • an embodiment of the present application also provides a computer program product, which, when running on a terminal device, enables the terminal device to execute any implementation method of the translation method described above.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the differences from other embodiments, and the same and similar parts of each embodiment can be referred to each other.
  • the description is relatively simple, and for relevant details, please refer to the description of the method part.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present application discloses a translation method and related equipment. The method includes: after the source text to be processed is obtained, first extracting a main source text from the source text to be processed, so that the main source text represents the core backbone information in the source text to be processed; and then determining a condensed translation to be used according to the main source text, translation length description data, and a pre-built compressed translation model, so that the condensed translation to be used can express the semantic information carried by the source text to be processed with fewer translation characters. This effectively avoids the phenomenon that the number of words at the translation end exceeds the number of words at the source end, so that the length of the translated text can be shortened without losing the core meaning, which helps reduce translation delay and thus improves the translation effect.

Description

一种翻译方法及其相关设备
本申请要求于2021年12月23日提交中国专利局、申请号为202111592412.5、申请名称为“一种翻译方法及其相关设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及数据处理技术领域,尤其涉及一种翻译方法及其相关设备。
背景技术
语音同传翻译场景是一种缺乏下文信息、局部的、即时的、需要考虑至少两个语种特性的翻译场景。
语音同传翻译场景的实时性要求通常比较高,但是在相同语义表达下,很容易出现译文端词数多于源文端词数的情况,如此易导致实际翻译速度往往无法达到实时翻译速度需求,从而导致发生延迟积累现象,进而导致翻译效果比较差。
发明内容
本申请实施例的主要目的在于提供一种翻译方法及其相关设备,能够提高翻译效果。
本申请实施例提供了一种翻译方法,所述方法包括:获取待处理源文本;从所述待处理源文本中抽取主干源文本;根据所述主干源文本、译文长度描述数据、以及预先构建的压缩翻译模型,确定待使用精简译文;其中,所述压缩翻译模型用于参考所述译文长度描述数据,对所述主干源文本进行压缩翻译。
本申请实施例还提供了一种翻译装置,包括:文本获取单元,用于获取待处理源文本;主干抽取单元,用于从所述待处理源文本中抽取主干源文本;压缩翻译单元,用于根据所述主干源文本、译文长度描述数据、以及预先构建的压缩翻译模型,确定待使用精简译文;其中,所述压缩翻译模型用于参考所述译文长度描述数据,对所述主干源文本进行压缩翻译。
本申请实施例还提供了一种设备,其特征在于,所述设备包括:处理器、存储器、系统总线;所述处理器以及所述存储器通过所述系统总线相连;所述存储器用于存储一个或多个程序,所述一个或多个程序包括指令,所述指令当被所述处理器执行时使所述处理器执行本申请实施例提供的翻译方法的任一实施方式。
本申请实施例还提供了一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当所述指令在终端设备上运行时,使得所述终端设备执行本申请实施例提供的翻译方法的任一实施方式。
本申请实施例还提供了一种计算机程序产品,所述计算机程序产品在终端设备上运行时,使得所述终端设备执行本申请实施例提供的翻译方法的任一实施方式。
基于上述技术方案,本申请具有以下有益效果:
本申请提供的技术方案中,在获取到待处理源文本(例如,同传语音流中当前语音段的语音识别文本)之后,先从该待处理源文本中抽取主干源文本,以使该主干源文本用于表示该待处理源文本中的核心主干信息;再根据该主干源文本、译文长度描述数据、以及预先构建的压缩翻译模型,确定待使用精简译文,以使该待使用精简译文能够以较少的译 文字符表示该待处理源文本所携带的语义信息,如此能够有效地避免译文端词数多于源文端词数的现象,从而能够实现在核心含义不损失的前提下缩短翻译文本的长度,进而能够有效地降低翻译延迟,如此能够提高翻译实时性,从而能够有利于提高翻译效果。
另外,因主干源文本是通过对待处理源文本进行主干抽取得到的,使得该主干源文本的文本长度小于该待处理源文本的文本长度,如此实现简化源语言下文本数据的目的;还因待使用精简译文是压缩翻译模型针对该主干源文本进行压缩翻译得到的,使得该待使用精简译文能够以较少的译文字符表示该待处理源文本所携带的语义信息,如此实现简化目标语言下文本数据的目的。可见,本申请实施例通过两端(也就是,源语言端+目标语言端)简化的方式实现针对待处理源文本的压缩翻译,如此能够保证该待处理源文本的压缩翻译结果能够以尽可能少的译文字符表示该待处理源文本所携带的语义信息,从而能够有效地避免译文端词数多于源文端词数的现象,从而能够实现在核心含义不损失的前提下缩短翻译文本的长度,进而能够有效地降低翻译延迟,如此能够提高翻译实时性,从而能够有利于提高翻译效果。
此外,因待使用精简译文是由压缩翻译模型参考译文长度描述数据进行压缩翻译得到的,使得该待使用精简译文的文本长度几乎接近于、甚至等于该译文长度描述数据所表征的译文期望长度,从而使得该待使用精简译文的文本长度是可控的,如此能够有效地保证该待使用精简译文能够以个数比较合理的词数表示该待处理源文本所携带的语义信息,从而能够有效地避免因译文词数不可控所带来的不良影响,进而有利于提高翻译效果。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例提供的一种翻译方法的流程图;
图2为本申请实施例提供的一种同传语音流的示意图;
图3为本申请实施例提供的一种依存句法树的示意图;
图4为本申请实施例提供的一种压缩翻译模型的示意图;
图5为本申请实施例提供的一种压缩翻译流程的示意图;
图6为本申请实施例提供的一种翻译装置的结构示意图。
具体实施方式
发明人在针对语音同传翻译场景的研究中发现,对于源语言下的语音识别文本来说,可以对该语音识别文本进行翻译,得到目标语音下的翻译文本。然而,因该翻译文本词数很有可能多于语音识别文本的词数,如此易导致实际翻译速度无法达到实时翻译速度需求,从而导致发生延迟积累现象。
基于上述发现,为了解决背景技术部分所示的技术问题,本申请实施例提供了一种翻译方法,该方法包括:在获取到待处理源文本(例如,同传语音流中当前语音段的语音识别文本)之后,先从该待处理源文本中抽取主干源文本,以使该主干源文本用于表示该待 处理源文本中的核心主干信息;再根据该主干源文本、译文长度描述数据、以及预先构建的压缩翻译模型,确定待使用精简译文,以使该待使用精简译文能够以较少的译文字符表示该待处理源文本所携带的语义信息,如此能够有效地避免译文端词数多于源文端词数的现象,从而能够实现在核心含义不损失的前提下缩短翻译文本的长度,进而能够有效地降低翻译延迟,如此能够提高翻译实时性,从而能够有利于提高翻译效果。
另外,因主干源文本是通过对待处理源文本进行主干抽取得到的,使得该主干源文本的文本长度小于该待处理源文本的文本长度,如此实现简化源语言下文本数据的目的;还因待使用精简译文是压缩翻译模型针对该主干源文本进行压缩翻译得到的,使得该待使用精简译文能够以较少的译文字符表示该待处理源文本所携带的语义信息,如此实现简化目标语言下文本数据的目的。可见,本申请实施例通过两端(也就是,源语言端+目标语言端)简化的方式实现针对待处理源文本的压缩翻译,如此能够保证该待处理源文本的压缩翻译结果能够以尽可能少的译文字符表示该待处理源文本所携带的语义信息,从而能够有效地避免译文端词数多于源文端词数的现象,从而能够实现在核心含义不损失的前提下缩短翻译文本的长度,进而能够有效地降低翻译延迟,如此能够提高翻译实时性,从而能够有利于提高翻译效果。
此外,因待使用精简译文是由压缩翻译模型参考译文长度描述数据进行压缩翻译得到的,使得该待使用精简译文的文本长度几乎接近于、甚至等于该译文长度描述数据所表征的译文期望长度,从而使得该待使用精简译文的文本长度是可控的,如此能够有效地保证该待使用精简译文能够以个数比较合理的词数表示该待处理源文本所携带的语义信息,从而能够有效地避免因译文词数不可控所带来的不良影响,从而能够实现在核心含义不损失的前提下缩短翻译文本的长度,进而能够有效地降低翻译延迟,如此能够提高翻译实时性,从而能够有利于提高翻译效果。
还有,本申请实施例不限定翻译方法的执行主体,例如,本申请实施例提供的翻译方法可以应用于终端设备或服务器等数据处理设备。其中,终端设备可以为智能手机、计算机、个人数字助理(Personal Digital Assitant,PDA)或平板电脑等。服务器可以为独立服务器、集群服务器或云服务器。
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
方法实施例一
参见图1,该图为本申请实施例提供的一种翻译方法的流程图。
本申请实施例提供的翻译方法,包括S1-S3:
S1:获取待处理源文本。
上述“待处理源文本”是指源语言下的文本数据;而且该“待处理源文本”需要被翻译为目标语言下的文本内容。例如,对于中英翻译场景来说,待处理源文本是指中文文本数据;而且该待处理源文本需要被翻译成英文文本数据。
另外,本申请实施例不限定上述“待处理源文本”,例如,对于同传翻译场景来说,为了提高实时性,该“待处理源文本”可以包括一个句子。
此外,本申请实施例不限定S1的实施方式,例如,其可以具体可以为:从文本流中实时采集的当前文本。又如,对于同传翻译场景来说,为了提高翻译实时性,S1具体可以包括:在获取到当前语音段之后,对该当前语音段进行语音识别处理,得到待处理源文本。(需要说明的是,下文以同传翻译场景为例进行说明)
上述“当前语音段”用于表示从一个语音流(例如,同传翻译场景下语音流)中实时采集的语音片段。例如,如图2所示,在从图2所示的语音流中采集到“第三语音片段”时,可以将该第三语音片段,确定为当前语音段,以便后续能够借助针对该当前语音段的压缩翻译处理,及时地确定出该第三语音片段的翻译结果。
需要说明的是,本申请实施例不限定上述“语音识别处理”的实施方式,可以采用现有的或者未来出现的任意一种语音识别方法进行实施。
还需要说明的是,本申请实施例不限定上述“语音片段”的采集频率,例如,可以根据应用场景设定。又如,可以按照源语言下的句子长度,设定“语音片段”的采集频率,以使上述“当前语音段”包括一个句子。
基于上述S1的相关内容可知,对于同传翻译场景下实时输入的语音流来说,在从该语音流中实时地采集到当前语音段之后,可以针对该当前语音段进行语音识别处理,得到待处理源文本,以便后续能够基于该待处理源文本,实现针对该当前语音段的实时翻译处理。
S2:从待处理源文本中抽取主干源文本。
上述“抽取主干源文本”用于表示待处理源文本中的核心主干信息。例如,当待处理源文本为“人工智能的快速发展为各个国家带来了机遇”时,抽取主干源文本可以是“人工智能的发展为各国带来机遇”。
另外,本申请实施例不限定S2的实施方式,例如,可以采用现有的或者未来出现的任意一种能够针对一个文本数据进行主干抽取的方法(例如,句式简化方法等)进行实施。又如,为了提高主干源文本的抽取效果,可以采用 方法实施例二所示的S2的任一可能的实施方式进行实施。
S3:根据主干源文本、译文长度描述数据、以及预先构建的压缩翻译模型,确定待使用精简译文。
上述“译文长度描述数据”用于描述待处理源文本的翻译结果的文本长度;而且本申请实施例不限定该“译文长度描述数据”,例如,其可以包括译文期望长度、译源期望长度比中的至少一个。
上述“文本长度”用于描述一个文本数据的长度;而且本申请实施例不限定该“文本长度”的表示方法,例如,其可以采用一个文本数据中语义单元个数(例如,字和/或词个数)进行表示。
需要说明的是,上述“语义单元”是指一个语种下语义表示的单元;而且本申请实施例不限定该“语义单元”,例如,该“语义单元”可以是词汇,也可以是字符(如,中文下的语义单元可以是汉字或者词汇等)。
上述“译文期望长度”是指用户希望待处理源文本的翻译结果所具有的文本长度;而且本申请实施例不限定该“译文期望长度”,例如,为了尽可能地避免译文端词数多于源文端词数的现象,该“译文期望长度”可以接近于待处理源文本的文本长度。例如,若待处理源文本的文本长度为6个词汇,则该“译文期望长度”可以为6个词汇。
另外,本申请实施例不限定上述“译文期望长度”的获取方式,例如,可以预先设定。又如,可以根据用户针对该“译文期望长度”所触发的设置操作进行确定。又如,可以针对目标语言下大量句子进行长度统计分析,得到该“译文期望长度”。
上述“译源期望长度比”是指用户希望待处理源文本的翻译结果所具有的文本长度、与该待处理源文本的文本长度之间的比值;而且本申请实施例不限定该“译源期望长度比”,例如,为了尽可能地避免译文端词数多于源文端词数的现象,该“译源期望长度比”可以采用一个比较接近于1的数值(例如,上述“译源期望长度比”可以是1,也可以是0.8)。
另外,本申请实施例不限定上述“译源期望长度比”的获取方式,例如,可以预先设定。又如,可以根据用户针对该“译源期望长度比”所触发的设置操作进行确定。又如,可以针对目标语言与源语言下大量句子对进行长度比例统计分析,得到该“译源期望长度比”。
上述“压缩翻译模型”用于针对该压缩翻译模型的输入数据进行译文长度可控地压缩翻译处理。例如,上述“压缩翻译模型”能够参考译文长度描述数据,对主干源文本进行压缩翻译,以使针对该主干源文本的压缩翻译结果能够尽可能地达到译文长度描述数据所表征的译文期望长度,如此以实现译文长度可控地压缩翻译处理。
另外,本申请实施例不限定上述“压缩翻译模型”,例如,其可以是一种机器学习模型。又如,为了提高压缩翻译效果,可以采用 方法实施例三所示的压缩翻译模型的任一可能的实施方式进行实施。
基于上述S1至S3的相关内容可知,在获取到待处理源文本(例如,同传语音流中当前语音段的语音识别文本)之后,先从该待处理源文本中抽取主干源文本,以使该主干源文本用于表示该待处理源文本中的核心主干信息;再根据该主干源文本、译文长度描述数据、以及预先构建的压缩翻译模型,确定待使用精简译文,以使该待使用精简译文能够以较少的译文字符表示该待处理源文本所携带的语义信息,如此能够有效地避免译文端词数多于源文端词数的现象,从而能够实现在核心含义不损失的前提下缩短翻译文本的长度,进而能够有效地降低翻译延迟,如此能够提高翻译实时性,从而能够有利于提高翻译效果。
另外,因主干源文本是通过对待处理源文本进行主干抽取得到的,使得该主干源文本的文本长度小于该待处理源文本的文本长度,如此实现简化源语言下文本数据的目的;还因待使用精简译文是压缩翻译模型针对该主干源文本进行压缩翻译得到的,使得该待使用精简译文能够以较少的译文字符表示该待处理源文本所携带的语义信息,如此实现简化目标语言下文本数据的目的。可见,本申请实施例通过两端(也就是,源语言端+目标语言端)简化的方式实现针对待处理源文本的压缩翻译,如此能够保证该待处理源文本的压缩翻译结果能够以尽可能少的译文字符表示该待处理源文本所携带的语义信息,从而能够有效地避免译文端词数多于源文端词数的现象,从而能够实现在核心含义不损失的前提下缩短翻译文本的长度,进而能够有效地降低翻译延迟,如此能够提高翻译实时性,从而能够有利 于提高翻译效果。
此外,因待使用精简译文是由压缩翻译模型参考译文长度描述数据进行压缩翻译得到的,使得该待使用精简译文的文本长度几乎接近于、甚至等于该译文长度描述数据所表征的译文期望长度,从而使得该待使用精简译文的文本长度是可控的,如此能够有效地保证该待使用精简译文能够以个数比较合理的词数表示该待处理源文本所携带的语义信息,从而能够有效地避免因译文词数不可控所带来的不良影响,从而能够实现在核心含义不损失的前提下缩短翻译文本的长度,进而能够有效地降低翻译延迟,如此能够提高翻译实时性,从而能够有利于提高翻译效果。
方法实施例二
为了提高针对源语言下文本数据中不重要的词语的过滤效果,可以借助依存句法分析技术、词性标注技术对该文本数据进行过滤处理。基于此,本申请实施例还提供了上文S2的一种可能的实施方式,其具体可以包括S21-S24:
S21:对待处理源文本进行依存句法分析处理,得到依存句法分析结果。
上述“依存句法分析处理”用于识别一个文本数据中不同词汇之间的有向依存关系;而且本申请实施例不限定上述“依存句法分析处理”的实施方式,可以采用现有的或者未来出现的任意一种依存句法分析技术进行实施。
另外,上述“依存句法分析处理”所基于的依存句法理论,具体为:认为词与词之间存在主从关系,这是一种二元不等价的关系。在句子中,如果一个词修饰另一个词,则称修饰词为从属词(dependent),被修饰的词语称为支配词(head),两者之间的语法关系称为依存关系(dependency relation)。例如,如图3所示,对于句子“人工智能的快速发展为公司奠定了非常扎实的基础”来说,因“快速”修饰了“发展”,使得“快速”与“发展”之间具有图3所示的有向依存关系。
上述“依存句法分析结果”用于表示待处理源文本中不同词汇之间的有向依存关系;而且本申请实施例不限定该“依存句法分析结果”的表示方式,例如,可以借助依存句法树(如图3所示)进行表示。
需要说明的是,对于图3所示的依存句法树来说,该依存句法树是一个多叉树。其中,依存句法树的根节点为句子中的核心词“奠定”,而且该依存句法树的各个子节点均为父节点所支配的单词或成分。图3中“Root”用于标记“奠定”是根节点。
S22:对待处理源文本进行词性标注处理,得到词性标注结果。
上述“词性标注处理”用于识别并标注一个文本数据中各个词汇的词性;而且本申请实施例不限定该“词性标注处理”,可以采用现有的或者未来出现的任意一种词性标注技术进行实施。
上述“词性标注结果”用于表示待处理源文本中每个词汇的词性。
S23:根据依存句法分析结果和词性标注结果,确定词汇重要性表征数据。
上述“词汇重要性表征数据”用于表示待处理源文本中每个词汇的重要程度;而且本申请实施例不限定该“词汇重要性表征数据”,例如,其可以包括待使用多叉树。
上述“待使用多叉树”是一种多叉树结构;而且该“待使用多叉树”不仅能够表示出 待处理源文本中每个词汇的重要程度,还能够表示出该待处理源文本中不同词汇之间的有向依存关系(以及该待处理源文本中每个词汇的词性)。
另外,本申请实施例不限定上述“待使用多叉树”的重要程度表示方法,例如,当上述“待使用多叉树”中根节点的分布位置高于其他非根节点的分布位置,且父节点的分布位置高于该父节点下子节点的分布位置时,该“待使用多叉树”包括多层节点,而且该“待使用多叉树”的重要程度表示方法具体可以为:对于位于不同层的任意两个节点来说,位于高层的节点的重要程度高于位于低层的节点的重要程度;而且,对于位于同一层的任意两个节点来说,位置偏左的节点的重要程度高于位置偏右的节点的重要程度。可见,为了实现上述重要程度分布效果,在上述“待使用多叉树”的构建过程中,可以将归属于同一个父节点的多个子树按照重要性从高到低的顺序进行从左到右排序。
此外,本申请实施例不限定上述“待使用多叉树”的确定过程,例如,当上述“依存句法分析结果”采用依存句法树进行表示时,该“待使用多叉树”的确定过程具体可以包括:先参考预先设定的重要性分析规则、以及词性标注结果,对依存句法树中每个子树进行重要性分析处理,得到该依存句法树中各个子树的重要性程度;再按照依存句法树中各个子树的重要性程度,对该依存句法树中各个子树的分布位置进行调整,以使调整后的依存句法树中具有相同父节点的所有子树从左到右的排列顺序,是按照各个子树中最高层节点所具有的重要程度从高到低的排列顺序进行呈现的。
需要说明的是,上述“重要性分析规则”可以预先设定,而且本申请实施例不限定该“重要性分析规则”,例如,其可以包括:(1)父节点的重要性高于该父节点下各个子节点的重要性。(2)对于一个父节点下的多个子节点来说,第一词性(例如,名词)的重要性高于第二词性(例如,动词)的重要性,第二词性的重要性高于第三词性(例如,形容词)的重要性,……。
S24:根据词汇重要性表征数据和待处理源文本,确定主干源文本。
作为示例,当上述“词汇重要性表征数据”包括待使用多叉树时,S24具体可以包括S241-S247:
S241:根据待使用多叉树,确定待删除节点。
上述“待删除节点”是指需要判断是否从待使用多叉树中删除的节点。
另外,本申请实施例不限定上述“待删除节点”,例如,该“待删除节点”可以包括待使用多叉树中的一个叶节点,也可以包括该待使用多叉树中一个子树(也就是,一个父节点以及该父节点下面的所有节点)。可见,上述“待删除节点”可以包括待使用多叉树中至少一个节点。需要说明的是,上述“叶节点”是指没有分叉的节点;上述“父节点”是指具有分叉的节点。
此外,本申请实施例不限定上述“待删除节点”的确定过程,例如,其具体可以为:先从待使用多叉树中仍未遍历的所有节点中挑选重要程度最低的节点;若该“重要程度最低的节点”是一个叶节点,则将该“重要程度最低的节点”,确定为待删除节点;若该重要程度最低的节点”是一个父节点,则将该“重要程度最低的节点”及其下面的所有节点,均确定为待删除节点。
可见,针对上述“待使用多叉树”,可以采用自底向上、以及自右向左的方式进行遍历, 并将当前被遍历节点(以及该当前被遍历节点下面的所有节点),确定为待删除节点,以便后续能够判断是否从待使用多叉树中删除该当前被遍历节点(以及该当前被遍历节点下面的所有节点)。其中,“当前被遍历节点”是指当前轮下在待使用多叉树中所遍历至的一个节点(例如,叶节点或者父节点)。
S242:根据待删除节点对应的删除后文本长度与待处理源文本的文本长度,确定该待删除节点的删除识别结果。
上述“待删除节点对应的删除后文本长度”是指在将待删除节点从待使用多叉树中删除之后所有剩余节点所代表的词汇的文本长度。
上述“待删除节点的删除识别结果”用于表示是否从待处理源文本中删除该待删除节点。
另外,本申请实施例不限定上述“待删除节点的删除识别结果”的确定过程,例如,其具体可以包括步骤11-步骤15:
步骤11:从待使用多叉树中预删除待删除节点,得到预删除后多叉树。
上述“预删除后多叉树”用于表示不包括待删除节点的待使用多叉树,以使该“预删除后多叉树”包括在将待删除节点从待使用多叉树中删除之后所有剩余节点。需要说明的是,上述“预删除”是一种删除演示动作;而且该“预删除”不会改变待使用多叉树。
步骤12:根据预删除后多叉树和待处理源文本,确定待删除节点对应的删除后文本长度。
本申请实施例中,在获取到预删除后多叉树之后,可以先根据该预删除后多叉树和待处理源文本,确定预删除后文本,以使该预删除后文本只包括该预删除后多叉树中所有节点所代表的语义单元;再将该预删除后文本的文本长度,确定为该待删除节点对应的删除后文本长度。
步骤13:确定删除后文本长度与待处理源文本的文本长度之间的长度比值。
步骤14:将长度比值与预设比值阈值进行比较,得到待使用比较结果。
上述“预设比值阈值”可以预先设定,也可以从目标语言以及源语言下大量句子对比值中挖掘得到。
上述“待使用比较结果”用于表示上述“删除后文本长度与待处理源文本的文本长度之间的长度比值”与上述“预设比值阈值”之间的相对大小。
步骤15:根据待使用比较结果,确定待删除节点的删除识别结果。
本申请实施例中,在获取到待使用比较结果之后,若该待使用比较结果表示上述“删除后文本长度与待处理源文本的文本长度之间的长度比值”高于上述“预设比值阈值”,则可以确定在从待使用多叉树中删除待删除节点时不会删除太多字符信息,从而可以确定该待删除节点可以从该待使用多叉树中删除,故可以将被删除标记(例如,“1”),确定为该待删除节点的删除识别结果;但是,若该待使用比较结果表示上述“删除后文本长度与待处理源文本的文本长度之间的长度比值”不高于上述“预设比值阈值”,则可以确定在从待使用多叉树中删除待删除节点时很有可能会删除太多字符信息,故为了避免删除过多信息,可以将不删除标记(例如,“0”),确定为该待删除节点的删除识别结果。
基于上述S242的相关内容可知,在获取到待删除节点之后,可以根据待删除节点对应 的删除后文本长度与待处理源文本的文本长度之间的比值,确定该待删除节点的删除识别结果,以使该删除识别结果能够表示出是否从待处理源文本中删除该待删除节点。
S243:判断待删除节点的删除识别结果是否满足预设删除条件,若是,则执行S244-S245;若否,则执行S245。
上述“预设删除条件”可以预先设定,例如,其具体可以包括:上述“待删除节点的删除识别结果”表示该待删除节点可以被删除(如,上述“待删除节点的删除识别结果”包括被删除标记)。
可见,在获取到待删除节点的删除识别结果之后,若该删除识别结果表示该待删除节点可以被删除,则可以确定该删除识别结果满足预设删除条件,故可以直接将该待删除节点从待使用多叉树中删除即可;若该删除识别结果表示该待删除节点不可以被删除,则可以确定该删除识别结果不满足预设删除条件,故可以将该待删除节点保留在该待使用多叉树中。
S244:将待删除节点从待使用多叉树中删除。
本申请实施例中,在确定待删除节点的删除识别结果满足预设删除条件之后,可以将该待删除节点从待使用多叉树中删除,得到删除后的待使用多叉树,如此能够实现针对该待使用多叉树的更新过程,以便能够基于删除后的待使用多叉树进行后续操作(例如,下一轮遍历过程)。
S245:判断是否达到预设停止条件,若是,则执行S246;若否,则返回执行S241。
上述“预设停止条件”可以预先设定,例如,其具体可以为:待使用多叉树中除了根节点以外的其他所有节点均被遍历过。
可见,若确定当前的待使用多叉树中除了根节点以外的其他所有节点均已被遍历,则可以确定针对该待使用多叉树的剪枝过程已完成,从而可以确定达到预设停止条件,故可以按照该待使用多叉树,从待处理源文本中抽取主干源文本即可;但是,若确定当前的待使用多叉树中仍存在非根节点未被遍历,则可以确定针对该待使用多叉树的剪枝过程仍未完成,从而可以确定未达到预设停止条件,故可以返回继续执行S241及其后续步骤,以开启针对该待使用多叉树的下一轮遍历过程。
S246:根据待使用多叉树和待处理源文本,确定主干源文本。
本申请实施例中,在确定达到预设停止条件时,可以确定针对该待使用多叉树的剪枝过程已完成,故可以按照该待使用多叉树,从待处理源文本抽取主干源文本,以使该主干源文本只包括该待使用多叉树中所有节点所代表的语义单元,从而使得该主干源文本能够以个数较少的字符表示出该待处理源文本所携带的语义信息。
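为了便于理解上述S241至S246所示的剪枝过程,下面给出一段示意性的Python代码;其中,Node、extract_trunk等名称、以词汇字符数之和近似文本长度、以及比值阈值的取值均为本文为便于说明所作的假设,并非对实际实施方式的限定。

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str                                              # 该节点所代表的词汇
    children: List["Node"] = field(default_factory=list)   # 子节点已按重要性从高到低(从左到右)排列

def subtree_nodes(node):
    """返回以node为根的子树中的全部节点(含node本身)。"""
    nodes = [node]
    for child in node.children:
        nodes.extend(subtree_nodes(child))
    return nodes

def text_length(nodes):
    """以词汇字符数之和近似文本长度。"""
    return sum(len(n.word) for n in nodes)

def extract_trunk(root, ratio_threshold=0.5):
    """以自右向左的后序方式遍历待使用多叉树(示意自底向上、自右向左的遍历),
    按删除后文本长度与原文本长度的比值进行剪枝,返回保留节点的id集合。"""
    all_nodes = subtree_nodes(root)
    source_len = text_length(all_nodes)                    # 以树中全部词汇近似待处理源文本的文本长度
    kept = {id(n) for n in all_nodes}

    def visit(node, is_root=False):
        for child in reversed(node.children):              # 自右向左:重要程度较低的子树先被遍历
            visit(child)
        if is_root:                                        # 根节点(核心词)不作为待删除节点
            return
        # 待删除节点:当前被遍历节点及其下面的所有节点(预删除,不真正修改树)
        to_delete = {id(n) for n in subtree_nodes(node) if id(n) in kept}
        remain_len = text_length([n for n in all_nodes
                                  if id(n) in kept and id(n) not in to_delete])
        if remain_len / source_len > ratio_threshold:      # 长度比值高于阈值:删除识别结果为“可删除”
            kept.difference_update(to_delete)

    visit(root, is_root=True)
    return kept

# 用法示例:保留节点对应的词汇可再按其在待处理源文本中的原始顺序拼接,得到主干源文本
root = Node("奠定", [Node("发展", [Node("人工智能"), Node("快速")]),
                     Node("基础", [Node("扎实")]),
                     Node("公司")])
kept_ids = extract_trunk(root, ratio_threshold=0.5)
print([n.word for n in subtree_nodes(root) if id(n) in kept_ids])
```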
基于上述S21至S24的相关内容可知,在获取到待处理源文本之后,可以借助依存句法分析技术、词性标注技术对该待处理源文本进行主干信息抽取处理,得到主干源文本,以使该主干源文本能够以个数尽可能少的词汇表示出该待处理源文本所携带的语义信息,如此有利于提高主干信息抽取效果,从而有利于实现在核心含义不损失的前提下缩短翻译文本的长度,进而能够有效地降低翻译延迟,如此能够提高翻译实时性,从而能够有利于提高翻译效果。
方法实施例三
为了更好地实现译文长度可控且保证模型在信息损失最小的情况下进行压缩翻译,本申请可以从翻译模型源端、翻译模型目标端、翻译模型解码阶段全方位融入期望长度信息,以在保证长度可控的条件下实现更自然完整的压缩翻译处理。
基于此,本申请实施例提供了上述“压缩翻译模型”的一种可能的实施方式,在该实施方式下,该“压缩翻译模型”可以包括编码器和解码器(例如,图4所示的压缩翻译模型)。
为了便于理解上述“压缩翻译模型”,下面以上文“待使用精简译文”的确定过程为例进行说明。
作为示例,当上述“压缩翻译模型”包括编码器和解码器时,上文“待使用精简译文”的确定过程,具体可以包括步骤21-步骤23:
步骤21:根据主干源文本和译文长度描述数据,确定待编码特征。
上述“待编码特征”是指需要进行编码处理的特征。
本申请实施例不限定步骤21的实施方式,例如,其具体可以包括步骤211-步骤213:
步骤211:根据主干源文本,确定待使用文本特征。
上述“待使用文本特征”可以用于表示主干源文本携带的字符信息。
本申请实施例不限定步骤211的实施方式,例如,可以采用现有的或者未来出现的任意一种文本特征提取方法(例如,Word2Vec)进行实施。
另外,为了进一步避免译文长度描述数据在整个模型处理过程被遗忘,本申请实施例还提供了步骤211的另一种可能的实施方式,其具体可以包括:根据主干源文本和译文长度描述数据,确定待使用文本特征,以使该“待使用文本特征”不仅能够表示出该主干源文本携带的字符信息,还能够表示出该译文长度描述数据所携带的译文长度信息。
实际上,因源语言下不同句子的译文长度不同,使得该不同句子与其译文之间的比例也是不一致的,从而使得大量句子的译源长度比例呈多样化,进而导致上文“压缩翻译模型”需要预先学习的译源长度比例的数量很多,如此易导致该“压缩翻译模型”的构建时耗较大,故为了进一步提高“压缩翻译模型”的构建效率,可以将大量的译源长度比例离散化成多个比例区间,以使该“压缩翻译模型”只需针对这些比例区间进行学习即可。需要说明的是,“压缩翻译模型”的构建过程请参见下文步骤51-步骤53的相关内容。
相应地,本申请实施例还提供了步骤211的又一种可能的实施方式,其具体可以包括步骤2111-步骤2112:
步骤2111:根据译文长度描述数据,确定待使用长度比例区间。
示例1,若上述“译文长度描述数据”包括译文期望长度,则步骤2111具体可以包括:先将该译文期望长度与待处理源文本的文本长度之间的比值,确定为译源期望长度比;再从至少一个候选长度比例区间中查找包括该译源期望长度比的候选长度比例区间,确定为待使用长度比例区间,以使该待使用长度比例区间包括该译源期望长度比。
需要说明的是,上述“至少一个候选长度比例区间”是指在上文“压缩翻译模型”的构建过程中所需学习的比例区间;而且该“至少一个候选长度比例区间”的相关内容请参见下文步骤53的相关内容。
示例2,若上述“译文长度描述数据”包括译源期望长度比,则步骤2111具体可以包括:从至少一个候选长度比例区间中查找包括该译源期望长度比的候选长度比例区间,确定为待使用长度比例区间,以使该待使用长度比例区间包括该译源期望长度比。
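为了便于理解步骤2111,下面给出一段示意性的Python代码;其中,candidate_intervals的具体取值、以及desc中的字段名(expected_len、ratio)均为本文为便于说明所作的假设。

```python
def find_length_ratio_interval(desc, source_len, candidate_intervals):
    """desc为译文长度描述数据,可包含译文期望长度(expected_len)或译源期望长度比(ratio)。"""
    if "expected_len" in desc:                       # 示例1:先计算译源期望长度比
        ratio = desc["expected_len"] / source_len
    else:                                            # 示例2:直接使用译源期望长度比
        ratio = desc["ratio"]
    for low, high in candidate_intervals:            # 查找包括该译源期望长度比的候选长度比例区间
        if low <= ratio < high:
            return (low, high)
    return candidate_intervals[-1]                   # 超出范围时退化为最后一个区间(示意性处理)

# 用法示例
intervals = [(0.3, 0.7), (0.7, 1.1), (1.1, 1.5)]
print(find_length_ratio_interval({"expected_len": 8}, source_len=10,
                                 candidate_intervals=intervals))   # 输出 (0.7, 1.1)
```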
步骤2112:根据待使用长度比例区间和主干源文本,确定待使用文本特征。
为了便于理解步骤2112,下面结合四种可能的实施方式进行说明。
在第一种可能的实施方式下,步骤2112具体可以包括步骤31-步骤32:
步骤31:将待使用长度比例区间和主干源文本进行拼接,得到第一文本,以使该第一文本包括待使用长度比例区间和主干源文本。
本申请实施例不限定步骤31的实施方式,例如,其具体可以包括:将待使用长度比例区间添加至主干源文本的头部位置,得到第一文本,以使该第一文本可以表示为{待使用长度比例区间,主干源文本}。
步骤32:对第一文本进行向量化处理,得到待使用文本特征。
需要说明的是,本申请实施例不限定“向量化处理”的实施方式,例如,其可以采用现有的或者未来出现的任意一种文本向量化方法(例如,Word2Vec)进行实施。
基于上述步骤2112的第一种可能的实施方式可知,在获取到待使用长度比例区间和主干源文本之后,可以先将这两个数据进行拼接再进行向量化处理,得到待使用文本特征。
在第二种可能的实施方式下,步骤2112具体可以包括步骤41-步骤43:
步骤41:从预设映射关系中查找待使用长度比例区间对应的区间标识,得到待使用区间标识。其中,预设映射关系包括待使用长度比例区间与待使用区间标识之间的对应关系。
上述“预设映射关系”用于记录各个候选长度比例区间对应的区间标识;而且本申请实施例不限定该“预设映射关系”,例如,其具体可以包括:第1个候选长度比例区间与第1个区间标识之间的对应关系、第2个候选长度比例区间与第2个区间标识之间的对应关系、……(以此类推)、以及第Q个候选长度比例区间与第Q个区间标识之间的对应关系。其中,Q为正整数,且Q表示上述“至少一个候选长度比例区间”中候选长度比例区间的个数。
需要说明的是,上述“预设映射关系”的构建过程请参见下文步骤53的相关内容。
另外,上述“第q个区间标识”是指第q个候选长度比例区间对应的区间标识,以使该“第q个区间标识”用于代表该第q个候选长度比例区间;而且本申请实施例不限定第q个候选长度比例区间与第q个区间标识之间的关系,例如,该第q个区间标识(例如,[0.8])是根据该第q个候选长度比例区间(例如,0.7-1.1)中的一个比例值确定的。其中,q为正整数,q≤Q。
可见,在获取到待使用长度比例区间之后,可以先将该待使用长度比例区间与预设映射关系中各个候选长度比例区间分别进行匹配;再将匹配成功的候选长度比例区间所对应的区间标识,确定为待使用区间标识,以使该待使用区间标识能够代表该待使用长度比例区间。
步骤42:将待使用区间标识和主干源文本进行拼接,得到第二文本。
本申请实施例不限定步骤42的实施方式,例如,其具体可以包括:将待使用区间标识添加至主干源文本的头部位置,得到第二文本,以使该第二文本可以表示为{待使用区间标识,主干源文本}。
步骤43:对第二文本进行向量化处理,得到待使用文本特征。
基于上述步骤2112的第二种可能的实施方式可知,在获取到待使用长度比例区间和主干源文本之后,可以先确定该待使用长度比例区间对应的区间标识;再针对该区间标识与主干源文本这两个数据依次进行拼接以及向量化处理,得到待使用文本特征。
在第三种可能的实施方式下,步骤2112具体可以包括:先对主干源文本进行向量化处理,得到文本表征向量,以使该文本表征向量能够表示出该主干源文本携带的字符信息;再将待使用长度比例区间和文本表征向量进行拼接,得到待使用文本特征。
需要说明的是,本申请实施例不限定上述“拼接”的实施方式,例如,可以将该待使用长度比例区间添加至该文本表征向量的头部位置,得到该待使用文本特征,以使该待使用文本特征中第1个特征值是上述“待使用长度比例区间”的左边界点、第2个特征值是上述“待使用长度比例区间”的右边界点、以及其他特征值均来自于上述“文本表征向量”。
基于上述步骤2112的第三种可能的实施方式可知,在获取到待使用长度比例区间和主干源文本之后,可以先将该主干源文本进行向量化处理,再将该待使用长度比例区间与向量化处理结果进行拼接,得到待使用文本特征。
在第四种可能的实施方式下,步骤2112具体可以包括:先对主干源文本进行向量化处理,得到文本表征向量,并从预设映射关系中查找待使用长度比例区间对应的区间标识,得到待使用区间标识;再将该待使用区间标识与该文本表征向量进行拼接,得到待使用文本特征。
需要说明的是,本申请实施例不限定上述“拼接”的实施方式,例如,可以将该待使用区间标识添加至该文本表征向量的头部位置,得到该待使用文本特征,以使该待使用文本特征中第1个特征值是上述“待使用区间标识”、以及其他特征值均来自于上述“文本表征向量”。
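为了便于理解步骤2112,下面以上述第二种可能的实施方式为例给出一段示意性的Python代码;其中,预设映射关系的取值、按字符粒度的切分方式、以及向量维度与随机向量表均为本文为便于说明所作的假设,实际实施时可采用任意文本向量化方法。

```python
import numpy as np

# 预设映射关系:候选长度比例区间 -> 区间标识(取值仅为示例)
interval_to_tag = {(0.3, 0.7): "[0.5]", (0.7, 1.1): "[0.8]", (1.1, 1.5): "[1.3]"}

def build_text_feature(trunk_text, interval, dim=8):
    tag = interval_to_tag[interval]                       # 待使用区间标识
    second_text = [tag] + list(trunk_text)                # 第二文本:{待使用区间标识, 主干源文本}
    rng = np.random.default_rng(0)
    vocab = set(trunk_text) | {tag}
    embedding = {tok: rng.standard_normal(dim) for tok in vocab}   # 仅作演示的向量表
    return np.stack([embedding[tok] for tok in second_text])       # 向量化处理,得到待使用文本特征

# 用法示例
feature = build_text_feature("人工智能奠定基础", (0.7, 1.1))
print(feature.shape)   # (主干源文本字符数+1, dim)
```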
基于上述步骤211的相关内容可知,在获取到主干源文本之后,可以根据该主干源文本(以及译文长度描述数据),确定待使用文本特征,以使该待使用文本特征能够表示出该主干源文本携带的字符信息(以及该译文长度描述数据携带的译文长度信息),以便后续能够针对该待使用文本特征进行编码处理。
步骤212:根据待使用文本特征和译文长度描述数据,确定待使用位置特征。
上述“待使用位置特征”用于表示主干源文本携带的字符位置信息、以及译文长度描述数据携带的译文长度信息。
本申请实施例不限定步骤212的实施方式,例如,可以采用现有的或者未来出现的任意一种位置特征提取方法进行实施。
另外,为了进一步避免译文长度描述数据在整个模型处理过程被遗忘,本申请实施例还提供了步骤212的另一种可能的实施方式,下面结合示例进行说明。
作为示例,当上述“待使用文本特征”包括N个特征值时,步骤212具体可以包括步骤2121-步骤2122:
步骤2121:根据待使用文本特征中第n个特征值的位置索引、译文长度描述数据、以及第n个特征值的维度索引,确定该第n个特征值的位置编码结果。其中,n为正整数,n≤N,N为正整数。
上述“第n个特征值的位置索引”用于表示该第n个特征值在上文“待使用文本特征”中所处位置。
上述“第n个特征值的维度索引”用于表示该第n个特征值的位置编码结果在上述“待使用位置特征”中所处位置。
本申请实施例不限定步骤2121的实施方式,为了便于理解,下面以两种可能的实施方式为例进行说明。
在第一种可能的实施方式中,步骤2121具体可以包括:根据译文期望长度与待使用文本特征中第n个特征值的位置索引之间的差值、以及该第n个特征值的维度索引,确定该第n个特征值的位置编码结果(如公式(1)-(2)所示)。
上述“译文期望长度”可以根据译文长度描述数据进行确定。例如,上述“译文长度描述数据”可以包括译文期望长度。又如,若上述“译文长度描述数据”包括译源期望长度比,则可以将待处理源文本的文本长度与该译源期望长度比之间的乘积,确定为译文期望长度。
HLDPE(pos,len,2i)=sin((len-pos)/10000^(2i/d))      (1)
HLDPE(pos,len,2i+1)=cos((len-pos)/10000^(2i/d))      (2)
式中,若上述“第n个特征值的维度索引”为偶数,则2i表示第n个特征值的维度索引,且HLDPE(pos,len,2i)表示该第n个特征值的位置编码结果;若上述“第n个特征值的维度索引”为奇数,则2i+1表示第n个特征值的维度索引,且HLDPE(pos,len,2i+1)表示该第n个特征值的位置编码结果;pos表示第n个特征值的位置索引;len表示译文期望长度;d表示上述“待使用位置特征”的维度(也就是,上述“待使用文本特征”的维度)。
需要说明的是,基于公式(1)-(2)所确定的位置编码结果有利于更精准地控制译文长度。
在第二种可能的实施方式中,步骤2121具体可以包括:根据待使用文本特征中第n个特征值的位置索引与译文期望长度之间的比值、以及该第n个特征值的维度索引,确定该第n个特征值的位置编码结果(如公式(3)-(4)所示)。
HLDPE(pos,len,2i)=sin((pos/len)/10000^(2i/d))      (3)
HLDPE(pos,len,2i+1)=cos((pos/len)/10000^(2i/d))      (4)
式中,若上述“第n个特征值的维度索引”为偶数,则2i表示第n个特征值的维度索引,且HLDPE(pos,len,2i)表示该第n个特征值的位置编码结果;若上述“第n个特征值的维度索引”为奇数,则2i+1表示第n个特征值的维度索引,且HLDPE(pos,len,2i+1)表示该第n个特征值的位置编码结果;pos表示第n个特征值的位置索引;len表示译文期望长度;d表示上述“待使用位置特征”的维度(也就是,上述“待使用文本特征”的维度)。
基于上述步骤2121的相关内容可知,对于待使用文本特征中第n个特征值来说,可以根据该第n个特征值的位置索引、译文长度描述数据、以及该第n个特征值的维度索引,确定该第n个特征值的位置编码结果,以使该位置编码结果不仅能够表示出该第n个特征值所表征的语义单元的文本位置信息,还能够表示出该译文长度描述数据携带的译文长度信息。其中,n为正整数,n≤N,N为正整数。
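为了便于理解步骤2121,下面给出一段示意性的Python代码,用于演示公式(1)-(4)所示的两种位置编码方式;其中,函数名hldpe、mode参数等均为本文为便于说明所作的假设。

```python
import numpy as np

def hldpe(pos, expected_len, d, mode="diff"):
    """返回第pos个特征值的d维位置编码结果;expected_len为译文期望长度。"""
    enc = np.zeros(d)
    for k in range(d):
        i = k // 2
        if mode == "diff":                 # 对应公式(1)-(2):基于 len-pos 的差值
            angle = (expected_len - pos) / (10000 ** (2 * i / d))
        else:                              # 对应公式(3)-(4):基于 pos/len 的比值
            angle = (pos / expected_len) / (10000 ** (2 * i / d))
        enc[k] = np.sin(angle) if k % 2 == 0 else np.cos(angle)   # 偶数维取sin,奇数维取cos
    return enc

# 用法示例:对N个特征值逐一编码,得到与待使用文本特征维度一致的待使用位置特征
N, d, expected_len = 6, 8, 10
position_feature = np.stack([hldpe(pos, expected_len, d) for pos in range(N)])
print(position_feature.shape)   # (N, d)
```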
步骤2122:根据第1个特征值的位置编码结果至第N个特征值的位置编码结果,确定待使用位置特征。
本申请实施例中,在获取到第1个特征值的位置编码结果至第N个特征值的位置编码结果之后,可以将该N个特征值的位置编码结果进行集合,得到待使用位置特征,以使该待使用位置特征中第1维特征是上述“第1个特征值的位置编码结果”、第2维特征是上述“第2个特征值的位置编码结果”、……(以此类推)、以及第N维特征是上述“第N个特征值的位置编码结果”,从而使得该待使用位置特征的维度与上文“待使用文本特征”的维度保持一致,以便后续便于将待使用位置特征与该“待使用文本特征”进行加和处理。
基于上述步骤212的相关内容可知,在获取到待使用文本特征和译文长度描述数据之后,可以根据这两个数据,确定待使用位置特征,以使该待使用位置特征不仅能够表示出该主干源文本携带的字符位置信息,还能够表示出该译文长度描述数据携带的译文长度信息。
步骤213:根据待使用文本特征与待使用位置特征,得到待编码特征。
本申请实施例中,在获取到待使用文本特征与待使用位置特征之后,可以将这两个特征进行加和处理(或者,拼接处理),得到待编码特征,以使该待编码特征能够更好地表示出主干源文本携带的语义信息,以及该译文长度描述数据携带的译文长度信息。
基于上述步骤21的相关内容可知,在获取到主干源文本和译文长度描述数据之后,可以从这两个数据中提取出待编码特征,以使该待编码特征能够表示出主干源文本携带的语义信息,以及该译文长度描述数据携带的译文长度信息。
步骤22:将待编码特征输入编码器,得到该编码器输出的特征编码结果。
上述“编码器”用于针对该编码器的输入数据进行编码处理;而且本申请实施例不限定该“编码器”,可以采用现有的或者未来出现的任意一种编码网络进行实施。例如,当上文“压缩翻译模型”采用Transformer结构进行实施时,上述“编码器”可以包括多个编码层(也就是,Transformer中Encoder网络)。
步骤23:根据特征编码结果和解码器,确定待使用精简译文。
本申请实施例不限定上述“解码器”的实施方式,可以采用现有的或者未来出现的任意一种解码网络进行实施。例如,当上文“压缩翻译模型”采用Transformer结构进行实施时,上述“解码器”可以包括多个解码层(也就是,Transformer中Decoder网络)。
另外,为了进一步避免译文长度描述数据在整个模型处理过程被遗忘,可以将上文“译文长度描述数据”作为矢量融入到解码器,以激励该解码器能够尽可能地按照译文期望长度进行解码处理。基于此,本申请实施例还提供了步骤23的另一种可能的实施方式,其具体可以包括:根据特征编码结果、译文长度描述数据、以及解码器,确定待使用精简译文。
此外,为了进一步提高上文“译文长度描述数据”与解码器之间的融合效果,本申请实施例还提供了上述“解码器”的一种可能的实施方式,在该实施方式中,该“解码器”可以包括至少一个第一解码层。
上述“第一解码层”用于参考译文期望长度,对该第一解码层的输入数据进行解码处理(例如,图4中“解码网络0”所示的解码处理)。
另外,本申请实施例不限定上述“第一解码层”,例如,其可以包括:第一解码模块、信息融合模块和第一归一化模块;而且该第一归一化模块的输入数据包括该第一解码模块的输出数据和该信息融合模块的输出数据(如图4所示的“解码网络0”)。
上述“第一解码模块”用于针对该第一解码模块的输入数据进行解码处理;而且本申请实施例不限定该“第一解码模块”,例如,如图4所示,其可以包括自注意力层(Self-Attention)、两个加和以及归一化层(Add&Normalize)、编解码注意力层(Encoder-Decoder Attention)、以及前馈神经网络层(Feed Forward)。
上述“信息融合模块”用于将该信息融合模块的输入数据与译文期望长度进行相乘处理;而且本申请实施例不限定上述“信息融合模块的输入数据”,例如,该“信息融合模块的输入数据”可以是上述“第一解码层”的输入数据(例如,图4所示的解码网络0中自注意力层的输入数据)。
上述“第一归一化模块”用于将该第一归一化模块的输入数据进行加和以及归一化处理;而且本申请实施例不限定该“第一归一化模块”的实施方式,例如,如图4所示,其可以采用加和以及归一化层进行实施。
另外,本申请实施例不限定上述“第一归一化模块”的工作原理,例如,其具体可以包括:当第一解码层进行第一帧解码运算(也就是,针对上述“特征编码结果”所表征的第一个字符进行解码处理)时,该“第一归一化模块”可以针对上述“信息融合模块”的输出数据以及上述“第一解码模块”的输出数据进行加和以及归一化处理(如公式(5)所示);当第一解码层进行非首帧解码运算(也就是,针对上述“特征编码结果”所表征的非首个字符进行解码处理)时,该“第一归一化模块”可以只针对上述“第一解码模块”的输出数据进行加和以及归一化处理。
layer_1=LayerNorm(x×len+DM_i(x))      (5)
式中,layer_1表示第一解码层的第一帧解码运算结果;x表示该第一解码层的输入数据(例如,图4所示的解码网络0中自注意力层的输入数据);len表示译文期望长度;x×len表示该第一解码层中信息融合模块的输出结果;DM_i(x)表示该第一解码层中第一解码模块的输出结果;LayerNorm()表示该第一解码层中第一归一化模块的计算函数。
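为了便于理解上述第一解码层的第一帧解码运算(公式(5)),下面给出一段示意性的Python代码;其中,第一解码模块仅以一个简单的非线性变换示意,LayerNorm亦为简化实现,并非对实际解码层结构的限定。

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """简化的加和以及归一化处理。"""
    return (x - x.mean()) / (x.std() + eps)

def first_layer_first_frame(x, expected_len, decode_module):
    fused = x * expected_len                         # 信息融合模块:输入数据与译文期望长度相乘
    return layer_norm(fused + decode_module(x))      # 第一归一化模块:加和以及归一化

# 用法示例
d = 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))
x = rng.standard_normal(d)                           # 第一解码层的输入数据
out = first_layer_first_frame(x, expected_len=10, decode_module=lambda v: np.tanh(W @ v))
print(out.shape)   # (8,)
```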
另外,本申请实施例还提供了上述“解码器”的另一种可能的实施方式,在该实施方式中,该“解码器”不仅包括至少一个第一解码层,还可以包括至少一个第二解码层。
上述“第二解码层”用于针对该解码层的输入数据进行解码处理;而且本申请实施例不限定该“第二解码层”,例如,其可以采用现有的或者未来出现的任意一种解码网络(例如,Transformer中Decoder网络)进行实施。为了便于理解,下面结合示例进行说明。
作为示例,上述“第二解码层”可以包括第二解码模块和第二归一化模块;而且该第二归一化模块的输入数据包括该第二解码模块的输出数据(如图4所示的“解码网络1”)。
上述“第二解码模块”类似于上文“第一解码模块”;而且上述“第二归一化模块”可以采用加和以及归一化层进行实施。
需要说明的是,上述“第二解码层”与上文“第一解码层”之间的区别是:该“第二解码层”进行解码时无需参考译文期望长度(如图4所示的“解码网络1”),但是该“第一解码层”进行解码时需要参考译文期望长度(如图4所示的“解码网络0”)。
为了便于理解上述“解码器”,下面结合示例进行说明。
作为示例,上述“解码器”可以包括1个第一解码层和J个第二解码层。其中,第1个第二解码层的输入数据包括第一解码层的输出数据;第j个第二解码层的输入数据包括第j-1个第二解码层的输出数据,j为正整数,2≤j≤J;J为正整数。
可见,对于包括1个第一解码层和J个第二解码层的解码器(如图4所示的J=1的解码器)来说,该解码器中第J个第二解码层的输入数据包括第J-1个第二解码层的输出数据,第J-1个第二解码层的输入数据包括第J-2个第二解码层的输出数据,……(以此类推),第3个第二解码层的输入数据包括第2个第二解码层的输出数据,第2个第二解码层的输入数据包括第1个第二解码层的输出数据,以及第1个第二解码层的输入数据包括该第一解码层的输出数据。
基于上述“解码器”的相关内容可知,对于解码器来说,该解码器可以将译文期望长度作为约束添加至该解码器的初始层,以使该译文期望长度能够在该解码器中逐层传递;而且其具体过程可以为:在该解码器进行初始层运算时,可以将期望长度信息作为矢量与该解码器的初始层输入进行相乘后,得到期望长度信息单元,以使该期望长度信息单元在解码器中逐层传播,并通过每一层中的前向传播运算与非线性映射变换实现逐层衰减,最终激励翻译模型翻译出更贴近期望长度的文本序列。
可见,上述“解码器”是采用启发式融合解码方式进行实施;而且该启发式融合解码方式能够将译文期望长度作为矢量融入到该解码器中,以激励包括该解码器的压缩翻译模型能够按照该译文期望长度进行句式改写,如此能够实现针对不重要信息的删减处理、以及在表达相同语义的情况下,将较长的表述转换为更简洁的表述,从而有利于提高压缩翻译效果。
需要说明的是,对于压缩翻译模型中每一网络层来说,每一网络层中非线性激活函数就像一扇门,它可以从每一网络层的特定单元中过滤一些信息。其中,因信息衰减在非线性激活函数中逐层发生,以使不同的期望长度具有不同程度的信息衰减,从而使得该压缩翻译模型在期望长度信息的启发下,能够通过其自身的长度信息衰减来学习生成句子结束符EOS的可能性,如此使得该压缩翻译模型能够在给定的译文期望长度约束下生成自然而完整的压缩翻译结果。
基于上述步骤21至步骤23的相关内容可知,如图5所示,对于包括编码器和解码器的压缩翻译模型来说,在获取到主干源文本之后,可以先将译文长度描述数据引入编码器,以使该编码器能够参考该译文长度描述数据,对该主干源文本进行编码处理,得到特征编码结果;再将译文长度描述数据引入解码器,以使该解码器能够参考该译文长度描述数据,对该特征编码结果进行解码处理,得到待使用精简译文,如此能够实现在尽量少的删除信息的前提下针对表述方式进行简短改写,从而能够实现基于端到端的长度可控的简化压缩翻译处理,进而能够使得针对待处理源文本的翻译结果更加精炼。
需要说明的是,对于图4和图5来说,图4中“<r>”以及图5中“<r>”均代表待使用长度比例区间;而且本申请实施例不限定<r>,例如,其可以是一个区间,也可以是一个区间标识(例如,待使用区间标识)。另外,图4中“X”以及图5中“X”均代表主干源文本。
还需要说明的是,本申请实施例不限定图4中“线性层”的实施方式,可以采用现有的或者未来出现的任意一种线性层(Linear)进行实施。另外,本申请实施例也不限定图4中“决策层”的实施方式,可以采用现有的或者未来出现的任意一种决策层(例如,Softmax)进行实施。此外,本申请实施例也不限定图4中“编解码注意力层”的实施方式,可以采用现有的或者未来出现的任意一种基于编码器的输出数据进行注意力处理的方法(例如,Transformer中多头注意力层(Multi-Head Attention))进行实施。
另外,本申请实施例还提供了构建上述“压缩翻译模型”的一种可能的实施方式,其具体可以包括步骤51-步骤53:
步骤51:获取至少一个样本原文和该至少一个样本原文对应的实际译文。
上述“样本原文”是指在构建压缩翻译模型时所需使用的源语言下的文本数据;而且本申请实施例不限定上述“样本原文”的个数,例如,其可以为D。其中,D为正整数。
“第d个样本原文对应的实际译文”是指该第d个样本原文在目标语言下的实际翻译结果;而且本申请实施例不限定该“第d个样本原文对应的实际译文”,例如,为了尽可能地避免译文端词数多于源文端词数的现象,该“第d个样本原文对应的实际译文”的文本长度比较接近于(甚至小于)上述“样本原文”的文本长度。其中,d为正整数,d≤D。
需要说明的是,本申请实施例不限定各个样本原文及其对应的实际译文的获取过程。
步骤52:根据各样本原文对应的实际译文的文本长度,确定各样本原文对应的译文长度描述数据。
“第d个样本原文对应的译文长度描述数据”用于描述该第d个样本原文在目标语言下翻译结果的文本长度;而且本申请实施例不限定该“第d个样本原文对应的译文长度描述数据”,例如,其可以包括:第d个样本原文对应的实际译文的文本长度与该第d个样本原文的文本长度之间的比值、以及该第d个样本原文对应的实际译文的文本长度中的至少一个。其中,d为正整数,d≤D。
步骤53:根据至少一个样本原文、该至少一个样本原文对应的译文长度描述数据、以及该至少一个样本原文对应的实际译文,构建压缩翻译模型。
作为示例,步骤53具体可以包括步骤531-步骤538:
步骤531:根据至少一个样本原文对应的译文长度描述数据,确定至少一个候选长度比例区间、以及预设映射关系。
作为示例,步骤531具体可以包括步骤5311-步骤5316:
5311:根据第d个样本原文对应的译文长度描述数据,确定该第d个样本原文对应的译源长度比值。其中,d为正整数,d≤D。
上述“第d个样本原文对应的译源长度比值”是指第d个样本原文对应的实际译文的文本长度与该第d个样本原文的文本长度之间的比值。
5312:根据D个样本原文对应的译源长度比值,确定待使用比值范围。
本申请实施例中,在获取到D个样本原文对应的译源长度比值之后,可以将这D个译源长度比值中最大值,确定为待使用比值范围的上限值,并将这D个译源长度比值中最小值,确定为待使用比值范围的下限值。
5313:将待使用比值范围划分成至少一个候选长度比例区间。
本申请实施例中,在获取到待使用比值范围之后,可以将该待使用比值范围平均划分为Q个候选长度比例区间。其中,Q表示上述“至少一个候选长度比例区间”中候选长度比例区间个数。
5314:根据各个候选长度比例区间,分别确定各个候选长度比例区间对应的区间标识。
“第q个候选长度比例区间对应的区间标识”用于代表该第q个候选长度比例区间。其中,q为正整数,q≤Q。
另外,本申请实施例不限定上述“第q个候选长度比例区间对应的区间标识”的确定过程,例如,可以根据该第q个候选长度比例区间(例如,0.7-1.1)中的一个比例值,确定该第q个候选长度比例区间对应的区间标识(例如,[0.8])。其中,q为正整数,q≤Q。
5315:建立第q个候选长度比例区间与该第q个候选长度比例区间对应的区间标识之间的对应关系。
5316:将第1个候选长度比例区间与该第1个候选长度比例区间对应的区间标识之间的对应关系、……、以及第Q个候选长度比例区间与该第Q个候选长度比例区间对应的区间标识之间的对应关系进行集合处理,得到预设映射关系。
基于上述步骤531的相关内容可知,在获取到至少一个样本原文对应的译文长度描述数据之后,可以根据这些样本原文对应的译文长度描述数据,确定至少一个候选长度比例区间、以及预设映射关系。
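为了便于理解步骤5311-5316,下面给出一段示意性的Python代码;其中,区间个数Q的取值、以区间中点作为区间标识、以及以字符数近似文本长度均为本文为便于说明所作的假设。

```python
def build_ratio_intervals(src_texts, tgt_texts, Q=4):
    """根据样本原文及其实际译文,确定候选长度比例区间与预设映射关系。"""
    ratios = [len(t) / len(s) for s, t in zip(src_texts, tgt_texts)]   # 各样本原文对应的译源长度比值
    low, high = min(ratios), max(ratios)                               # 待使用比值范围
    step = (high - low) / Q
    intervals = [(low + q * step, low + (q + 1) * step) for q in range(Q)]   # Q个候选长度比例区间
    mapping = {iv: f"[{round((iv[0] + iv[1]) / 2, 2)}]" for iv in intervals} # 预设映射关系
    return intervals, mapping

# 用法示例
src = ["人工智能的快速发展为公司奠定了基础", "今天天气很好"]
tgt = ["AI lays a solid foundation", "Nice weather today"]
intervals, mapping = build_ratio_intervals(src, tgt, Q=2)
print(mapping)
```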
步骤532:根据第d个样本原文对应的译文长度描述数据、至少一个候选长度比例区间、以及预设映射关系,确定该第d个样本原文对应的长度比例区间标识。其中,d为正整数,d≤D。
需要说明的是,上述“第d个样本原文对应的长度比例区间标识”的确定过程类似于上文“待使用区间标识”的确定过程。
步骤534:根据第d个样本原文、以及该第d个样本原文对应的长度比例区间标识,确定该第d个样本原文对应的文本提取特征。其中,d为正整数,d≤D。
需要说明的是,上述“第d个样本原文对应的文本提取特征”的确定过程类似于上文“待使用文本特征”的确定过程。
步骤535:将第d个样本原文对应的文本提取特征输入压缩翻译模型,得到该压缩翻译模型输出的该第d个样本原文对应的模型预测翻译结果。其中,d为正整数,d≤D。
步骤536:判断是否达到预设结束条件,若是,则执行步骤538;若否,则执行步骤537。
上述“预设结束条件”可以预先设定,例如,其可以包括:压缩翻译模型的模型损失值低于预设损失阈值,该压缩翻译模型的模型损失值变化率低于预设变化率阈值(也就是,模型达到收敛),以及该压缩翻译模型的更新次数达到预设次数阈值中的至少一个。
上述“压缩翻译模型的模型损失值”用于表示该压缩翻译模型的压缩翻译性能;而且本申请实施例不限定该“压缩翻译模型的模型损失值”的确定过程,例如,可以采用现有的或者未来出现的任意一种模型损失值确定方法进行实施。
步骤537:根据至少一个样本原文对应的模型预测翻译结果、以及该至少一个样本原文对应的实际译文,更新压缩翻译模型,并返回执行步骤535。
本申请实施例中,在确定未达到预设结束条件之后,可以确定当前轮的压缩翻译模型的压缩翻译效果不太好,故可以参考这些样本原文对应的模型预测翻译结果与其对应的实际译文之间的差异性,更新压缩翻译模型,以使更新后的压缩翻译模型具有更好的压缩翻译效果,以便后续基于更新后的压缩翻译模型,返回继续执行步骤535及其后续步骤,以实现下一轮训练过程。
步骤538:保存压缩翻译模型。
本申请实施例中,在确定达到预设结束条件之后,可以确定当前轮的压缩翻译模型的压缩翻译效果比较好,故可以保存该压缩翻译模型,以便后续能够利用该压缩翻译模型参与同传翻译过程。
基于上述步骤51至步骤53的相关内容可知,对于压缩翻译模型的构建过程来说,首先,针对双语训练数据,计算目标端文本与源端文本的长度比例;再将这些长度比例离散化到多个区间,并利用一个比例标记作为该区间的比例的代表。然后,在模型训练过程中,对不同比例标记下的句对进行采样,保持不同比例标记下的句对的数据量处于一个较为平衡的状态,以使压缩翻译模型中编码器可以将长度区间标记的信息融入到句子中的各个单词的隐层向量表示中,如此能够将带有同一比例标记的文本向量投影到编码器语义表示向量空间中的对应该比例信息的向量聚类簇下,从而使得整个编码器的语义表示向量空间中会形成多个比例标记对应的聚类簇。可见,通过模型整体的训练可以学习到带有不同比例标记的源文向量与不同长度的目标文本向量之间的映射。例如,为了期望对源文进行0.7-1.1之间的翻译压缩,可以在输入文本的首部位置添加一个长度标记(如[0.8]),以期望模型能够输出一段压缩比例位于该标记对应的区间范围的翻译文本。
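为了便于理解上述训练数据的组织方式,下面给出一段示意性的Python代码;其中,候选长度比例区间与比例标记的取值、以及每个标记下采样的句对数量均为本文为便于说明所作的假设。

```python
import random
from collections import defaultdict

def tag_and_balance(pairs, intervals, mapping, n_per_tag=2, seed=0):
    """pairs为(源端文本, 目标端文本)列表;intervals与mapping即候选长度比例区间与预设映射关系。"""
    buckets = defaultdict(list)
    for src, tgt in pairs:
        ratio = len(tgt) / len(src)                                   # 目标端文本与源端文本的长度比例
        for iv in intervals:
            if iv[0] <= ratio <= iv[1]:
                buckets[mapping[iv]].append((mapping[iv] + " " + src, tgt))  # 在输入文本首部添加比例标记
                break
    rng = random.Random(seed)
    balanced = []
    for tag, items in buckets.items():                                # 使不同比例标记下的句对数据量较为平衡
        balanced.extend(rng.choices(items, k=n_per_tag))
    return balanced

# 用法示例(区间与标记均为示例值)
intervals = [(0.5, 2.0), (2.0, 4.0)]
mapping = {intervals[0]: "[1.2]", intervals[1]: "[3.0]"}
pairs = [("今天天气很好", "Nice day"), ("人工智能发展迅速", "AI grows fast and strong")]
print(tag_and_balance(pairs, intervals, mapping, n_per_tag=1))
```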
方法实施例四
为了更精准地控制待处理源文本的翻译结果的长度,可以针对每个语音片段的压缩翻译结果进行文本截取处理,以使各个语音片段的上屏译文严格地满足各个语音片段对应的译文期望长度。
为了实现上述需求,本申请实施例还提供了上述“翻译方法”的另一种可能的实施方式,在该实施方式中,该翻译方法不仅包括S1-S2,还包括S4-S6:
S4:根据主干源文本、译文长度描述数据、压缩翻译模型、以及至少一个历史遗留语义单元,确定待使用精简译文。
上述“至少一个历史遗留语义单元”是指前一次压缩翻译结果中因超过译文期望长度而未使用(例如,未上屏、或者未发送给用户)的语义单元。作为示例,当前一次压缩翻译结果为“Artificial intelligence is loved by all countries”,且该前一次压缩翻译结果对应的译文期望长度为5个词汇时,则该前一次压缩翻译结果对应的待使用字符为“Artificial intelligence is loved by”,而且该前一次压缩翻译结果对应的未使用字符为“all countries”,以使当前次压缩翻译过程所对应的“至少一个历史遗留语义单元”为“all countries”。需要说明的是,上述“待使用字符”是指发送给用户的翻译字符。
上述“前一次压缩翻译结果”是指针对当前语音段的前一个语音片段进行压缩翻译所得的结果。可见,上述“至少一个历史遗留语义单元”可以根据当前语音段的前一个语音片段确定。
上述“当前语音段的前一个语音片段”的采集时间与该“当前语音段”的采集时间相邻;而且该“当前语音段的前一个语音片段”的采集时间早于该“当前语音段”的采集时间。例如,如图2所示,若当前语音段是“第三语音片段”,则该“当前语音段的前一个语音片段”就是“第二语音片段”。
另外,本申请实施例不限定S4的实施方式,例如,当上述“至少一个历史遗留语义单元”包括K个历史遗留语义单元,上述“待使用精简译文”中语义单元个数为G,且G≥K时,上述“待使用精简译文”中第g个语义单元的确定过程,包括步骤61-步骤62:
步骤61:若g≤K,则根据主干源文本、译文长度描述数据、压缩翻译模型、以及第g个历史遗留语义单元,确定待使用精简译文中第g个语义单元。其中,g为正整数,g≤K。
作为示例,步骤61具体可以包括步骤611-步骤614:
步骤611:根据主干源文本、译文长度描述数据、以及压缩翻译模型,确定第g状态下模型预测概率。
上述“第g状态下模型预测概率”是指由压缩翻译模型针对主干源文本进行压缩翻译所得的第g个语义单元的分布概率(例如,由图4所示的“决策层”所输出的第g个语义单元的预测概率),以使该“第g状态下模型预测概率”用于表示该主干源文本的压缩翻译结果中第g个语义单元是各个候选语义单元(例如,各个候选词汇)的可能性。
另外,本申请实施例不限定步骤611的实施方式,可以采用上文“压缩翻译模型”所具有的针对该主干源文本的压缩翻译结果中第g个语义单元进行预测的工作原理进行实施;而且该实施过程具体可以为:当g=1时,根据主干源文本、译文长度描述数据、以及压缩翻译模型,确定第g状态下模型预测概率;当2≤g≤K时,根据主干源文本、译文长度描述数据、第g-1状态下预测校正概率、以及压缩翻译模型,确定第g状态下模型预测概率。可见,对于图4所示的压缩翻译模型来说,当2≤g≤K时,解码器对应的“输出(以前的)”包括第g-1状态下预测校正概率。
需要说明的是,上文所示的“压缩翻译模型”的任一实施方式均可以应用于步骤611。
步骤612:根据第g状态下模型预测概率与第g个历史遗留语义单元的对象预测概率,确定惩罚因子值(如公式(6)所示)。
punish_g=δ(y_g,y′_g)      (6)
式中,punish_g表示惩罚因子值;y_g表示第g状态下模型预测概率;y′_g表示第g个历史遗留语义单元的对象预测概率;δ(y_g,y′_g)用于表示第g状态下模型预测概率与第g个历史遗留语义单元的对象预测概率的模拟退火分布。
上述“第g个历史遗留语义单元的对象预测概率”是指前一次压缩翻译结果中倒数第K-g+1个语义单元的概率分布,以使该“第g个历史遗留语义单元的对象预测概率”用于表示前一次压缩翻译结果中倒数第K-g+1个语义单元是各个候选语义单元(例如,各个候选词汇)的可能性。
另外,本申请实施例不限定上述“第g个历史遗留语义单元的对象预测概率”,例如,如果在前一次压缩翻译结果的确定过程中不存在上述“倒数第K-g+1个语义单元”对应的惩罚因子值,则可以确定未针对该“倒数第K-g+1个语义单元”的模型预测概率进行校正处理,故可以直接将该“倒数第K-g+1个语义单元”的模型预测概率,确定为上述“第g个历史遗留语义单元的对象预测概率”即可;但是,如果在前一次压缩翻译结果的确定过程中存在上述“倒数第K-g+1个语义单元”对应的惩罚因子值,则可以确定已针对该“倒数第K-g+1个语义单元”的模型预测概率进行校正处理,故可以将该“倒数第K-g+1个语义单元”的预测校正概率,确定为上述“第g个历史遗留语义单元的对象预测概率”。
步骤613:将第g状态下模型预测概率与惩罚因子值进行加权求和处理,得到该第g状态下预测校正概率(如公式(7)所示)。
p_g=(1-β)×y_g+β×δ(y_g,y′_g)      (7)
式中,p_g表示第g状态下预测校正概率;y_g表示第g状态下模型预测概率;y′_g表示第g个历史遗留语义单元的对象预测概率;δ(y_g,y′_g)用于表示第g状态下模型预测概率与第g个历史遗留语义单元的对象预测概率的模拟退火分布;β为调和率,其值的范围在0-1之间,可根据实际需要进行调整,而且该调整策略具体为:若需要翻译结果更加完整自然贴合上文,则需要将β值调大;反之,若需要较短的翻译结果,则需要将β值调小。可见,通过设置调和率β,可以在更精准地控制翻译结果长度的同时得到更流畅的压缩结果。
步骤614:根据第g状态下预测校正概率,确定第g个语义单元。
本申请实施例中,在获取到第g状态下预测校正概率之后,可以按照该第g状态下预测校正概率,确定第g个语义单元(例如,直接将该第g状态下预测校正概率中具有最高概率值的候选语义单元,确定为该第g个语义单元)。
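为了便于理解步骤611至步骤614,下面给出一段示意性的Python代码;其中,δ的具体实现(此处以归一化的逐元素乘积近似模拟退火分布)以及β的取值均为本文为便于说明所作的假设。

```python
import numpy as np

def correct_prediction(y_g, y_prev_g, beta=0.7, delta=None):
    """根据第g个历史遗留语义单元的对象预测概率,对第g状态下模型预测概率进行校正。"""
    if delta is None:
        delta = lambda a, b: (a * b) / np.maximum((a * b).sum(), 1e-12)   # 假设的δ实现
    punish_g = delta(y_g, y_prev_g)                    # 对应公式(6):惩罚因子值
    p_g = (1 - beta) * y_g + beta * punish_g           # 对应公式(7):第g状态下预测校正概率
    return p_g, int(np.argmax(p_g))                    # 将概率最高的候选语义单元确定为第g个语义单元

# 用法示例:β越大,校正结果越贴合历史遗留语义单元
vocab = ["all", "countries", "love", "AI"]
y_g = np.array([0.1, 0.2, 0.4, 0.3])                   # 第g状态下模型预测概率
y_prev = np.array([0.7, 0.1, 0.1, 0.1])                # 第g个历史遗留语义单元的对象预测概率
p_g, idx = correct_prediction(y_g, y_prev, beta=0.7)
print(vocab[idx], p_g)
```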
基于上述步骤61的相关内容可知,若历史遗留语义单元的个数为K,则在针对当前语音段进行压缩翻译时,可以参考这K个历史遗留语义单元,确定该当前语音段的翻译结果中前K个语义单元,以使该K个语义单元能够尽可能地表示出这K个历史遗留语义单元携带的语义信息,如此能够有效地避免因针对前一次压缩翻译结果进行强制性截取处理而导致信息遗漏现象,从而使得针对语音流的实时翻译更加自然流畅,如此有利于提高压缩翻译效果。
步骤62:若K<g≤G,则根据主干源文本、译文长度描述数据、以及压缩翻译模型,确定待使用精简译文中第g个语义单元。其中,g为正整数,K<g≤G,G为正整数,G≥K,G表示上述“待使用精简译文”中语义单元个数。
需要说明的是,本申请实施例不限定步骤62的实施方式,例如,其可以采用上文“压缩翻译模型”所具有的针对该主干源文本的压缩翻译结果中第g个语义单元进行预测的工作原理进行实施;而且该实施过程具体可以为:先根据主干源文本、译文长度描述数据、以及压缩翻译模型,确定第g状态下模型预测概率;再根据该第g状态下模型预测概率,确定第g个语义单元(例如,直接将该第g状态下模型预测概率中具有最高概率值的候选语义单元,确定为该第g个语义单元)。
需要说明的是,上文所示的“压缩翻译模型”的任一实施方式均可以被应用于步骤62中实现“根据主干源文本、译文长度描述数据、以及压缩翻译模型,确定第g状态下模型预测概率”。
基于上述步骤61至步骤62的相关内容可知,若历史遗留语义单元的个数为K,则对于当前语音段的翻译结果(也就是“待使用精简译文”)来说,该翻译结果中前K个语义单元是参考这K个历史遗留语义单元确定的,但该翻译结果中第K+1个语义单元及其以后的语义单元均是按照传统的模型预测方式进行实施的,以使该翻译结果不仅能够表示出该当前语音段携带的语义信息,还能够表示出这K个历史遗留语义单元携带的语义信息,如此能够有效地避免因针对前一次压缩翻译结果进行强制性截取处理而导致信息遗漏现象,从而使得针对语音流的实时翻译更加自然流畅,如此有利于提高压缩翻译效果。
基于上述S4的相关内容可知,在获取到主干源文本之后,可以由压缩翻译模型参考译文长度描述数据以及至少一个历史遗留语义单元,对该主干源文本进行压缩翻译处理,得到待使用精简译文,以使该待使用精简译文不仅能够表示出该当前语音段携带的语义信息,还能够表示出这K个历史遗留语义单元携带的语义信息,如此能够有效地避免因针对前一次压缩翻译结果进行强制性截取处理而导致信息遗漏现象,从而使得针对语音流的实时翻译更加自然流畅,如此有利于提高压缩翻译效果。
S5:按照译文期望长度,对待使用精简译文进行划分处理,得到待使用译文和待舍弃译文。
上述“待使用译文”是指在当前语音段(或者当前文本)的翻译结果(也就是,“待使用精简译文”)中需要发送给用户的文本信息(例如,类似于上文“Artificial intelligence is loved by”);而且该“待使用译文”的文本长度为译文期望长度。可见,在获取到待使用译文之后,可以将该待使用译文发送给用户(例如,显示在显示屏上),以使用户能够获知针对当前语音段的翻译结果。
上述“待舍弃译文”是指在当前语音段的翻译结果(也就是,“待使用精简译文”)中不需要发送给用户的文本信息(例如,类似于上文“all countries”)。
S6:根据待舍弃译文,更新至少一个历史遗留语义单元。
本申请实施例中,在获取到待舍弃译文之后,可以直接将该待舍弃译文,确定为更新后的历史遗留语义单元即可,以便在针对后一个语音片段的压缩翻译过程中能够参考更新后的历史遗留语义单元进行实施,如此能够有效地避免因针对当前语音段的翻译结果进行强制性截取处理而导致信息遗漏现象,从而使得针对语音流的实时翻译更加自然流畅,如此有利于提高压缩翻译效果。
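为了便于理解S5-S6,下面给出一段示意性的Python代码;其中,以空格切分得到的“词”作为语义单元仅为示例,并非对语义单元粒度的限定。

```python
def split_translation(reduced_translation, expected_len):
    """按照译文期望长度划分待使用精简译文,返回待使用译文与待舍弃译文。"""
    units = reduced_translation.split()
    used = " ".join(units[:expected_len])       # 待使用译文:发送给用户(上屏)
    dropped = units[expected_len:]              # 待舍弃译文:用于更新历史遗留语义单元
    return used, dropped

# 用法示例
reduced = "Artificial intelligence is loved by all countries"
used, leftover_units = split_translation(reduced, expected_len=5)
print(used)             # Artificial intelligence is loved by
print(leftover_units)   # ['all', 'countries']
```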
基于上述S4至S6的相关内容可知,对于从当前语音段中提取的主干源文本来说,可以先由压缩翻译模型参考译文长度描述数据以及至少一个历史遗留语义单元,对该主干源文本进行压缩翻译处理,得到待使用精简译文;再按照该译文长度描述数据所表征的译文长度,针对该待使用精简译文进行裁剪处理,得到待使用译文,以使该待使用译文的文本长度为该译文长度,从而使得该待使用译文能够严格地遵循该译文长度约束,如此有利于提高压缩翻译效果。
基于上述方法实施例提供的翻译方法,本申请实施例还提供了一种翻译装置,下面结合附图进行解释和说明。
装置实施例
装置实施例对翻译装置进行介绍,相关内容请参见上述方法实施例。
参见图6,该图为本申请实施例提供的一种翻译装置的结构示意图。
本申请实施例提供的翻译装置600,包括:
文本获取单元601,用于获取待处理源文本;
主干抽取单元602,用于从所述待处理源文本中抽取主干源文本;
压缩翻译单元603,用于根据所述主干源文本、译文长度描述数据、以及预先构建的压缩翻译模型,确定待使用精简译文;其中,所述压缩翻译模型用于参考所述译文长度描述数据,对所述主干源文本进行压缩翻译。
在一种可能的实施方式中,所述压缩翻译模型包括编码器和解码器;所述压缩翻译单元603,具体用于:根据所述主干源文本和所述译文长度描述数据,确定待编码特征;将所述待编码特征输入所述编码器,得到所述编码器输出的特征编码结果;根据所述特征编码结果和所述解码器,确定所述待使用精简译文。
在一种可能的实施方式中,所述待编码特征的确定过程,包括:根据所述主干源文本,确定待使用文本特征;根据所述待使用文本特征和所述译文长度描述数据,确定待使用位置特征;根据所述待使用文本特征和所述待使用位置特征,确定所述待编码特征。
在一种可能的实施方式中,所述待使用文本特征的确定过程,包括:根据所述主干源文本和所述译文长度描述数据,确定待使用文本特征。
在一种可能的实施方式中,所述待使用文本特征的确定过程,包括:根据所述译文长度描述数据,确定待使用长度比例区间;根据所述待使用长度比例区间和所述主干源文本,确定所述待使用文本特征。
在一种可能的实施方式中,所述待使用文本特征的确定过程,包括:将所述待使用长度比例区间和所述主干源文本进行拼接,得到第一文本;对所述第一文本进行向量化处理,得到所述待使用文本特征。
在一种可能的实施方式中,所述待使用文本特征的确定过程,包括:从预设映射关系中查找所述待使用长度比例区间对应的区间标识,得到待使用区间标识;将所述待使用区间标识和所述主干源文本进行拼接,得到第二文本;对所述第二文本进行向量化处理,得到所述待使用文本特征;其中,所述预设映射关系包括所述待使用长度比例区间与所述待使用区间标识之间的对应关系。
在一种可能的实施方式中,所述待使用文本特征的确定过程,包括:对所述主干源文本进行向量化处理,得到文本表征向量;将所述待使用长度比例区间和所述文本表征向量进行拼接,得到所述待使用文本特征。
在一种可能的实施方式中,所述待使用文本特征的确定过程,包括:对所述主干源文本进行向量化处理,得到文本表征向量;从预设映射关系中查找所述待使用长度比例区间对应的区间标识,得到待使用区间标识;将所述待使用区间标识与所述文本表征向量进行拼接,得到所述待使用文本特征;其中,所述预设映射关系包括所述待使用长度比例区间与所述待使用区间标识之间的对应关系。
在一种可能的实施方式中,所述待使用文本特征包括N个特征值;其中,N为正整数;所述待使用位置特征的确定过程,包括:根据所述待使用文本特征中第n个特征值的位置索引、所述译文长度描述数据、以及所述第n个特征值的维度索引,确定所述第n个特征值的位置编码结果;其中,n为正整数,n≤N;根据第1个特征值的位置编码结果至第N个特征值的位置编码结果,确定待使用位置特征。
在一种可能的实施方式中,所述第n个特征值的位置编码结果的确定过程,包括:根据译文期望长度与所述待使用文本特征中第n个特征值的位置索引之间的差值、以及所述第n个特征值的维度索引,确定所述第n个特征值的位置编码结果;其中,所述译文期望长度是根据所述译文长度描述数据确定的。
在一种可能的实施方式中,所述第n个特征值的位置编码结果的确定过程,包括:根据所述待使用文本特征中第n个特征值的位置索引与译文期望长度之间的比值、以及所述第n个特征值的维度索引,确定所述第n个特征值的位置编码结果;其中,所述译文期望长度是根据所述译文长度描述数据确定的。
在一种可能的实施方式中,所述待使用精简译文的确定过程,包括:根据所述特征编码结果、所述译文长度描述数据、以及所述解码器,确定所述待使用精简译文;其中,所述解码器用于参考所述译文长度描述数据,对所述特征编码结果进行解码处理。
在一种可能的实施方式中,所述解码器包括至少一个第一解码层;其中,所述第一解码层包括第一解码模块、信息融合模块和第一归一化模块;所述第一归一化模块的输入数据包括所述第一解码模块的输出数据和所述信息融合模块的输出数据;所述信息融合模块用于将所述信息融合模块的输入数据与译文期望长度进行相乘处理;其中,所述译文期望长度是根据所述译文长度描述数据确定的。
在一种可能的实施方式中,所述解码器还包括至少一个第二解码层;所述第二解码层包括第二解码模块和第二归一化模块;其中,所述第二归一化模块的输入数据包括所述第二解码模块的输出数据。
在一种可能的实施方式中,所述解码器包括1个第一解码层和J个第二解码层;其中,所述第1个第二解码层的输入数据包括所述第一解码层的输出数据;所述第j个第二解码层的输入数据包括所述第j-1个第二解码层的输出数据,j为正整数,2≤j≤J。
在一种可能的实施方式中,所述文本获取单元601,具体用于:在获取到当前语音段之后,对所述当前语音段进行语音识别处理,得到所述待处理源文本。
在一种可能的实施方式中,所述压缩翻译单元603,具体用于:根据所述主干源文本、所述译文长度描述数据、所述压缩翻译模型、以及至少一个历史遗留语义单元,确定待使用精简译文;其中,所述至少一个历史遗留语义单元是根据所述当前语音段的前一个语音片段确定的。
在一种可能的实施方式中,所述翻译装置600还包括:
文本划分单元,用于按照译文期望长度,对所述待使用精简译文进行划分处理,得到待使用译文和待舍弃译文;其中,所述待使用译文的文本长度为所述译文期望长度;所述译文期望长度是根据所述译文长度描述数据确定的;
历史更新单元,用于根据所述待舍弃译文,更新所述至少一个历史遗留语义单元。
在一种可能的实施方式中,所述历史遗留语义单元的个数为K;其中,K为正整数;所述待使用精简译文中语义单元个数≥K;
所述待使用精简译文中第k个语义单元的确定过程,包括:根据所述主干源文本、译文长度描述数据、以及所述压缩翻译模型,确定第k状态下模型预测概率;其中,k为正整数,k≤K;根据所述第k状态下模型预测概率与第k个历史遗留语义单元的对象预测概率,确定惩罚因子值;将所述第k状态下模型预测概率与所述惩罚因子值进行加权求和处理,得到第k状态下预测校正概率;根据所述第k状态下预测校正概率,确定所述第k个语义单元。
在一种可能的实施方式中,所述主干抽取单元602,包括:
句法分析子单元,用于对所述待处理源文本进行依存句法分析处理,得到依存句法分析结果;
词性标注子单元,用于对所述待处理源文本进行词性标注处理,得到词性标注结果;
重要性表征子单元,用于根据所述依存句法分析结果和所述词性标注结果,确定词汇重要性表征数据;
文本确定子单元,用于根据所述词汇重要性表征数据和所述待处理源文本,确定所述主干源文本。
在一种可能的实施方式中,所述词汇重要性表征数据包括待使用多叉树;
所述文本确定子单元,具体用于:根据所述待使用多叉树,确定待删除节点;根据所述待删除节点对应的删除后文本长度与所述待处理源文本的文本长度,确定所述待删除节点的删除识别结果;若所述待删除节点的删除识别结果满足预设删除条件,则将所述待删除节点从所述待使用多叉树中删除,并继续执行所述根据所述待使用多叉树,确定待删除节点的步骤;若所述待删除节点的删除识别结果不满足预设删除条件,则继续执行所述根据所述待使用多叉树,确定待删除节点的步骤;直至在达到预设停止条件时,根据所述待使用多叉树和所述待处理源文本,确定所述主干源文本。
在一种可能的实施方式中,所述待删除节点的删除识别结果的确定过程,包括:从所述待使用多叉树中预删除所述待删除节点,得到预删除后多叉树;根据所述预删除后多叉树和所述待处理源文本,确定所述待删除节点对应的删除后文本长度;确定所述删除后文本长度与所述待处理源文本的文本长度之间的长度比值;将所述长度比值与预设比值阈值进行比较,得到待使用比较结果;根据所述待使用比较结果,确定所述待删除节点的删除识别结果。
在一种可能的实施方式中,所述翻译装置600,还包括:
模型构建单元,用于获取至少一个样本原文和所述至少一个样本原文对应的实际译文;根据各所述样本原文对应的实际译文的文本长度,确定各所述样本原文对应的译文长度描述数据;根据所述至少一个样本原文、所述至少一个样本原文对应的译文长度描述数据、以及所述至少一个样本原文对应的实际译文,构建所述压缩翻译模型。
进一步地,本申请实施例还提供了一种设备,包括:处理器、存储器、系统总线;
所述处理器以及所述存储器通过所述系统总线相连;
所述存储器用于存储一个或多个程序,所述一个或多个程序包括指令,所述指令当被所述处理器执行时使所述处理器执行上述翻译方法的任一种实现方法。
进一步地,本申请实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当所述指令在终端设备上运行时,使得所述终端设备执行上述翻译方法的任一种实现方法。
进一步地,本申请实施例还提供了一种计算机程序产品,所述计算机程序产品在终端设备上运行时,使得所述终端设备执行上述翻译方法的任一种实现方法。
通过以上的实施方式的描述可知,本领域的技术人员可以清楚地了解到上述实施例方法中的全部或部分步骤可借助软件加必需的通用硬件平台的方式来实现。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者诸如媒体网关等网络通信设备,等等)执行本申请各个实施例或者实施例的某些部分所述的方法。
需要说明的是,本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
对所公开的实施例的上述说明,使本领域专业技术人员能够实现或使用本申请。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本申请的精神或范围的情况下,在其它实施例中实现。因此,本申请将不会被限制于本文所示的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。

Claims (24)

  1. 一种翻译方法,其特征在于,所述方法包括:
    获取待处理源文本;
    从所述待处理源文本中抽取主干源文本;
    根据所述主干源文本、译文长度描述数据、以及预先构建的压缩翻译模型,确定待使用精简译文;其中,所述压缩翻译模型用于参考所述译文长度描述数据,对所述主干源文本进行压缩翻译。
  2. 根据权利要求1所述的方法,其特征在于,所述压缩翻译模型包括编码器和解码器;
    所述待使用精简译文的确定过程,包括:
    根据所述主干源文本和所述译文长度描述数据,确定待编码特征;
    将所述待编码特征输入所述编码器,得到所述编码器输出的特征编码结果;
    根据所述特征编码结果和所述解码器,确定所述待使用精简译文。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述主干源文本和所述译文长度描述数据,确定待编码特征,包括:
    根据所述主干源文本,确定待使用文本特征;
    根据所述待使用文本特征和所述译文长度描述数据,确定待使用位置特征;
    根据所述待使用文本特征和所述待使用位置特征,确定所述待编码特征。
  4. 根据权利要求3所述的方法,其特征在于,所述根据所述主干源文本,确定待使用文本特征,包括:
    根据所述主干源文本和所述译文长度描述数据,确定待使用文本特征。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述主干源文本和所述译文长度描述数据,确定待使用文本特征,包括:
    根据所述译文长度描述数据,确定待使用长度比例区间;
    根据所述待使用长度比例区间和所述主干源文本,确定所述待使用文本特征。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述待使用长度比例区间和所述主干源文本,确定所述待使用文本特征,包括:
    将所述待使用长度比例区间和所述主干源文本进行拼接,得到第一文本;对所述第一文本进行向量化处理,得到所述待使用文本特征;
    或者,
    所述根据所述待使用长度比例区间和所述主干源文本,确定所述待使用文本特征,包括:
    从预设映射关系中查找所述待使用长度比例区间对应的区间标识,得到待使用区间标识;将所述待使用区间标识和所述主干源文本进行拼接,得到第二文本;对所述第二文本进行向量化处理,得到所述待使用文本特征;其中,所述预设映射关系包括所述待使用长度比例区间与所述待使用区间标识之间的对应关系;
    或者,
    所述根据所述待使用长度比例区间和所述主干源文本,确定所述待使用文本特征,包括:
    对所述主干源文本进行向量化处理,得到文本表征向量;将所述待使用长度比例区间和所述文本表征向量进行拼接,得到所述待使用文本特征;
    或者,
    所述根据所述待使用长度比例区间和所述主干源文本,确定所述待使用文本特征,包括:
    对所述主干源文本进行向量化处理,得到文本表征向量;从预设映射关系中查找所述待使用长度比例区间对应的区间标识,得到待使用区间标识;将所述待使用区间标识与所述文本表征向量进行拼接,得到所述待使用文本特征;其中,所述预设映射关系包括所述待使用长度比例区间与所述待使用区间标识之间的对应关系。
  7. 根据权利要求3所述的方法,其特征在于,所述待使用文本特征包括N个特征值;其中,N为正整数;
    所述待使用位置特征的确定过程,包括:
    根据所述待使用文本特征中第n个特征值的位置索引、所述译文长度描述数据、以及所述第n个特征值的维度索引,确定所述第n个特征值的位置编码结果;其中,n为正整数,n≤N;
    根据第1个特征值的位置编码结果至第N个特征值的位置编码结果,确定待使用位置特征。
  8. 根据权利要求7所述的方法,其特征在于,所述根据所述待使用文本特征中第n个特征值的位置索引、所述译文长度描述数据、以及所述第n个特征值的维度索引,确定所述第n个特征值的位置编码结果,包括:
    根据译文期望长度与所述待使用文本特征中第n个特征值的位置索引之间的差值、以及所述第n个特征值的维度索引,确定所述第n个特征值的位置编码结果;其中,所述译文期望长度是根据所述译文长度描述数据确定的。
  9. 根据权利要求7所述的方法,其特征在于,所述根据所述待使用文本特征中第n个特征值的位置索引、所述译文长度描述数据、以及所述第n个特征值的维度索引,确定所述第n个特征值的位置编码结果,包括:
    根据所述待使用文本特征中第n个特征值的位置索引与译文期望长度之间的比值、以及所述第n个特征值的维度索引,确定所述第n个特征值的位置编码结果;其中,所述译文期望长度是根据所述译文长度描述数据确定的。
  10. 根据权利要求2所述的方法,其特征在于,所述根据所述特征编码结果和所述解码器,确定所述待使用精简译文,包括:
    根据所述特征编码结果、所述译文长度描述数据、以及所述解码器,确定所述待使用精简译文;其中,所述解码器用于参考所述译文长度描述数据,对所述特征编码结果进行解码处理。
  11. 根据权利要求10所述的方法,其特征在于,所述解码器包括至少一个第一解码层;其中,所述第一解码层包括第一解码模块、信息融合模块和第一归一化模块;所述第一归一化模块的输入数据包括所述第一解码模块的输出数据和所述信息融合模块的输出数据;所述信息融合模块用于将所述信息融合模块的输入数据与译文期望长度进行相乘处理;其中,所述译文期望长度是根据所述译文长度描述数据确定的。
  12. 根据权利要求11所述的方法,其特征在于,所述解码器还包括至少一个第二解码层;
    所述第二解码层包括第二解码模块和第二归一化模块;其中,所述第二归一化模块的输入数据包括所述第二解码模块的输出数据。
  13. 根据权利要求12所述的方法,其特征在于,所述解码器包括1个第一解码层和J个第二解码层;其中,所述第1个第二解码层的输入数据包括所述第一解码层的输出数据;所述第j个第二解码层的输入数据包括所述第j-1个第二解码层的输出数据,j为正整数,2≤j≤J。
  14. 根据权利要求1所述的方法,其特征在于,所述根据所述主干源文本、译文长度描述数据、以及预先构建的压缩翻译模型,确定待使用精简译文,包括:
    根据所述主干源文本、所述译文长度描述数据、所述压缩翻译模型、以及至少一个历史遗留语义单元,确定待使用精简译文。
  15. 根据权利要求14所述的方法,其特征在于,所述方法还包括:
    按照译文期望长度,对所述待使用精简译文进行划分处理,得到待使用译文和待舍弃译文;其中,所述待使用译文的文本长度为所述译文期望长度;所述译文期望长度是根据所述译文长度描述数据确定的;
    根据所述待舍弃译文,更新所述至少一个历史遗留语义单元。
  16. 根据权利要求14或者15所述的方法,其特征在于,所述历史遗留语义单元的个数为K;其中,K为正整数;所述待使用精简译文中语义单元个数≥K;
    所述待使用精简译文中第k个语义单元的确定过程,包括:
    根据所述主干源文本、译文长度描述数据、以及所述压缩翻译模型,确定第k状态下模型预测概率;其中,k为正整数,k≤K;
    根据所述第k状态下模型预测概率与第k个历史遗留语义单元的对象预测概率,确定惩罚因子值;
    将所述第k状态下模型预测概率与所述惩罚因子值进行加权求和处理,得到第k状态下预测校正概率;
    根据所述第k状态下预测校正概率,确定所述第k个语义单元。
  17. 根据权利要求1所述的方法,其特征在于,所述从所述待处理源文本中抽取主干源文本,包括:
    对所述待处理源文本进行依存句法分析处理,得到依存句法分析结果;
    对所述待处理源文本进行词性标注处理,得到词性标注结果;
    根据所述依存句法分析结果和所述词性标注结果,确定词汇重要性表征数据;
    根据所述词汇重要性表征数据和所述待处理源文本,确定所述主干源文本。
  18. 根据权利要求17所述的方法,其特征在于,所述词汇重要性表征数据包括待使用多叉树;
    所述根据所述词汇重要性表征数据和所述待处理源文本,确定所述主干源文本,包括:
    根据所述待使用多叉树,确定待删除节点;
    根据所述待删除节点对应的删除后文本长度与所述待处理源文本的文本长度,确定所述待删除节点的删除识别结果;
    若所述待删除节点的删除识别结果满足预设删除条件,则将所述待删除节点从所述待使用多叉树中删除,并继续执行所述根据所述待使用多叉树,确定待删除节点的步骤;
    若所述待删除节点的删除识别结果不满足预设删除条件,则继续执行所述根据所述待使用多叉树,确定待删除节点的步骤;
    直至在达到预设停止条件时,根据所述待使用多叉树和所述待处理源文本,确定所述主干源文本。
  19. 根据权利要求18所述的方法,其特征在于,所述待删除节点的删除识别结果的确定过程,包括:
    从所述待使用多叉树中预删除所述待删除节点,得到预删除后多叉树;
    根据所述预删除后多叉树和所述待处理源文本,确定所述待删除节点对应的删除后文本长度;
    确定所述删除后文本长度与所述待处理源文本的文本长度之间的长度比值;
    将所述长度比值与预设比值阈值进行比较,得到待使用比较结果;
    根据所述待使用比较结果,确定所述待删除节点的删除识别结果。
  20. 根据权利要求1所述的方法,其特征在于,所述压缩翻译模型的构建过程,包括:
    获取至少一个样本原文和所述至少一个样本原文对应的实际译文;
    根据各所述样本原文对应的实际译文的文本长度,确定各所述样本原文对应的译文长度描述数据;
    根据所述至少一个样本原文、所述至少一个样本原文对应的译文长度描述数据、以及所述至少一个样本原文对应的实际译文,构建所述压缩翻译模型。
  21. 一种翻译装置,其特征在于,包括:
    文本获取单元,用于获取待处理源文本;
    主干抽取单元,用于从所述待处理源文本中抽取主干源文本;
    压缩翻译单元,用于根据所述主干源文本、译文长度描述数据、以及预先构建的压缩翻译模型,确定待使用精简译文;其中,所述压缩翻译模型用于参考所述译文长度描述数据,对所述主干源文本进行压缩翻译。
  22. 一种设备,其特征在于,所述设备包括:处理器、存储器、系统总线;
    所述处理器以及所述存储器通过所述系统总线相连;
    所述存储器用于存储一个或多个程序,所述一个或多个程序包括指令,所述指令当被所述处理器执行时使所述处理器执行权利要求1至20任一项所述的方法。
  23. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当所述指令在终端设备上运行时,使得所述终端设备执行权利要求1至20任一项所述的方法。
  24. 一种计算机程序产品,其特征在于,所述计算机程序产品在终端设备上运行时,使得所述终端设备执行权利要求1至20任一项所述的方法。
PCT/CN2022/088961 2021-12-23 2022-04-25 一种翻译方法及其相关设备 WO2023115770A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111592412.5 2021-12-23
CN202111592412.5A CN114254657B (zh) 2021-12-23 2021-12-23 一种翻译方法及其相关设备

Publications (1)

Publication Number Publication Date
WO2023115770A1 true WO2023115770A1 (zh) 2023-06-29

Family

ID=80794781

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/088961 WO2023115770A1 (zh) 2021-12-23 2022-04-25 一种翻译方法及其相关设备

Country Status (2)

Country Link
CN (1) CN114254657B (zh)
WO (1) WO2023115770A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114254657B (zh) * 2021-12-23 2023-05-30 中国科学技术大学 一种翻译方法及其相关设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090240486A1 (en) * 2008-03-24 2009-09-24 Microsoft Corporation Hmm alignment for combining translation systems
CN109271643A (zh) * 2018-08-08 2019-01-25 北京捷通华声科技股份有限公司 一种翻译模型的训练方法、翻译方法和装置
CN111079449A (zh) * 2019-12-19 2020-04-28 北京百度网讯科技有限公司 平行语料数据的获取方法、装置、电子设备和存储介质
CN113051935A (zh) * 2019-12-26 2021-06-29 Tcl集团股份有限公司 智能翻译方法、装置、终端设备及计算机可读存储介质
CN114254657A (zh) * 2021-12-23 2022-03-29 科大讯飞股份有限公司 一种翻译方法及其相关设备

Also Published As

Publication number Publication date
CN114254657B (zh) 2023-05-30
CN114254657A (zh) 2022-03-29

Similar Documents

Publication Publication Date Title
CN110348016B (zh) 基于句子关联注意力机制的文本摘要生成方法
CN109344391B (zh) 基于神经网络的多特征融合中文新闻文本摘要生成方法
CN106502985B (zh) 一种用于生成标题的神经网络建模方法及装置
WO2022198868A1 (zh) 开放式实体关系的抽取方法、装置、设备及存储介质
US20120323554A1 (en) Systems and methods for tuning parameters in statistical machine translation
CN111339765B (zh) 文本质量评估方法、文本推荐方法及装置、介质及设备
CN110688862A (zh) 一种基于迁移学习的蒙汉互译方法
CN111666764B (zh) 一种基于XLNet的自动摘要方法与装置
CN114880461A (zh) 一种结合对比学习和预训练技术的中文新闻文本摘要方法
WO2023134083A1 (zh) 基于文本的情感分类方法和装置、计算机设备、存储介质
CN113128431B (zh) 视频片段检索方法、装置、介质与电子设备
CN114969304A (zh) 基于要素图注意力的案件舆情多文档生成式摘要方法
CN111813923A (zh) 文本摘要方法、电子设备及存储介质
CN117609421A (zh) 基于大语言模型的电力专业知识智能问答系统构建方法
WO2023115770A1 (zh) 一种翻译方法及其相关设备
CN112765983A (zh) 一种基于结合知识描述的神经网络的实体消歧的方法
CN115309915A (zh) 知识图谱构建方法、装置、设备和存储介质
CN116468009A (zh) 文章生成方法、装置、电子设备和存储介质
CN111859950A (zh) 一种自动化生成讲稿的方法
CN112765201A (zh) 一种sql语句解析为特定领域查询语句的方法及装置
CN117251562A (zh) 一种基于事实一致性增强的文本摘要生成方法
CN116204622A (zh) 一种跨语言稠密检索中的查询表示增强方法
CN116089601A (zh) 对话摘要生成方法、装置、设备及介质
CN114328910A (zh) 文本聚类方法以及相关装置
CN114328924A (zh) 一种基于预训练模型结合句法子树的关系分类方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22909107

Country of ref document: EP

Kind code of ref document: A1