CN114254657B - Translation method and related equipment thereof - Google Patents

Translation method and related equipment thereof

Info

Publication number
CN114254657B
CN114254657B (application CN202111592412.5A)
Authority
CN
China
Prior art keywords
translation
text
length
determining
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111592412.5A
Other languages
Chinese (zh)
Other versions
CN114254657A (en)
Inventor
林超
刘微微
刘聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
iFlytek Co Ltd
Original Assignee
University of Science and Technology of China USTC
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China (USTC) and iFlytek Co., Ltd.
Priority to CN202111592412.5A
Publication of CN114254657A
Priority to PCT/CN2022/088961 (WO2023115770A1)
Application granted
Publication of CN114254657B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a translation method and related equipment. After a source text to be processed is acquired, a backbone source text is extracted from it, so that the backbone source text represents the core backbone information of the source text to be processed. A reduced translation to be used is then determined according to the backbone source text, translation length description data and a pre-constructed compression translation model, so that the reduced translation expresses the semantic information carried by the source text to be processed with fewer translation characters. This effectively avoids the situation in which the translation end contains more words than the source end, shortens the translation text without losing the core meaning, reduces the translation delay, and improves the translation effect.

Description

Translation method and related equipment thereof
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a translation method and related devices.
Background
A simultaneous speech translation scenario is one that lacks context information, is local and instantaneous, and must take the characteristics of at least two languages into account.
Simultaneous speech translation scenarios generally have high real-time requirements. However, for the same semantic content, the translation often contains more words than the source, so the actual translation speed easily fails to meet the real-time translation speed requirement, delays accumulate, and the translation effect deteriorates.
Disclosure of Invention
The embodiment of the application mainly aims to provide a translation method and related equipment, which can improve the translation effect.
The embodiment of the application provides a translation method, which comprises the following steps: acquiring a source text to be processed; extracting a backbone source text from the source text to be processed; and determining a reduced translation to be used according to the backbone source text, translation length description data and a pre-constructed compression translation model; the compression translation model is used for compressing and translating the backbone source text with reference to the translation length description data.
The embodiment of the application also provides a translation device, which comprises: the text acquisition unit is used for acquiring a source text to be processed; the trunk extraction unit is used for extracting trunk source texts from the source texts to be processed; the compression translation unit is used for determining a reduced translation to be used according to the trunk source text, the translation length description data and a pre-constructed compression translation model; the compression translation model is used for compressing and translating the trunk source text by referring to the translation length description data.
The embodiment of the application also provides equipment, which is characterized in that the equipment comprises: a processor, memory, system bus; the processor and the memory are connected through the system bus; the memory is used to store one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform any of the implementations of the translation methods provided by the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, which is characterized in that the computer readable storage medium stores instructions, and when the instructions are run on a terminal device, the terminal device is caused to execute any implementation mode of the translation method provided by the embodiment of the application.
The embodiment of the application also provides a computer program product, which when being run on a terminal device, causes the terminal device to execute any implementation mode of the translation method provided by the embodiment of the application.
Based on the technical scheme, the application has the following beneficial effects:
In the technical scheme provided by the application, after a source text to be processed (for example, the speech recognition text of the current speech segment in a simultaneous-interpretation speech stream) is acquired, a backbone source text is extracted from it, so that the backbone source text represents the core backbone information of the source text to be processed; and a reduced translation to be used is determined according to the backbone source text, the translation length description data and the pre-constructed compression translation model, so that the reduced translation expresses the semantic information carried by the source text to be processed with fewer translation characters. This effectively avoids the translation end containing more words than the source end, shortens the translation text without losing the core meaning, effectively reduces the translation delay, improves translation real-time performance, and improves the translation effect.
In addition, the backbone source text is obtained by backbone extraction from the source text to be processed, so its text length is smaller than that of the source text to be processed, which simplifies the text data at the source-language end; the reduced translation to be used is obtained by compression translation of the backbone source text by the compression translation model, so it expresses the semantic information of the source text to be processed with fewer translation characters, which simplifies the text data at the target-language end. The embodiment of the application therefore compresses and translates the source text to be processed through simplification at both ends (the source-language end and the target-language end), which ensures as far as possible that the compression translation result expresses the semantic information of the source text to be processed with few translation characters, effectively avoids the translation end containing more words than the source end, shortens the translation text without losing the core meaning, effectively reduces the translation delay, improves translation real-time performance, and improves the translation effect.
In addition, the reduced translation to be used is obtained by the compression translation model with reference to the translation length description data, so its text length is close to, or even equal to, the expected translation length represented by that data. The text length of the reduced translation to be used is therefore controllable, which effectively ensures that it expresses the semantic information of the source text to be processed with a reasonable number of words, avoids the adverse effects of an uncontrollable translation word count, and further improves the translation effect.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a translation method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a concurrent voice stream according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a dependency syntax tree provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a compressed translation model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a compression translation process according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a translation device according to an embodiment of the present application.
Detailed Description
The inventor found, in research on simultaneous speech translation scenarios, that the speech recognition text in the source language can be translated to obtain translated text in the target language. However, because the translated text often contains more words than the speech recognition text, the actual translation speed may fail to meet the real-time translation speed requirement, which causes delay accumulation.
Based on the above findings, in order to solve the technical problems described in the background section, an embodiment of the present application provides a translation method. After a source text to be processed (for example, the speech recognition text of the current speech segment in a simultaneous-interpretation speech stream) is obtained, a backbone source text is extracted from it, so that the backbone source text represents the core backbone information of the source text to be processed; and a reduced translation to be used is determined according to the backbone source text, the translation length description data and a pre-constructed compression translation model, so that the reduced translation expresses the semantic information carried by the source text to be processed with fewer translation characters. This effectively avoids the translation end containing more words than the source end, shortens the translation text without losing the core meaning, effectively reduces the translation delay, improves translation real-time performance, and improves the translation effect.
In addition, the backbone source text is obtained by backbone extraction from the source text to be processed, so its text length is smaller than that of the source text to be processed, which simplifies the text data at the source-language end; the reduced translation to be used is obtained by compression translation of the backbone source text by the compression translation model, so it expresses the semantic information of the source text to be processed with fewer translation characters, which simplifies the text data at the target-language end. The embodiment of the application therefore compresses and translates the source text to be processed through simplification at both ends (the source-language end and the target-language end), which ensures as far as possible that the compression translation result expresses the semantic information of the source text to be processed with few translation characters, effectively avoids the translation end containing more words than the source end, shortens the translation text without losing the core meaning, effectively reduces the translation delay, improves translation real-time performance, and improves the translation effect.
In addition, the reduced translation to be used is obtained by the compression translation model with reference to the translation length description data, so its text length is close to, or even equal to, the expected translation length represented by that data. The text length of the reduced translation to be used is therefore controllable, which effectively ensures that it expresses the semantic information of the source text to be processed with a reasonable number of words, avoids the adverse effects of an uncontrollable translation word count, shortens the translation text without losing the core meaning, effectively reduces the translation delay, improves translation real-time performance, and improves the translation effect.
Further, the embodiment of the present application does not limit the execution subject of the translation method; for example, the translation method provided in the embodiment of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a smart phone, a computer, a personal digital assistant (Personal Digital Assistant, PDA), a tablet computer, or the like. The server may be a stand-alone server, a clustered server, or a cloud server.
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Method embodiment one
Referring to fig. 1, a flowchart of a translation method according to an embodiment of the present application is shown.
The translation method provided by the embodiment of the application comprises the following steps of S1-S3:
s1: and acquiring the source text to be processed.
The above-mentioned "source text to be processed" refers to text data in a source language; and the "pending source text" needs to be translated into text content in the target language. For example, for a Chinese-English translation scenario, the source text to be processed refers to Chinese text data; and the source text to be processed needs to be translated into english text data.
In addition, the embodiment of the application is not limited to the "source text to be processed" described above, and for example, for the concurrent translation scene, the "source text to be processed" may include a sentence in order to improve the real-time performance.
In addition, the embodiment of the present application is not limited to the implementation of S1, and for example, it may specifically be: current text collected in real-time from a text stream. For another example, for the concurrent translation scenario, in order to improve translation real-time performance, S1 may specifically include: after the current voice segment is obtained, voice recognition processing is carried out on the current voice segment, and the source text to be processed is obtained. (hereinafter, description will be made by taking the simultaneous translation scenario as an example)
The "current speech segment" is used to represent a speech segment that is collected in real-time from a speech stream (e.g., a speech stream in a concurrent translation scenario). For example, as shown in fig. 2, when a "third speech segment" is collected from the speech stream shown in fig. 2, the third speech segment may be determined as a current speech segment, so that the translation result of the third speech segment can be determined in time by means of a compressed translation process for the current speech segment.
It should be noted that the embodiments of the present application are not limited to the foregoing implementation of the "speech recognition processing", and may be implemented by any speech recognition method that exists in the present or future.
It should be further noted that the embodiment of the present application is not limited to the above-mentioned collection frequency of the "voice clip", and may be set according to an application scenario, for example. For another example, the collection frequency of the "speech segment" may be set according to the sentence length in the source language, so that the "current speech segment" includes a sentence.
Based on the above-mentioned related content of S1, for a voice stream input in real time in a concurrent translation scenario, after a current voice segment is collected in real time from the voice stream, voice recognition processing may be performed on the current voice segment to obtain a source text to be processed, so that real-time translation processing for the current voice segment can be implemented based on the source text to be processed.
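The following is a minimal sketch, in Python, of the real-time acquisition described in S1: speech segments are collected from a live speech stream and passed through speech recognition to obtain the source text to be processed. The segmentation granularity and the recognizer are placeholders introduced here for illustration; this application allows any existing or future speech recognition method.

```python
# Hypothetical sketch of S1 for the simultaneous-translation case. The
# recognizer is a placeholder; any speech recognition method may be used.
from typing import Iterable, Iterator

def recognize(segment: bytes) -> str:
    """Placeholder for an arbitrary speech recognition (ASR) system."""
    raise NotImplementedError

def pending_source_texts(speech_stream: Iterable[bytes]) -> Iterator[str]:
    # Each collected segment (e.g. roughly one source-language sentence) is
    # recognized as soon as it arrives, yielding the source text to be processed.
    for current_segment in speech_stream:
        yield recognize(current_segment)
```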
S2: and extracting the main source text from the source text to be processed.
The "extraction backbone source text" is used to represent core backbone information in the source text to be processed. For example, when the source text to be processed is "the rapid development of artificial intelligence brings about opportunities for various countries", the extraction of the main source text may be "the development of artificial intelligence brings about opportunities for various countries".
The embodiment of the present application is not limited to the implementation of S2, which may be implemented by any method, existing now or arising in the future, that can extract the backbone from a piece of text data (for example, a sentence-pattern simplification method). As another example, to improve the extraction effect of the backbone source text, S2 may be implemented using any of the possible embodiments of S2 shown in Method Embodiment II below.
S3: and determining the reduced translation to be used according to the backbone source text, the translation length description data and the pre-constructed compression translation model.
The "translation length description data" is used for describing the text length of the translation result of the source text to be processed; further, embodiments of the present application are not limited to this "translation length description data," which may include, for example, at least one of a translation desired length, a translation source desired length ratio.
The above-described "text length" is used to describe the length of one text data; moreover, embodiments of the present application are not limited to this "text length" representation, and may be represented, for example, by the number of semantic units (e.g., words and/or the number of words) in a text datum.
It should be noted that, the "semantic unit" refers to a unit of semantic representation in one language; moreover, embodiments of the present application are not limited to the "semantic unit," which may be, for example, a word, or a character (e.g., a semantic unit in chinese may be a chinese character or a word, etc.).
The "expected translation length" refers to the length of the text that the user wants the translation result of the source text to be processed to have; moreover, embodiments of the present application are not limited to this "expected length of translation," which may be close to the text length of the source text to be processed, for example, in order to avoid the phenomenon that the number of end words of the translation is greater than the number of end words of the source text as much as possible. For example, if the text length of the source text to be processed is 6 words, the "desired length of translation" may be 6 words.
In addition, the embodiment of the present application is not limited to the above-mentioned method for obtaining the "expected length of translation", and may be preset, for example. As another example, the determination may be made based on a setup operation triggered by the user for the "desired length of translation". For another example, a length statistical analysis may be performed on a large number of sentences in the target language to obtain the "expected length of translation".
The "translation source expected length ratio" refers to a ratio between a text length of a translation result of a source text to be processed, which the user wants to have, and the text length of the source text to be processed; furthermore, the embodiment of the present application is not limited to the "translation source expected length ratio", and for example, in order to avoid the phenomenon that the number of translation end words is greater than that of source end words as much as possible, the "translation source expected length ratio" may take a value relatively close to 1 (for example, the "translation source expected length ratio" may be 1 or 0.8).
The embodiment of the present application is not limited to the above-described method of acquiring the "translation source expected length ratio", and may be set in advance, for example. As another example, the determination may be made based on a setup operation triggered by the user for the "translation source desired length ratio". For another example, the length ratio statistical analysis can be performed on a large number of sentence pairs in the target language and the source language to obtain the 'translated source expected length ratio'.
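As an illustration of the two forms of translation length description data discussed above, the following sketch shows how an expected translation length can be reduced to an expected translation-to-source length ratio; the class and function names are illustrative and do not come from this application.

```python
# Hedged sketch of the "translation length description data" used in S3.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LengthDescription:
    expected_length: Optional[int] = None   # expected translation length, in words
    expected_ratio: Optional[float] = None  # expected translation-to-source length ratio

def expected_translation_ratio(desc: LengthDescription, source_length: int) -> float:
    """Expected translation-to-source length ratio; typically close to 1."""
    if desc.expected_ratio is not None:
        return desc.expected_ratio
    # e.g. an expected length of 6 words for a 6-word source text gives a ratio of 1.0
    return desc.expected_length / source_length
```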
The compression translation model is used for performing compression translation processing with controllable translation length on input data of the compression translation model. For example, the above-mentioned "compression translation model" can refer to the translation length description data to perform compression translation on the trunk source text, so that the compression translation result for the trunk source text can reach the expected translation length represented by the translation length description data as far as possible, so as to implement the controllable compression translation processing of the translation length.
In addition, embodiments of the present application are not limited to the "compressed translation model" described above, which may be a machine learning model, for example. As another example, to enhance the compression translation effect, the compressed translation model may be implemented using any of the possible implementations shown in Method Example III below.
Based on the above-mentioned related content of S1 to S3, after the source text to be processed (for example, the speech recognition text of the current speech segment in a simultaneous-interpretation speech stream) is obtained, a backbone source text is extracted from it, so that the backbone source text represents the core backbone information of the source text to be processed; and a reduced translation to be used is determined according to the backbone source text, the translation length description data and the pre-constructed compression translation model, so that the reduced translation expresses the semantic information carried by the source text to be processed with fewer translation characters. This effectively avoids the translation end containing more words than the source end, shortens the translation text without losing the core meaning, effectively reduces the translation delay, improves translation real-time performance, and improves the translation effect.
In addition, the backbone source text is obtained by backbone extraction from the source text to be processed, so its text length is smaller than that of the source text to be processed, which simplifies the text data at the source-language end; the reduced translation to be used is obtained by compression translation of the backbone source text by the compression translation model, so it expresses the semantic information of the source text to be processed with fewer translation characters, which simplifies the text data at the target-language end. The embodiment of the application therefore compresses and translates the source text to be processed through simplification at both ends (the source-language end and the target-language end), which ensures as far as possible that the compression translation result expresses the semantic information of the source text to be processed with few translation characters, effectively avoids the translation end containing more words than the source end, shortens the translation text without losing the core meaning, effectively reduces the translation delay, improves translation real-time performance, and improves the translation effect.
In addition, the reduced translation to be used is obtained by the compression translation model with reference to the translation length description data, so its text length is close to, or even equal to, the expected translation length represented by that data. The text length of the reduced translation to be used is therefore controllable, which effectively ensures that it expresses the semantic information of the source text to be processed with a reasonable number of words, avoids the adverse effects of an uncontrollable translation word count, shortens the translation text without losing the core meaning, effectively reduces the translation delay, improves translation real-time performance, and improves the translation effect.
Method embodiment II
In order to improve the filtering effect of unimportant words in text data in a source language, the text data can be filtered by means of a dependency syntax analysis technology and a part-of-speech tagging technology. Based on this, the present application embodiment also provides a possible implementation manner of the above S2, which may specifically include S21-S24:
S21: and performing dependency syntax analysis processing on the source text to be processed to obtain a dependency syntax analysis result.
The dependency syntax analysis process is used for identifying the directed dependency relationship between different words in one text data; moreover, the embodiments of the present application are not limited to the implementation of the "dependency syntax analysis process" described above, and may be implemented using any dependency syntax analysis technique that exists in the present or future.
The dependency syntax theory underlying the "dependency syntax analysis processing" is specifically as follows: words are considered to have a master-slave relationship, which is a binary, non-equivalent relationship. In a sentence, if one word modifies another, the modifying word is called the dependent, the modified word is called the head (dominant word), and the grammatical relationship between the two is called a dependency relation. For example, as shown in fig. 3, for the sentence "fast development of artificial intelligence lays a very solid foundation for companies", "development" is modified by "fast", so "fast" and "development" have the directed dependency relationship shown in fig. 3.
The dependency syntax analysis result is used for representing the directed dependency relationship among different vocabularies in the source text to be processed; further, the embodiment of the present application is not limited to the expression of the "dependency syntax analysis result", and for example, the expression may be performed by means of a dependency syntax tree (as shown in fig. 3).
It should be noted that the dependency syntax tree shown in fig. 3 is a multi-way tree. The root node of the dependency syntax tree is the core word of the sentence (here, "lays"), and every child node is a word or component governed by its parent node. In fig. 3, "Root" is used to mark "lays" as the root node.
S22: and performing part-of-speech tagging on the source text to be processed to obtain a part-of-speech tagging result.
The part-of-speech tagging is used for identifying and tagging parts of speech of each vocabulary in text data; the embodiment of the application is not limited to the part-of-speech tagging, and can be implemented by any part-of-speech tagging technology existing in the prior art or appearing in the future.
The part-of-speech tagging result is used to represent the part of speech of each word in the source text to be processed.
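A minimal sketch of S21 and S22 is shown below using spaCy; the library and the model name are assumptions made here for illustration only, since the application allows any existing or future dependency syntax analysis and part-of-speech tagging technique.

```python
# Hedged sketch of S21 (dependency parsing) and S22 (part-of-speech tagging).
import spacy

nlp = spacy.load("zh_core_web_sm")  # illustrative Chinese pipeline; not mandated by the text

def analyze(pending_source_text: str):
    doc = nlp(pending_source_text)
    # S21: directed dependency relations as (head word, relation, dependent word) triples
    dependency_result = [(tok.head.text, tok.dep_, tok.text) for tok in doc]
    # S22: the part of speech of every word
    pos_result = [(tok.text, tok.pos_) for tok in doc]
    return dependency_result, pos_result
```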
S23: and determining the vocabulary importance characterization data according to the dependency syntactic analysis result and the part-of-speech tagging result.
The vocabulary importance characterization data are used for representing the importance degree of each vocabulary in the source text to be processed; moreover, embodiments of the present application are not limited to this "lexical importance characterization data," which may include, for example, the multi-way tree to be used.
The "multi-tree to be used" is a multi-tree structure; the multi-way tree to be used can not only show the importance degree of each word in the source text to be processed, but also show the directional dependency relationship between different words in the source text to be processed (and the part of speech of each word in the source text to be processed).
In addition, the embodiment of the present application is not limited to the way the "multi-tree to be used" represents importance. For example, suppose that in the "multi-tree to be used" the root node is placed higher than all non-root nodes and a parent node is placed higher than its child nodes, so that the tree contains multiple layers of nodes. The importance representation may then be: for any two nodes on different layers, the node on the higher layer is more important than the node on the lower layer; and for any two nodes on the same layer, the node on the left is more important than the node on the right. To achieve this importance distribution, when constructing the "multi-tree to be used", the subtrees belonging to the same parent node may be ordered from left to right in descending order of importance.
In addition, the embodiment of the present application is not limited to the determination process of the "multi-tree to be used". For example, when the "dependency syntax analysis result" is represented by a dependency syntax tree, the determination process may specifically include: first, with reference to a preset importance analysis rule and the part-of-speech tagging result, performing importance analysis on each subtree in the dependency syntax tree to obtain the importance of each subtree; and then adjusting the distribution position of each subtree according to its importance, so that in the adjusted dependency syntax tree all subtrees sharing the same parent node are arranged from left to right in descending order of the importance of each subtree's top-level node.
The "importance analysis rule" may be preset, and the embodiment of the present application is not limited to the "importance analysis rule", and may include, for example: (1) The importance of a parent node is higher than the importance of the individual child nodes under that parent node. (2) For a plurality of child nodes under a parent node, the importance of a first part of speech (e.g., noun) is higher than the importance of a second part of speech (e.g., verb), which is higher than the importance of a third part of speech (e.g., adjective), … ….
S24: and determining a main source text according to the vocabulary importance characterization data and the source text to be processed.
As an example, when the above-described "vocabulary importance characterization data" includes the multi-way tree to be used, S24 may specifically include S241-S246:
s241: and determining the nodes to be deleted according to the multi-way tree to be used.
The "node to be deleted" refers to a node that needs to be determined whether to delete from the multi-tree to be used.
In addition, the embodiment of the present application is not limited to the "node to be deleted" described above, and for example, the "node to be deleted" may include one leaf node in the multi-tree to be used, and may also include one sub-tree (i.e., one parent node and all nodes below the parent node) in the multi-tree to be used. It can be seen that the "node to be deleted" may include at least one node in the multi-way tree to be used. Note that, the "leaf node" refers to a node that has no bifurcation; the above "parent node" refers to a node having a bifurcation.
In addition, the embodiment of the present application is not limited to the above determination process of the "node to be deleted", and for example, it may specifically be: selecting the node with the lowest importance degree from all nodes which are not traversed in the multi-way tree to be used; if the node with the lowest importance degree is a leaf node, determining the node with the lowest importance degree as a node to be deleted; if the node with the lowest importance degree is a father node, the node with the lowest importance degree and all the nodes below the node are determined as the nodes to be deleted.
It can be seen that, for the "multi-tree to be used" described above, the traversal can be performed in a bottom-up and right-to-left manner, and the currently traversed node (and all nodes below the currently traversed node) is determined as the node to be deleted, so that it can be subsequently determined whether to delete the currently traversed node (and all nodes below the currently traversed node) from the multi-tree to be used. Where "currently traversed node" refers to a node (e.g., leaf node or parent node) to which the current round is traversed in the multi-way tree to be used.
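Continuing the sketch above (the same Node class), the following shows one way S241 could pick the next node to be deleted by a bottom-up, right-to-left traversal; a leaf yields itself, and a parent yields itself together with every node below it. This is only an illustrative realization of the traversal described here.

```python
# Hedged sketch of S241; reuses the Node class from the S23 sketch.
from typing import List, Optional, Set

def subtree_nodes(node: Node) -> List[Node]:
    out = [node]
    for child in node.children:
        out.extend(subtree_nodes(child))
    return out

def next_deletion_candidate(root: Node, visited: Set[int]) -> Optional[List[Node]]:
    def walk(node: Node, is_root: bool) -> Optional[List[Node]]:
        for child in reversed(node.children):      # right-to-left
            found = walk(child, False)             # bottom-up: children before the parent
            if found is not None:
                return found
        if not is_root and id(node) not in visited:
            visited.add(id(node))
            return subtree_nodes(node)             # a leaf yields [node]; a parent yields its subtree
        return None
    return walk(root, True)                        # None once every non-root node has been traversed
```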
S242: and determining a deletion identification result of the node to be deleted according to the text length after deletion corresponding to the node to be deleted and the text length of the source text to be processed.
The "text length after deletion corresponding to the node to be deleted" refers to the text length of the vocabulary represented by all the remaining nodes after deleting the node to be deleted from the multi-tree to be used.
The "deletion recognition result of the node to be deleted" is used to indicate whether the node to be deleted is deleted from the source text to be processed.
In addition, the embodiment of the present application is not limited to the above-mentioned determination process of the "deletion identification result of the node to be deleted", and for example, it may specifically include steps 11 to 15:
Step 11: and pre-deleting the nodes to be deleted from the multi-way tree to be used to obtain a pre-deleted multi-way tree.
The above-described "pre-deleted multi-tree" is used to denote a multi-tree to be used that does not include nodes to be deleted, such that the "pre-deleted multi-tree" includes all remaining nodes after the nodes to be deleted are deleted from the multi-tree to be used. It should be noted that, the "pre-deletion" is a deletion demonstration action; and the "pre-delete" does not change the multi-way tree to be used.
Step 12: and determining the length of the deleted text corresponding to the node to be deleted according to the pre-deleted multi-tree and the source text to be processed.
In this embodiment of the present application, after the pre-deleted multi-tree is obtained, a pre-deleted text may be determined according to the pre-deleted multi-tree and the source text to be processed, so that the pre-deleted text only includes the semantic units represented by the nodes in the pre-deleted multi-tree; the text length of this pre-deleted text is then determined as the deleted text length corresponding to the node to be deleted.
Step 13: a length ratio between the deleted text length and the text length of the source text to be processed is determined.
Step 14: and comparing the length ratio with a preset ratio threshold value to obtain a comparison result to be used.
The "preset ratio threshold" may be preset, or may be obtained by mining a large number of sentence contrast values in the target language and the source language.
The "comparison result to be used" is used to indicate the relative size between the "length ratio between the length of the deleted text and the length of the text of the source text to be processed" and the "preset ratio threshold".
Step 15: and determining a deletion identification result of the node to be deleted according to the comparison result to be used.
In this embodiment of the present application, after the comparison result to be used is obtained, if it indicates that the length ratio between the deleted text length and the text length of the source text to be processed is higher than the preset ratio threshold, it can be determined that deleting the node to be deleted from the multi-tree to be used does not remove too much character information, so the node may be deleted, and a deleted flag (e.g., "1") is determined as the deletion identification result of the node to be deleted. If, however, the comparison result indicates that this length ratio is not higher than the preset ratio threshold, deleting the node to be deleted would likely remove too much character information; to avoid losing too much information, a non-deletion flag (e.g., "0") is determined as the deletion identification result of the node to be deleted.
Based on the above-mentioned content related to S242, after the node to be deleted is obtained, a deletion identification result of the node to be deleted may be determined according to a ratio between the length of the deleted text corresponding to the node to be deleted and the length of the text of the source text to be processed, so that the deletion identification result may indicate whether the node to be deleted is deleted from the source text to be processed.
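A compact sketch of steps 11-15 follows; it reuses the Node class above, counts text length in semantic units (one word per node), and uses the flags "1"/"0" from the examples in the preceding paragraph. It is illustrative only.

```python
# Hedged sketch of steps 11-15: decide whether the to-be-deleted nodes may be removed.
from typing import List

def deletion_flag(tree_nodes: List[Node], to_delete: List[Node],
                  source_text_length: int, ratio_threshold: float) -> int:
    delete_ids = {id(n) for n in to_delete}
    remaining = [n for n in tree_nodes if id(n) not in delete_ids]  # step 11: pre-deletion
    deleted_text_length = len(remaining)                            # step 12: post-deletion length
    length_ratio = deleted_text_length / source_text_length         # step 13
    # steps 14-15: delete only if enough of the source text would survive the deletion
    return 1 if length_ratio > ratio_threshold else 0
```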
S243: judging whether the deleting identification result of the node to be deleted meets the preset deleting condition, if so, executing S244-S245; if not, then S245 is performed.
The "preset deletion condition" may be preset, and for example, it may specifically include: the "deletion identification result of the node to be deleted" described above indicates that the node to be deleted can be deleted (e.g., the "deletion identification result of the node to be deleted" described above includes a deleted flag).
After obtaining the deletion identification result of the node to be deleted, if the deletion identification result indicates that the node to be deleted can be deleted, the deletion identification result can be determined to meet the preset deletion condition, so that the node to be deleted can be directly deleted from the multi-tree to be used; if the deletion identification result indicates that the node to be deleted cannot be deleted, it can be determined that the deletion identification result does not meet a preset deletion condition, so that the node to be deleted can be reserved in the multi-tree to be used.
S244: and deleting the node to be deleted from the multi-tree to be used.
In this embodiment of the present application, after determining that the deletion identification result of the node to be deleted meets the preset deletion condition, the node to be deleted may be deleted from the multi-tree to be used, so as to obtain the multi-tree to be used after deletion, so that an update process for the multi-tree to be used can be implemented, so that a subsequent operation (for example, a next round of traversal process) can be performed based on the multi-tree to be used after deletion.
S245: judging whether a preset stopping condition is reached, if so, executing S246; if not, the process returns to S241.
The "preset stop condition" may be preset, for example, it may specifically be: all nodes in the multi-way tree to be used except the root node are traversed.
It can be seen that, if all nodes except the root node in the current multi-tree to be used have been traversed, the pruning process for the multi-tree to be used is completed, so the preset stop condition is reached and the backbone source text can be extracted from the source text to be processed according to the multi-tree to be used. If, however, there are still non-root nodes in the current multi-tree to be used that have not been traversed, the pruning process is not yet completed, so the preset stop condition is not reached, and the process returns to S241 and the subsequent steps to start the next round of traversal of the multi-tree to be used.
S246: and determining the trunk source text according to the multi-tree to be used and the source text to be processed.
In the embodiment of the present application, when it is determined that a preset stopping condition is reached, it may be determined that a pruning process for the multi-tree to be used is completed, so that a main source text may be extracted from a source text to be processed according to the multi-tree to be used, so that the main source text only includes semantic units represented by all nodes in the multi-tree to be used, and therefore, the main source text may represent semantic information carried by the source text to be processed with fewer characters.
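Tying S241-S246 together, a sketch of the overall pruning loop is given below; it reuses the helpers from the previous sketches and, for simplicity, assumes each word occurs once in the sentence. It is an illustration of the procedure, not the only possible implementation.

```python
# Hedged sketch of the pruning loop S241-S246; reuses Node, subtree_nodes,
# next_deletion_candidate and deletion_flag from the sketches above.
from typing import List, Set

def prune(node: Node, delete_ids: Set[int]) -> None:
    node.children = [c for c in node.children if id(c) not in delete_ids]
    for child in node.children:
        prune(child, delete_ids)

def extract_backbone(root: Node, source_words: List[str], ratio_threshold: float) -> str:
    visited: Set[int] = set()
    while True:
        candidate = next_deletion_candidate(root, visited)       # S241
        if candidate is None:                                    # S245: preset stop condition
            break
        flag = deletion_flag(subtree_nodes(root), candidate,     # S242
                             len(source_words), ratio_threshold)
        if flag == 1:                                            # S243/S244: really delete
            prune(root, {id(n) for n in candidate})
    kept = {n.word for n in subtree_nodes(root)}                 # S246: surviving words
    return " ".join(w for w in source_words if w in kept)        # keep source-text order
```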
Based on the above-mentioned related content of S21 to S24, after the source text to be processed is obtained, backbone information can be extracted from it by means of dependency syntax analysis and part-of-speech tagging to obtain the backbone source text, so that the backbone source text represents the semantic information carried by the source text to be processed in as few words as possible. This improves the extraction effect of the backbone information, which helps shorten the translation text without losing the core meaning, effectively reduces the translation delay, improves translation real-time performance, and thus improves the translation effect.
Method example III
In order to better realize controllable translation length and ensure that the model performs compression translation under the condition of minimum information loss, the method can comprehensively integrate the expected length information from a translation model source end, a translation model target end and a translation model decoding stage so as to realize more natural and complete compression translation processing under the condition of controllable length.
Based on this, the present application example provides one possible implementation of the "compressed translation model" described above, which may include an encoder and a decoder (e.g., the compressed translation model shown in fig. 4).
To facilitate understanding of the above "compressed translation model", the following description will take the determination process of the above "reduced translation to be used" as an example.
As an example, when the above "compressed translation model" includes an encoder and a decoder, the above determination process of "reduced translation to be used" may specifically include steps 21-23:
step 21: and determining the characteristics to be encoded according to the backbone source text and the translation length description data.
The "feature to be encoded" mentioned above refers to a feature that needs to be subjected to encoding processing.
The embodiment of the present application is not limited to the implementation of step 21, and may specifically include steps 211 to 213, for example:
Step 211: and determining the text characteristics to be used according to the backbone source text.
The "text feature to be used" may be used to represent character information carried by the backbone source text.
The embodiment of the present application is not limited to the implementation of step 211, which may be implemented, for example, by any text feature extraction method existing now or arising in the future (e.g., Word2Vec).
In addition, to further avoid forgetting the translation length description data during the whole model processing, another possible implementation manner of step 211 is provided in the embodiment of the present application, which may specifically include: according to the main source text and the translation length description data, determining the text feature to be used, so that the text feature to be used can not only show character information carried by the main source text, but also show the translation length information carried by the translation length description data.
In practice, different sentences in the source language have different translation lengths, and the length ratios between sentences and their translations are not uniform, so a large corpus contains many distinct translation-to-source length ratios. If the compression translation model had to learn all of them in advance, its construction would be very time-consuming. To improve the construction efficiency of the model, the many translation-to-source length ratios can therefore be discretized into a number of ratio intervals, so that the model only needs to learn these intervals. It should be noted that, for the construction process of the compression translation model, please refer to the related content of steps 51-53 below.
Accordingly, the present embodiments also provide yet another possible implementation manner of step 211, which may specifically include steps 2111-2112:
step 2111: and determining a length proportion interval to be used according to the translation length description data.
In example 1, if the "translation length description data" includes the desired length of the translation, the step 2111 may specifically include: firstly, determining the ratio between the expected length of the translation and the text length of the source text to be processed as the expected length ratio of the translation source; and searching a candidate length proportion interval comprising the translation source expected length ratio from at least one candidate length proportion interval, and determining the candidate length proportion interval as a length proportion interval to be used so that the length proportion interval to be used comprises the translation source expected length ratio.
It should be noted that, the "at least one candidate length scale interval" refers to a scale interval that needs to be learned in the process of constructing the "compression model" above; and the relevant content of the "at least one candidate length scale interval" is referred to as the relevant content of step 53 below.
In example 2, if the "translation length description data" includes the translation source expected length ratio, the step 2111 may specifically include: searching a candidate length proportion interval comprising the translation source expected length ratio from at least one candidate length proportion interval, and determining the candidate length proportion interval as a length proportion interval to be used, so that the length proportion interval to be used comprises the translation source expected length ratio.
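The following sketch illustrates step 2111 for both examples above: the expected ratio (computed from the expected translation length in example 1, or given directly in example 2) is matched against the candidate length proportion intervals. The interval boundaries are illustrative assumptions; in this application they come from the construction of the compression translation model (steps 51-53).

```python
# Hedged sketch of step 2111: find the candidate interval containing the expected ratio.
from typing import List, Tuple

CANDIDATE_INTERVALS: List[Tuple[float, float]] = [(0.0, 0.7), (0.7, 1.1), (1.1, 2.0)]  # illustrative

def interval_to_use(expected_ratio: float) -> Tuple[float, float]:
    for low, high in CANDIDATE_INTERVALS:
        if low <= expected_ratio < high:
            return (low, high)
    return CANDIDATE_INTERVALS[-1]   # fall back to the last interval
```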
Step 2112: and determining the characteristics of the text to be used according to the length proportion interval to be used and the main source text.
To facilitate an understanding of step 2112, four possible implementations are described below.
In a first possible implementation manner, the step 2112 may specifically include the steps 31 to 32:
step 31: and splicing the length proportion interval to be used and the main source text to obtain a first text, so that the first text comprises the length proportion interval to be used and the main source text.
The embodiment of the present application is not limited to the implementation of step 31, and may specifically include: and adding the length proportion interval to be used to the head position of the main source text to obtain a first text, so that the first text can be expressed as { the length proportion interval to be used, the main source text }.
Step 32: and vectorizing the first text to obtain the text feature to be used.
It should be noted that the embodiments of the present application are not limited to the implementation of "vectorization", which may be implemented, for example, by any text vectorization method existing now or arising in the future (e.g., Word2Vec).
Based on the first possible implementation manner of the step 2112, after the length scale interval to be used and the main source text are obtained, the two data may be spliced and then vectorized, so as to obtain the text feature to be used.
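A brief sketch of this first implementation follows; the tag format used to splice the interval onto the head of the backbone source text and the embedding call are assumptions made here for illustration.

```python
# Hedged sketch of steps 31-32: splice the length proportion interval onto the
# backbone source text, then vectorize.
from typing import Callable, List, Tuple

def first_text(interval: Tuple[float, float], backbone_text: str) -> str:
    return f"<{interval[0]}-{interval[1]}> {backbone_text}"   # step 31: {interval, backbone source text}

def text_feature_to_use(text: str, vectorize: Callable[[str], List[float]]) -> List[float]:
    return vectorize(text)                                     # step 32: e.g. Word2Vec-style vectorization
```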
In a second possible embodiment, step 2112 may include steps 41-43:
step 41: and searching the interval identifier corresponding to the length proportion interval to be used from the preset mapping relation to obtain the interval identifier to be used. The preset mapping relation comprises a corresponding relation between a length proportion interval to be used and an interval identifier to be used.
The 'preset mapping relation' is used for recording interval identifiers corresponding to the candidate length proportion intervals; moreover, the embodiment of the present application does not limit the "preset mapping relationship", and for example, it may specifically include: a correspondence between the 1 st candidate length scale section and the 1 st section identifier, a correspondence between the 2 nd candidate length scale section and the 2 nd section identifier, … … (and so on), and a correspondence between the Q-th candidate length scale section and the Q-th section identifier. Wherein Q is a positive integer, and Q represents the number of candidate length proportion intervals in the above-mentioned "at least one candidate length proportion interval".
It should be noted that, the above-mentioned construction process of the "preset mapping relationship" is referred to the following related content of step 53.
The "q-th section identifier" refers to a section identifier corresponding to the q-th candidate length scale section, such that the "q-th section identifier" is used to represent the q-th candidate length scale section; moreover, embodiments of the present application do not limit the relationship between the q-th candidate length scale interval and the q-th interval identifier, e.g., the q-th interval identifier (e.g., [0.8 ]) is determined according to a scale value in the q-th candidate length scale interval (e.g., 0.7-1.1). Wherein Q is a positive integer, and Q is less than or equal to Q.
After the length proportion interval to be used is obtained, the length proportion interval to be used can be matched with each candidate length proportion interval in the preset mapping relation; and determining the interval identifier corresponding to the successfully matched candidate length proportion interval as an interval identifier to be used, so that the interval identifier to be used can represent the length proportion interval to be used.
Step 42: and splicing the interval identifier to be used and the main source text to obtain a second text.
The embodiment of the present application is not limited to the implementation of step 42, and may specifically include: and adding the interval identifier to be used to the head position of the main source text to obtain a second text, so that the second text can be expressed as { the interval identifier to be used, the main source text }.
Step 43: and vectorizing the second text to obtain the text feature to be used.
Based on the second possible implementation manner of the step 2112, after the length scale interval to be used and the main source text are obtained, the interval identifier corresponding to the length scale interval to be used may be determined first; and then splicing and vectorizing the two data of the interval identifier and the main source text in sequence to obtain the text feature to be used.
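A minimal sketch of this second implementation manner (steps 41-43) is given below, assuming that the preset mapping relation is stored as a list of (candidate length proportion interval, interval identifier) pairs; the concrete intervals and identifiers are illustrative only.

```python
PRESET_MAPPING = [
    ((0.3, 0.7), "[0.5]"),  # 1st candidate length proportion interval -> 1st identifier
    ((0.7, 1.1), "[0.8]"),  # 2nd candidate length proportion interval -> 2nd identifier
]

def lookup_interval_identifier(interval_to_use):
    # Step 41: match the length proportion interval to be used against each
    # candidate interval and return the identifier of the matching candidate.
    for candidate_interval, identifier in PRESET_MAPPING:
        if candidate_interval == interval_to_use:
            return identifier
    raise KeyError(f"no identifier recorded for interval {interval_to_use}")

def build_second_text(interval_to_use, backbone_source_text: str) -> str:
    # Step 42: splice the interval identifier to be used onto the head of the
    # backbone source text; step 43 then vectorizes the result as in steps 31-32.
    return f"{lookup_interval_identifier(interval_to_use)} {backbone_source_text}"

print(build_second_text((0.7, 1.1), "artificial intelligence"))
# "[0.8] artificial intelligence"
```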
In a third possible implementation manner, step 2112 may specifically include: firstly, carrying out vectorization processing on a main source text to obtain a text characterization vector, so that the text characterization vector can express character information carried by the main source text; and then splicing the length proportion interval to be used and the text characterization vector to obtain the text feature to be used.
It should be noted that, the embodiment of the present application is not limited to the above-mentioned "stitching" implementation, for example, the length scale section to be used may be added to the head position of the text token vector to obtain the text feature to be used, so that the 1 st feature value in the text feature to be used is the left boundary point of the "length scale section to be used" and the 2 nd feature value is the right boundary point of the "length scale section to be used" and other feature values are all derived from the "text token vector".
Based on the third possible implementation manner of the step 2112, after the length scale interval to be used and the main source text are obtained, the main source text may be vectorized, and then the length scale interval to be used and the vectorized result may be spliced to obtain the text feature to be used.
In a fourth possible implementation manner, step 2112 may specifically include: firstly, vectorizing a main source text to obtain a text characterization vector, and searching an interval identifier corresponding to a length proportion interval to be used from a preset mapping relation to obtain the interval identifier to be used; and then splicing the interval mark to be used with the text characterization vector to obtain the text feature to be used.
It should be noted that, the embodiment of the present application is not limited to the above-mentioned "stitching" implementation, for example, the interval identifier to be used may be added to the head position of the text token vector to obtain the text feature to be used, so that the 1 st feature value in the text feature to be used is the "interval identifier to be used", and other feature values are all from the "text token vector".
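The third and fourth implementation manners splice at the vector level rather than at the text level; the following sketch illustrates both, with the vector dimension, the interval boundaries and the numeric identifier chosen arbitrarily for the example.

```python
import numpy as np

text_token_vector = np.random.randn(16)  # vectorized backbone source text (illustrative size)

# Third implementation: prepend the left and right boundary points of the
# length proportion interval to be used, so the 1st and 2nd feature values
# are the interval boundaries and the rest come from the text token vector.
interval_to_use = (0.7, 1.1)
text_feature_v3 = np.concatenate(([interval_to_use[0], interval_to_use[1]], text_token_vector))

# Fourth implementation: prepend the interval identifier found in the preset
# mapping relation (here assumed to be encoded as a single numeric value, 0.8).
interval_identifier_to_use = 0.8
text_feature_v4 = np.concatenate(([interval_identifier_to_use], text_token_vector))
```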
Based on the above-mentioned related content of step 211, after the main source text is obtained, the text feature to be used may be determined according to the main source text (and the translation length description data), so that the text feature to be used may represent the character information carried by the main source text (and the translation length information carried by the translation length description data), so that the encoding process may be performed on the text feature to be used.
Step 212: and determining the position feature to be used according to the text feature to be used and the translation length description data.
The "position feature to be used" is used to represent the character position information carried by the trunk source text and the translation length information carried by the translation length description data.
The embodiments of the present application are not limited to the implementation of step 212, and may be implemented, for example, by any location feature extraction method that exists in the present or future.
In addition, to further avoid forgetting the translation length description data during the entire model processing, another possible implementation of step 212 is provided in the embodiments of the present application, which is described below with reference to examples.
As an example, when the "text feature to be used" includes N feature values, step 212 may specifically include steps 2121-2122:
Step 2121: and determining the position coding result of the nth feature value according to the position index of the nth feature value in the text feature to be used, the translation length description data and the dimension index of the nth feature value. Wherein n is a positive integer, n is less than or equal to N, and N is a positive integer.
The "position index of the nth feature value" described above is used to indicate the position where the nth feature value is located in the "text feature to be used" described above.
The "dimension index of the nth feature value" is used to indicate the position of the position coding result of the nth feature value in the "position feature to be used".
The present example is not limited to the implementation of step 2121, and for ease of understanding, two possible implementations are described below as examples.
In a first possible implementation, step 2121 may specifically include: and determining the position coding result of the nth feature value according to the difference value between the expected length of the translation and the position index of the nth feature value in the text feature to be used and the dimension index of the nth feature value (as shown in formulas (1) - (2)).
The "expected translation length" may be determined based on the translation length description data. For example, the "translation length description data" described above may include a desired length of the translation. For another example, if the "translation length description data" includes a translation source expected length ratio, the translation expected length is determined according to a product between the text length of the source text to be processed and the translation source expected length ratio.
HLDPE_(pos,len,2i) = sin((len - pos) / 10000^(2i/d))  (1)
HLDPE_(pos,len,2i+1) = cos((len - pos) / 10000^(2i/d))  (2)
Wherein, if the dimension index of the nth feature value is even, 2i represents the dimension index of the nth feature value, and HLDPE_(pos,len,2i) represents the position encoding result of the nth feature value; if the dimension index of the nth feature value is odd, 2i+1 represents the dimension index of the nth feature value, and HLDPE_(pos,len,2i+1) represents the position encoding result of the nth feature value; pos represents the position index of the nth feature value; len represents the expected length of the translation; d represents the dimension of the above "position feature to be used" (i.e., the dimension of the above "text feature to be used").
It should be noted that the position encoding results determined based on the formulas (1) - (2) are beneficial to more precisely controlling the translation length.
In a second possible implementation, step 2121 may specifically include: and determining the position coding result of the nth feature value according to the ratio of the position index of the nth feature value in the text feature to be used to the expected length of the translation and the dimension index of the nth feature value (as shown in formulas (3) - (4)).
HLDPE_(pos,len,2i) = sin((pos / len) / 10000^(2i/d))  (3)
HLDPE_(pos,len,2i+1) = cos((pos / len) / 10000^(2i/d))  (4)
Wherein, if the dimension index of the nth feature value is even, 2i represents the dimension index of the nth feature value, and HLDPE_(pos,len,2i) represents the position encoding result of the nth feature value; if the dimension index of the nth feature value is odd, 2i+1 represents the dimension index of the nth feature value, and HLDPE_(pos,len,2i+1) represents the position encoding result of the nth feature value; pos represents the position index of the nth feature value; len represents the expected length of the translation; d represents the dimension of the above "position feature to be used" (i.e., the dimension of the above "text feature to be used").
Based on the above-mentioned related content of step 2121, for the nth feature value in the text feature to be used, the position encoding result of the nth feature value may be determined according to the position index of the nth feature value, the translation length description data, and the dimension index of the nth feature value, so that the position encoding result can not only represent the text position information of the semantic unit represented by the nth feature value, but also represent the translation length information carried by the translation length description data. Wherein n is a positive integer, n is less than or equal to N, and N is a positive integer.
Step 2122: and determining the position feature to be used according to the position coding result of the 1 st feature value to the position coding result of the N th feature value.
In this embodiment of the present application, after the position encoding result of the 1st feature value to the position encoding result of the N-th feature value are obtained, these N position encoding results may be collected to obtain the position feature to be used, so that the 1st dimension feature in the position feature to be used is the position encoding result of the 1st feature value, the 2nd dimension feature is the position encoding result of the 2nd feature value, … … (and so on), and the N-th dimension feature is the position encoding result of the N-th feature value. In this way, the dimension of the position feature to be used is consistent with the dimension of the above text feature to be used, so that the position feature to be used and the text feature to be used can be conveniently added and processed subsequently.
Based on the above-mentioned related content of step 212, after the text feature to be used and the translation length description data are obtained, the position feature to be used may be determined according to the two data, so that the position feature to be used may represent not only the character position information carried by the main source text, but also the translation length information carried by the translation length description data.
Step 213: and obtaining the feature to be coded according to the text feature to be used and the position feature to be used.
In this embodiment of the present application, after obtaining the text feature to be used and the position feature to be used, the two features may be added (or spliced) to obtain the feature to be encoded, so that the feature to be encoded may better represent semantic information carried by the main source text and translation length information carried by the translation length description data.
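As a rough illustration of steps 2121-2122 and step 213, the following sketch computes a length-aware position feature using a sinusoidal length-difference encoding and then adds it to the text feature; the sinusoidal form and the constant 10000 are assumptions of the example rather than requirements of the embodiment.

```python
import numpy as np

def length_aware_position_feature(n_values: int, d: int, expected_len: int) -> np.ndarray:
    # Step 2121: position encoding result of each feature value, computed from
    # its position index, the expected translation length and its dimension index.
    pos = np.arange(n_values)[:, None]              # position index, shape (N, 1)
    i = np.arange(d // 2)[None, :]                  # paired dimension indices
    angle = (expected_len - pos) / np.power(10000.0, 2 * i / d)
    pe = np.zeros((n_values, d))
    pe[:, 0::2] = np.sin(angle)                     # even dimension indices, as in formula (1)
    pe[:, 1::2] = np.cos(angle)                     # odd dimension indices, as in formula (2)
    # Step 2122: the N position encoding results are collected row by row.
    return pe

# Step 213: add the position feature to be used to the text feature to be used
# (both are assumed to share the same shape) to obtain the feature to be encoded.
text_feature_to_use = np.random.randn(6, 8)         # N = 6 feature values, d = 8
position_feature_to_use = length_aware_position_feature(6, 8, expected_len=4)
feature_to_encode = text_feature_to_use + position_feature_to_use
```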
Based on the above-mentioned related content in step 21, after the main source text and the translation length description data are obtained, the feature to be encoded may be extracted from the two data, so that the feature to be encoded may represent the semantic information carried by the main source text and the translation length information carried by the translation length description data.
Step 22: and inputting the feature to be encoded into an encoder to obtain a feature encoding result output by the encoder.
The "encoder" is used for performing encoding processing on input data of the encoder; moreover, embodiments of the present application are not limited to such "encoders" and may be implemented using any encoding network, either existing or future. For example, when the above "compressed translation model" is implemented using a transform structure, the above "Encoder" may include multiple encoding layers (i.e., an Encoder network in the transform).
Step 23: and determining the reduced translation to be used according to the feature encoding result and the decoder.
The embodiments of the present application are not limited to the implementation of the "decoder" described above, and it may be implemented using any decoding network that exists in the present or future. For example, when the above "compressed translation model" is implemented using a Transformer structure, the above "Decoder" may include multiple decoding layers (i.e., the Decoder network in the Transformer).
In addition, to further avoid forgetting the translation length description data during the entire model process, the above "translation length description data" may be incorporated as a vector into the decoder to excite the decoder to perform decoding processing as much as possible in accordance with the desired length of the translation. Based on this, the present embodiment also provides another possible implementation manner of step 23, which may specifically include: and determining the reduced translation to be used according to the feature coding result, the translation length description data and the decoder.
In addition, in order to further improve the fusion effect between the above "translation length description data" and the decoder, the embodiment of the present application further provides a possible implementation manner of the above "decoder", where the "decoder" may include at least one first decoding layer.
The "first decoding layer" is used to refer to a desired length of a translation, and the input data of the first decoding layer is subjected to a decoding process (for example, a decoding process shown as "decoding network 0" in fig. 4).
In addition, the embodiment of the present application is not limited to the above "first decoding layer", and for example, it may include: the system comprises a first decoding module, an information fusion module and a first normalization module; and the input data of the first normalization module includes the output data of the first decoding module and the output data of the information fusion module (as "decoding network 0" shown in fig. 4).
The first decoding module is used for decoding the input data of the first decoding module; also, the "first decoding module" in the embodiment of the present application may include, for example, as shown in fig. 4, a self-attention layer (Self-Attention), two addition and normalization layers (Add & Norm), an encoder-decoder attention layer (Encoder-Decoder Attention), and a feed-forward neural network layer (Feed Forward).
The information fusion module is used for multiplying the input data of the information fusion module with the expected length of the translation; furthermore, the embodiment of the present application is not limited to the "input data of the information fusion module" described above, and for example, the "input data of the information fusion module" may be the input data of the "first decoding layer" described above (for example, the input data of the self-attention layer in the decoding network 0 shown in fig. 4).
The first normalization module is used for adding and normalizing the input data of the first normalization module; moreover, embodiments of the present application are not limited to this implementation of the "first normalization module", and may be implemented using a summation and normalization layer, for example, as shown in fig. 4.
In addition, the embodiment of the present application does not limit the working principle of the "first normalization module" described above, and for example, it may specifically include: when the first decoding layer performs a first frame decoding operation (i.e., performs decoding processing on a first character represented by the "feature encoding result"), the "first normalization module" may perform addition and normalization processing on the output data of the "information fusion module" and the output data of the "first decoding module" (as shown in formula (5)); when the first decoding layer performs a non-first frame decoding operation (i.e., performs decoding processing for a non-first character characterized by the "feature encoding result"), the "first normalization module" may perform addition and normalization processing only for the output data of the "first decoding module".
layer_1 = LayerNorm(x × len + DM_1(x))  (5)
Wherein layer_1 represents the first frame decoding operation result of the first decoding layer; x represents the input data of the first decoding layer (e.g., the input data of the self-attention layer in decoding network 0 shown in fig. 4); len represents the expected length of the translation; x × len represents the output result of the information fusion module in the first decoding layer; DM_1(x) represents the output result of the first decoding module in the first decoding layer; and LayerNorm() represents the calculation function of the first normalization module in the first decoding layer.
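A minimal PyTorch-style sketch of the first decoding layer described by formula (5) is shown below; the first decoding module is passed in as an opaque sub-network (standing in for the self-attention, encoder-decoder attention and feed-forward stack), so the fragment only illustrates the heuristic fusion and normalization step.

```python
import torch
import torch.nn as nn

class FirstDecodingLayer(nn.Module):
    def __init__(self, d_model: int, first_decoding_module: nn.Module):
        super().__init__()
        self.first_decoding_module = first_decoding_module  # DM_1 in formula (5)
        self.layer_norm = nn.LayerNorm(d_model)              # first normalization module

    def forward(self, x: torch.Tensor, expected_len: float, first_frame: bool) -> torch.Tensor:
        decoded = self.first_decoding_module(x)              # DM_1(x)
        if first_frame:
            # Information fusion module: multiply the layer input by the expected
            # translation length, then add and normalize, as in formula (5).
            return self.layer_norm(x * expected_len + decoded)
        # Non-first-frame decoding: add and normalize the decoding output only.
        return self.layer_norm(decoded)

# Illustrative usage with a stand-in decoding module.
layer = FirstDecodingLayer(d_model=8, first_decoding_module=nn.Linear(8, 8))
out = layer(torch.randn(1, 4, 8), expected_len=5.0, first_frame=True)
```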
In addition, the embodiment of the present application also provides another possible implementation manner of the "decoder" described above, where the "decoder" includes not only at least one first decoding layer, but also at least one second decoding layer.
The "second decoding layer" is used for decoding the input data of the decoding layer; moreover, embodiments of the present application are not limited to this "second decoding layer," which may be implemented, for example, using any decoding network (e.g., a Decoder network in a Transformer) that exists in the present or future. For ease of understanding, the following description is provided in connection with examples.
As an example, the "second decoding layer" described above may include a second decoding module and a second normalization module; and the input data of the second normalization module includes the output data of the second decoding module (e.g. "decoding network 1" as shown in fig. 4).
The above "second decoding module" is similar to the above "first decoding module"; and the "second normalization module" may be implemented using a summation and normalization layer.
It should be noted that the difference between the above "second decoding layer" and the above "first decoding layer" is: the "second decoding layer" does not need to refer to the desired length of the translation (as in "decoding network 1" shown in fig. 4), but the "first decoding layer" does need to refer to the desired length of the translation (as in "decoding network 0" shown in fig. 4).
To facilitate an understanding of the above "decoder", the following description is made in connection with an example.
As an example, the above-described "decoder" may include 1 first decoding layer and J second decoding layers. Wherein the input data of the 1st second decoding layer includes the output data of the first decoding layer; the input data of the j-th second decoding layer includes the output data of the (j-1)-th second decoding layer, where j is a positive integer, j is greater than or equal to 2 and less than or equal to J, and J is a positive integer.
It can be seen that, for a decoder including 1 first decoding layer and J second decoding layers (e.g., the decoder with J = 1 shown in fig. 4), the input data of the J-th second decoding layer in the decoder includes the output data of the (J-1)-th second decoding layer, the input data of the (J-1)-th second decoding layer includes the output data of the (J-2)-th second decoding layer, … … (and so on), the input data of the 3rd second decoding layer includes the output data of the 2nd second decoding layer, the input data of the 2nd second decoding layer includes the output data of the 1st second decoding layer, and the input data of the 1st second decoding layer includes the output data of the first decoding layer.
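The layer-by-layer wiring of such a decoder (1 first decoding layer followed by J second decoding layers) could be sketched as follows; the layer objects passed in are placeholders, the first one assumed to have the interface sketched above and the second ones standing in for ordinary Transformer decoder layers.

```python
import torch
import torch.nn as nn

class DecoderSketch(nn.Module):
    def __init__(self, first_layer: nn.Module, second_layers):
        super().__init__()
        self.first_layer = first_layer                     # refers to the expected translation length
        self.second_layers = nn.ModuleList(second_layers)  # J ordinary decoding layers

    def forward(self, x: torch.Tensor, expected_len: float, first_frame: bool) -> torch.Tensor:
        out = self.first_layer(x, expected_len, first_frame)
        for layer in self.second_layers:                   # output of layer j-1 feeds layer j
            out = layer(out)
        return out
```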
Based on the above description of the "decoder", it can be seen that the decoder may add the expected length of the translation as a constraint to its initial layer, so that the expected translation length can be transferred layer by layer within the decoder. The specific process may be as follows: when the decoder performs the initial layer operation, the expected length information may be multiplied, as a vector, with the initial layer input of the decoder to obtain an expected length information unit; this unit then propagates layer by layer in the decoder and is attenuated layer by layer through the forward propagation operation and the nonlinear mapping transformation in each layer, finally exciting the translation model to translate a text sequence closer to the expected length.
It can be seen that the above-mentioned "decoder" is implemented by adopting heuristic fusion decoding mode; in addition, the heuristic fusion decoding mode can integrate the expected length of the translation into the decoder as a vector so as to excite a compression translation model comprising the decoder to rewrite sentence patterns according to the expected length of the translation, thus realizing the deletion processing of unimportant information and converting longer expression into simpler expression under the condition of expressing the same semantics, and being beneficial to improving the compression translation effect.
It should be noted that, for each network layer in the compressed translation model, the nonlinear activation function in that layer acts like a gate that filters out some information from specific elements of the layer. Under the constraint of a given expected translation length, this information attenuation occurs layer by layer in the nonlinear activation functions, so that different expected lengths produce different degrees of information attenuation; as a result, heuristically guided by the expected length information, the compressed translation model can learn, through the attenuation of its own length information, the likelihood of generating the end-of-sentence symbol EOS, and can thus generate a complete compressed translation result on its own.
Based on the above-mentioned related content of step 21 to step 23, as shown in fig. 5, for the compressed translation model including an encoder and a decoder, after the backbone source text is obtained, the translation length description data may first be introduced into the encoder, so that the encoder can encode the backbone source text with reference to the translation length description data to obtain a feature encoding result; the translation length description data is then introduced into the decoder, so that the decoder can decode the feature encoding result with reference to the translation length description data to obtain the reduced translation to be used. In this way, a concise rewriting of the expression can be achieved while deleting as little information as possible, end-to-end length-controllable compression translation can be realized, and the translation result of the source text to be processed can be made more refined.
It should be noted that, for fig. 4 and 5, both "< r >" in fig. 4 and "< r >" in fig. 5 represent the length scale section to be used; moreover, embodiments of the present application are not limited to < r >, and may be, for example, an interval, or an interval identifier (for example, an interval identifier to be used). In addition, "X" in fig. 4 and "X" in fig. 5 each represent a backbone source text.
It should also be noted that the embodiment of the present application is not limited to the implementation of the "Linear layer" in fig. 4, which may be implemented using any linear layer (Linear) existing or appearing in the future. In addition, the embodiments of the present application are not limited to the implementation of the "decision layer" in fig. 4, which may be implemented using any decision layer (e.g., Softmax) existing or appearing in the future. The embodiment of the present application is likewise not limited to the implementation of the "encoder-decoder attention layer" in fig. 4, which may be implemented by any method, existing or appearing in the future, of performing attention processing based on the output data of an encoder (e.g., the multi-head attention layer (Multi-Head Attention) in a Transformer).
In addition, the embodiment of the present application further provides a possible implementation manner of the foregoing "compressed translation model", which specifically may include steps 51-53:
Step 51: at least one sample text and an actual translation corresponding to the at least one sample text are obtained.
The "sample original text" refers to text data in a source language which is required to be used in constructing a compressed translation model; the number of "sample texts" is not limited, and may be D, for example. Wherein D is a positive integer.
The "actual translation corresponding to the d-th sample original text" refers to the actual translation result of the d-th sample original text in the target language; in addition, the embodiment of the application is not limited to the "actual translation corresponding to the d-th sample original text", for example, in order to avoid the phenomenon that the number of end words of the translation is greater than that of end words of the source text as much as possible, the text length of the "actual translation corresponding to the d-th sample original text" is relatively close to (even smaller than) the text length of the "sample original text". Wherein D is a positive integer, and D is less than or equal to D.
It should be noted that, the embodiment of the present application is not limited to the process of obtaining each sample original text and the corresponding actual translation.
Step 52: and determining the translation length description data corresponding to each sample original text according to the text length of the actual translation corresponding to each sample original text.
The "translation length description data corresponding to the d-th sample original text" is used for describing the text length of the translation result of the d-th sample original text in the target language; moreover, the embodiment of the present application is not limited to the "translation length description data corresponding to the d-th sample original text", and for example, it may include: at least one of a ratio between the text length of the actual translation corresponding to the d-th sample original text and the text length of the d-th sample original text, and the text length of the actual translation corresponding to the d-th sample original text. Wherein d is a positive integer, and d is less than or equal to D.
Step 53: and constructing a compression translation model according to the at least one sample original text, the translation length description data corresponding to the at least one sample original text and the actual translation corresponding to the at least one sample original text.
As an example, step 53 may specifically include steps 531-538:
step 531: and determining at least one candidate length proportion interval and a preset mapping relation according to the translation length description data corresponding to the at least one sample text.
As an example, step 531 may specifically include steps 5311-5316:
5311: and determining the translation source length ratio corresponding to the d-th sample original text according to the translation length description data corresponding to the d-th sample original text. Wherein d is a positive integer, and d is less than or equal to D.
The "translation source length ratio corresponding to the d-th sample original text" refers to the ratio between the text length of the actual translation corresponding to the d-th sample original text and the text length of the d-th sample original text.
5312: and determining a ratio range to be used according to the translation source length ratio corresponding to the D sample texts.
In this embodiment of the present application, after the translated source length ratios corresponding to the D sample texts are obtained, the maximum value of the D translated source length ratios may be determined as the upper limit value of the to-be-used ratio range, and the minimum value of the D translated source length ratios may be determined as the lower limit value of the to-be-used ratio range.
5313: the range of ratios to be used is divided into at least one candidate length ratio interval.
In this embodiment of the present application, after the to-be-used ratio range is obtained, the to-be-used ratio range may be divided into Q candidate length proportion intervals on average. Wherein Q represents the number of candidate length proportion intervals in the above-mentioned "at least one candidate length proportion interval".
5314: and respectively determining the section identifiers corresponding to the candidate length proportion sections according to the candidate length proportion sections.
The "section identifier corresponding to the q-th candidate length scale section" is used to represent the q-th candidate length scale section. Wherein Q is a positive integer, and Q is less than or equal to Q.
In addition, the embodiment of the present application is not limited to the determination process of the "section identifier corresponding to the q-th candidate length scale section", and for example, the section identifier corresponding to the q-th candidate length scale section (for example, [0.8 ]) may be determined according to one scale value in the q-th candidate length scale section (for example, 0.7-1.1). Wherein Q is a positive integer, and Q is less than or equal to Q.
5315: and establishing a corresponding relation between the q candidate length proportion interval and an interval identifier corresponding to the q candidate length proportion interval.
5316: and carrying out aggregation processing on the corresponding relation between the 1 st candidate length proportion interval and the interval identifier corresponding to the 1 st candidate length proportion interval, … … and the corresponding relation between the Q candidate length proportion interval and the interval identifier corresponding to the Q candidate length proportion interval to obtain a preset mapping relation.
Based on the above-mentioned related content of step 531, after the translation length description data corresponding to at least one sample text is obtained, at least one candidate length proportion interval and a preset mapping relationship can be determined according to the translation length description data corresponding to the sample text.
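As a rough sketch of step 531 (5311-5316), the following fragment computes translation/source length ratios at the word level, divides their range evenly into Q candidate intervals and records an identifier for each; the word-level length counting, the even division and the midpoint-based identifier are all assumptions of the example.

```python
def build_preset_mapping(sample_sources, sample_translations, q: int = 4):
    # 5311: translation source length ratio for each sample pair (word-level lengths).
    ratios = [len(t.split()) / len(s.split())
              for s, t in zip(sample_sources, sample_translations)]
    # 5312: the ratio range to be used spans the minimum to the maximum ratio.
    low, high = min(ratios), max(ratios)
    # 5313: divide the range evenly into q candidate length proportion intervals.
    step = (high - low) / q
    mapping = []
    for k in range(q):
        interval = (low + k * step, low + (k + 1) * step)
        # 5314: use one ratio value inside the interval (here its rounded midpoint)
        # as the interval identifier.
        identifier = f"[{round(sum(interval) / 2, 1)}]"
        # 5315-5316: aggregate the correspondences into the preset mapping relation.
        mapping.append((interval, identifier))
    return mapping

sources = ["machine translation is widely used today", "length control matters"]
targets = ["machine translation is widely used", "length control matters a lot"]
print(build_preset_mapping(sources, targets, q=2))
```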
Step 532: and determining the length proportion interval identifier corresponding to the d-th sample original text according to the translation length description data corresponding to the d-th sample original text, at least one candidate length proportion interval and the preset mapping relation. Wherein d is a positive integer, and d is less than or equal to D.
It should be noted that, the determination process of the length proportion interval identifier corresponding to the d-th sample original text is similar to the determination process of the "interval identifier to be used" above.
Step 534: and determining the text extraction feature corresponding to the d-th sample original text according to the d-th sample original text and the length proportion interval identifier corresponding to the d-th sample original text. Wherein d is a positive integer, and d is less than or equal to D.
It should be noted that, the determination process of the "text extraction feature corresponding to the d-th sample original text" described above is similar to the determination process of the "text feature to be used" described above.
Step 535: inputting the text extraction feature corresponding to the d-th sample original text into the compression translation model to obtain the model prediction translation result corresponding to the d-th sample original text output by the compression translation model. Wherein d is a positive integer, and d is less than or equal to D.
Step 536: judging whether a preset end condition is reached, if so, executing step 538; if not, go to step 537.
The "preset end condition" may be preset, and for example, it may include: the model loss value of the compressed translation model is lower than a preset loss threshold, the model loss value change rate of the compressed translation model is lower than a preset change rate threshold (i.e., the model reaches convergence), and the update times of the compressed translation model reach at least one of the preset times thresholds.
The "model loss value of the compression translation model" is used to represent the compression translation performance of the compression translation model; moreover, the embodiment of the application is not limited to the determination process of the model loss value of the compression translation model, and for example, the determination process can be implemented by adopting any model loss value determination method existing or appearing in the future.
Step 537: and updating the compression translation model according to the model prediction translation result corresponding to the at least one sample text and the actual translation corresponding to the at least one sample text, and returning to the step 535.
In this embodiment of the present application, after it is determined that the preset end condition is not reached, it may be determined that the compression translation effect of the compression translation model of the current round is not good, so the compression translation model may be updated by referring to the difference between the model prediction translation results corresponding to the sample texts and the actual translations corresponding to the sample texts, so that the updated compression translation model has a better compression translation effect, so that the next round of training process is implemented by returning to continue to execute step 535 and the subsequent steps thereof based on the updated compression translation model.
Step 538: the compressed translation model is saved.
In the embodiment of the application, after the preset end condition is determined to be reached, it can be determined that the compression translation effect of the compression translation model of the current round is relatively good, so that the compression translation model can be saved, and the compression translation model can be used for participating in the concurrent translation process later.
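The training loop of steps 535-538 may be summarized by the following sketch, in which `model`, `optimizer`, `loss_fn` and the concrete preset end conditions (a loss threshold and a maximum number of updates) are hypothetical stand-ins rather than elements prescribed by the embodiment.

```python
def train_compressed_translation_model(model, optimizer, loss_fn, dataset,
                                       max_updates: int = 10000,
                                       loss_threshold: float = 0.1):
    updates = 0
    while True:
        for text_extraction_feature, actual_translation in dataset:
            prediction = model(text_extraction_feature)   # step 535: model prediction translation result
            loss = loss_fn(prediction, actual_translation)
            # Step 536: check the preset end condition (illustrative conditions only).
            if updates >= max_updates or loss.item() < loss_threshold:
                return model                               # step 538: save / return the model
            # Step 537: update the compressed translation model and continue training.
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            updates += 1
```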
Based on the above-mentioned related content of step 51 to step 53, the construction process of the compressed translation model is as follows: firstly, for the bilingual training data, the length ratio of the target text to the source text is calculated; these length ratios are then discretized into a plurality of intervals, and a ratio tag is used as the representation of each interval. Then, in the model training process, sentence pairs marked with different ratio tags are sampled, and the data amounts of the sentence pairs marked with different ratio tags are kept in a relatively balanced state, so that the encoder in the compressed translation model can integrate the information of the length interval tag into the hidden layer vector representation of each word in the sentence. In this way, text vectors with the same ratio tag are projected into the vector cluster corresponding to that ratio information in the semantic representation vector space of the encoder, so that the semantic representation vector space of the whole encoder forms clusters corresponding to the plurality of ratio tags. It can be seen that, through the training of the whole model, the mapping between source text vectors with different ratio tags and target text vectors with different lengths can be learned. For example, if the translation of the source text is expected to be compressed to a ratio between 0.7 and 1.1, a length tag (e.g., [0.8]) may be added to the header of the input text, so that the model is expected to output a translation whose compression ratio falls within the interval corresponding to the tag.
Method example IV
In order to further improve the control of the length of the translation result of the source text to be processed more accurately, text interception processing can be performed on the compressed translation result of each voice segment, so that the on-screen translation of each voice segment strictly meets the expected length of the translation corresponding to each voice segment.
To achieve the above-mentioned needs, the present embodiment also provides another possible implementation manner of the above-mentioned "translation method", where the translation method includes not only S1-S2 but also S4-S6:
s4: and determining the reduced translation to be used according to the backbone source text, the translation length description data, the compressed translation model and at least one historical legacy semantic unit.
The above-mentioned "at least one history legacy semantic unit" refers to a semantic unit that was not used (e.g., not on screen, or not sent to the user) in the previous compressed translation result because it exceeds the expected length of the translation. As an example, when the previous compression translation result is "Artificial intelligence is loved by all countries" and the expected length of the translation corresponding to the previous compression translation result is 5 words, the character to be used corresponding to the previous compression translation result is "Artificial intelligence is loved by", and the unused character corresponding to the previous compression translation result is "all counts", so that the "at least one historical legacy semantic unit" corresponding to the current compression translation process is "all counts". It should be noted that, the "character to be used" refers to a translation character sent to the user.
The "previous compressed translation result" refers to a result obtained by compressing and translating a speech segment previous to the current speech segment. It can be seen that the "at least one historic legacy semantic unit" described above can be determined from a previous speech segment of the current speech segment.
The acquisition time of the voice segment before the current voice segment is adjacent to the acquisition time of the current voice segment; and the acquisition time of the "preceding speech segment of the current speech segment" is earlier than the acquisition time of the "current speech segment". For example, as shown in fig. 2, if the current speech segment is the "third speech segment", the "preceding speech segment of the current speech segment" is the "second speech segment".
In addition, the embodiment of the present application is not limited to the implementation manner of S4. For example, when the "at least one historical legacy semantic unit" includes K historical legacy semantic units, the number of semantic units in the "to-be-used reduced translation" is G, and G is greater than or equal to K, the determining process of the g-th semantic unit in the "to-be-used reduced translation" includes steps 61-62:
step 61: if g is less than or equal to K, determining the g semantic units in the reduced translation to be used according to the main source text, the translation length description data, the compressed translation model and the g historical legacy semantic units. Wherein g is a positive integer, and g is less than or equal to K.
As an example, step 61 may specifically include steps 611-614:
step 611: and determining the model prediction probability in the g state according to the backbone source text, the translation length description data and the compressed translation model.
The "model prediction probability under g" refers to a distribution probability of the g-th semantic unit (for example, a prediction probability of the g-th semantic unit output by the "decision layer" shown in fig. 4) obtained by performing compression translation on the main source text by the compression translation model, so that the "model prediction probability under g-th state" is used to represent a likelihood that the g-th semantic unit is each candidate semantic unit (for example, each candidate vocabulary) in the compression translation result of the main source text.
In addition, the embodiment of the present application is not limited to the implementation of step 611; it may be implemented by using the working principle, provided by the above "compressed translation model", of predicting the g-th semantic unit in the compression translation result of the backbone source text. The implementation process specifically includes: when g = 1, determining the model prediction probability in the g-th state according to the backbone source text, the translation length description data and the compressed translation model; when g is greater than or equal to 2 and less than or equal to K, determining the model prediction probability in the g-th state according to the backbone source text, the translation length description data, the prediction correction probability in the (g-1)-th state and the compressed translation model. It can be seen that, for the compressed translation model shown in fig. 4, when g is greater than or equal to 2 and less than or equal to K, the previous output fed back to the decoder includes the prediction correction probability in the (g-1)-th state.
It should be noted that any of the embodiments of the "compressed translation model" shown above may be applied to step 611.
Step 612: and determining a penalty factor value (shown in a formula (6)) according to the model prediction probability in the g state and the object prediction probability of the g historical legacy semantic unit.
punish_g = δ(y_g, y′_g)  (6)
Wherein punish_g represents the penalty factor value; y_g represents the model prediction probability in the g-th state; y′_g represents the object prediction probability of the g-th historical legacy semantic unit; and δ(y_g, y′_g) represents the simulated annealing distribution of the model prediction probability in the g-th state and the object prediction probability of the g-th historical legacy semantic unit.
The "object prediction probability of the g-th history legacy semantic unit" refers to a probability distribution of the K-g+1 st semantic unit in the previous compression translation result, so that the "object prediction probability of the g-th history legacy semantic unit" is used to represent the probability that the K-g+1 st semantic unit in the previous compression translation result is a candidate semantic unit (e.g., candidate vocabulary).
In addition, the embodiment of the application is not limited to the "object prediction probability of the g-th historical legacy semantic unit", for example, if the penalty factor value corresponding to the "K-th to g+1 semantic units" does not exist in the previous determination process of the compression translation result, it may be determined that the correction processing is not performed on the model prediction probability of the "K-th to g+1 semantic units", so that the model prediction probability of the "K-th to g+1 semantic units" may be directly determined as the object prediction probability of the "g-th historical legacy semantic unit"; however, if there is a penalty factor value corresponding to the "K-last+1 semantic units" in the previous determination of the compression translation result, it can be determined that correction processing has been performed on the model prediction probabilities of the "K-last+1 semantic units", and therefore the prediction correction probability of the "K-last+1 semantic units" can be determined as the object prediction probability of the "g-th history legacy semantic units".
Step 613: and carrying out weighted summation processing on the model prediction probability in the g-th state and the penalty factor value to obtain the prediction correction probability in the g-th state (as shown in formula (7)).
p_g = (1 - β) × y_g + β × δ(y_g, y′_g)  (7)
Wherein p_g represents the prediction correction probability in the g-th state; y_g represents the model prediction probability in the g-th state; y′_g represents the object prediction probability of the g-th historical legacy semantic unit; δ(y_g, y′_g) represents the simulated annealing distribution of the model prediction probability in the g-th state and the object prediction probability of the g-th historical legacy semantic unit; and β is a harmonic ratio whose value ranges between 0 and 1 and can be adjusted according to actual needs. The adjustment strategy is as follows: if the translation result is required to be more complete and natural, the value of β should be increased; conversely, if the translation result is required to be shorter, the value of β should be decreased. It can be seen that, by setting the harmonic ratio β, a smoother compression result can be obtained while the length of the translation result is controlled more precisely.
Step 614: and determining the g semantic unit according to the prediction correction probability in the g state.
In this embodiment of the present application, after the prediction correction probability in the g-th state is obtained, the g-th semantic unit may be determined according to the prediction correction probability in the g-th state (for example, the candidate semantic unit having the highest probability value in the prediction correction probability in the g-th state is directly determined as the g-th semantic unit).
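A rough sketch of steps 612-614 is given below; since the concrete form of the simulated annealing distribution δ(·,·) is not spelled out here, a simple normalized product is used purely as a placeholder for it.

```python
import numpy as np

def delta(y_g: np.ndarray, y_prev: np.ndarray) -> np.ndarray:
    # Placeholder for δ(y_g, y′_g): pushes probability mass toward the
    # g-th historical legacy semantic unit (illustrative form only).
    mixed = y_g * y_prev
    return mixed / mixed.sum()

def corrected_prediction(y_g: np.ndarray, y_prev: np.ndarray, beta: float = 0.5) -> int:
    # Formula (7): p_g = (1 - beta) * y_g + beta * δ(y_g, y′_g).
    p_g = (1.0 - beta) * y_g + beta * delta(y_g, y_prev)
    # Step 614: take the candidate semantic unit with the highest probability.
    return int(np.argmax(p_g))

y_g = np.array([0.5, 0.3, 0.2])     # model prediction probability in the g-th state
y_prev = np.array([0.1, 0.8, 0.1])  # object prediction probability of the legacy unit
print(corrected_prediction(y_g, y_prev, beta=0.7))  # favors the legacy unit's choice
```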
Based on the above-mentioned related content of step 61, if the number of the historical legacy semantic units is K, the first K semantic units in the translation result of the current speech segment can be determined by referring to the K historical legacy semantic units when performing compression translation on the current speech segment, so that the K semantic units can represent semantic information carried by the K historical legacy semantic units as far as possible, and thus, information omission phenomenon caused by compulsory interception processing on the previous compression translation result can be effectively avoided, and real-time translation on the speech stream is more natural and smooth, which is beneficial to improving compression translation effect.
Step 62: if g is greater than K and less than or equal to G, determining the g-th semantic unit in the reduced translation to be used according to the backbone source text, the translation length description data, the compressed translation model and the (g-1)-th semantic unit. Wherein g is a positive integer, G is a positive integer, G is greater than or equal to K, and G represents the number of semantic units in the reduced translation to be used.
It should be noted that, the embodiment of the present application is not limited to the implementation of step 62. For example, it may be implemented by using the working principle, provided by the above "compressed translation model", of predicting the g-th semantic unit in the compression translation result of the backbone source text; and the implementation process specifically includes: firstly, determining the model prediction probability in the g-th state according to the backbone source text, the translation length description data, the compressed translation model and the (g-1)-th semantic unit; and then determining the g-th semantic unit according to the model prediction probability in the g-th state (for example, directly determining the candidate semantic unit with the highest probability value in the model prediction probability in the g-th state as the g-th semantic unit).
It should be noted that any of the embodiments of the "compressed translation model" shown above may be applied to the determination, in step 62, of the model prediction probability in the g-th state according to the backbone source text, the translation length description data, the compressed translation model and the (g-1)-th semantic unit.
Based on the above-mentioned related content of steps 61 to 62, if the number of the historical legacy semantic units is K, then for the translation result of the current speech segment (i.e., the "reduced translation to be used"), the first K semantic units in the translation result refer to the K historical legacy semantic units, while the (K+1)-th semantic unit and the subsequent semantic units in the translation result are determined according to the conventional model prediction manner. In this way, the translation result can represent not only the semantic information carried by the current speech segment but also the semantic information carried by the K historical legacy semantic units, so that the information omission phenomenon caused by the mandatory interception processing of the previous compression translation result can be effectively avoided, and the real-time translation of the speech stream is more natural and smooth, which is beneficial to improving the compression translation effect.
Based on the above-mentioned related content of S4, after the trunk source text is obtained, the compression translation model refers to the translation length description data and at least one historical legacy semantic unit, and performs compression translation processing on the trunk source text to obtain a to-be-used reduced translation, so that the to-be-used reduced translation can not only represent semantic information carried by the current speech segment, but also represent semantic information carried by the K historical legacy semantic units, so that information omission phenomenon caused by forced interception processing on the previous compression translation result can be effectively avoided, and real-time translation on a speech stream is more natural and smooth, thereby being beneficial to improving compression translation effect.
S5: and dividing the reduced translation to be used according to the expected length of the translation to obtain the translation to be used and the translation to be abandoned.
The "translation to be used" refers to text information (e.g., similar to the "Artificial intelligence is loved by" above) that needs to be sent to the user in the translation result (i.e., "reduced translation to be used") of the current speech segment (or current text); and the text length of the 'to-be-used translation' is the expected length of the translation. It can be seen that after the translation to be used is obtained, the translation to be used may be sent to the user (e.g., displayed on a display screen) so that the user can learn the translation result for the current speech segment.
The above "translation to be discarded" refers to text information (e.g., similar to the above "all counts") that does not need to be sent to the user in the translation result of the current speech segment (i.e., "reduced translation to be used").
S6: and updating at least one historical legacy semantic unit according to the translation to be discarded.
In the embodiment of the present application, after the translation to be discarded is obtained, the translation to be discarded may be directly determined as an updated historical legacy semantic unit, so that the updated historical legacy semantic unit can be referred to for implementation in the compression translation process for the next speech segment, so that the information omission phenomenon caused by forced interception processing for the translation result of the current speech segment can be effectively avoided, so that real-time translation for the speech stream is more natural and smooth, and the compression translation effect is improved.
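A minimal sketch of S5 and S6 follows, assuming that semantic units are words and that the expected translation length is expressed as a number of words; it reuses the "Artificial intelligence is loved by all countries" example from above.

```python
def split_reduced_translation(reduced_translation: str, expected_len: int):
    units = reduced_translation.split()
    translation_to_use = units[:expected_len]        # S5: sent / shown to the user
    translation_to_discard = units[expected_len:]    # S5: exceeds the expected length
    # S6: the discarded units become the historical legacy semantic units
    # referred to when compressing and translating the next speech segment.
    historical_legacy_units = translation_to_discard
    return " ".join(translation_to_use), historical_legacy_units

to_use, legacy = split_reduced_translation(
    "Artificial intelligence is loved by all countries", expected_len=5)
print(to_use)   # "Artificial intelligence is loved by"
print(legacy)   # ["all", "countries"]
```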
Based on the above-mentioned content related to S4 to S6, for the backbone source text extracted from the current speech segment, the compression translation model may refer to the translation length description data and at least one historical legacy semantic unit, and perform compression translation processing on the backbone source text to obtain the reduced translation to be used; then, according to the expected translation length represented by the translation length description data, the reduced translation to be used is divided to obtain the translation to be used, so that the text length of the translation to be used is the expected translation length and the translation to be used strictly follows the translation length constraint, which is beneficial to improving the compression translation effect.
Based on the translation method provided by the embodiment of the method, the embodiment of the application also provides a translation device, which is explained and illustrated below with reference to the accompanying drawings.
Device embodiment
The device embodiment describes the translation device, and the related content is referred to the above method embodiment.
Referring to fig. 6, the structure of a translation device according to an embodiment of the present application is shown.
The translation device 600 provided in the embodiment of the present application includes:
a text obtaining unit 601, configured to obtain a source text to be processed;
a trunk extraction unit 602, configured to extract a trunk source text from the source text to be processed;
a compression translation unit 603, configured to determine a reduced translation to be used according to the backbone source text, the translation length description data, and a pre-constructed compression translation model; the compression translation model is used for compressing and translating the trunk source text by referring to the translation length description data.
In one possible implementation, the compressed translation model includes an encoder and a decoder; the compression translation unit 603 is specifically configured to: determining the feature to be encoded according to the main source text and the translation length description data; inputting the feature to be encoded into the encoder to obtain a feature encoding result output by the encoder; and determining the to-be-used reduced translation according to the feature coding result and the decoder.
In a possible implementation manner, the determining process of the feature to be encoded includes: determining text characteristics to be used according to the main source text; determining the position feature to be used according to the text feature to be used and the translation length description data; and determining the feature to be encoded according to the text feature to be used and the position feature to be used.
In one possible implementation manner, the determining process of the text feature to be used includes: and determining the text characteristics to be used according to the backbone source text and the translation length description data.
In one possible implementation manner, the determining process of the text feature to be used includes: determining a length proportion interval to be used according to the translation length description data; and determining the text characteristics to be used according to the length proportion interval to be used and the main source text.
In one possible implementation manner, the determining process of the text feature to be used includes: splicing the length proportion interval to be used and the main source text to obtain a first text; and vectorizing the first text to obtain the text feature to be used.
In one possible implementation manner, the determining process of the text feature to be used includes: searching an interval identifier corresponding to the length proportion interval to be used from a preset mapping relation to obtain the interval identifier to be used; splicing the interval identifier to be used and the main source text to obtain a second text; vectorizing the second text to obtain the text feature to be used; the preset mapping relation comprises a corresponding relation between the length proportion interval to be used and the interval mark to be used.
In one possible implementation manner, the determining process of the text feature to be used includes: vectorizing the main source text to obtain a text characterization vector; and splicing the length proportion interval to be used and the text characterization vector to obtain the text feature to be used.
In one possible implementation manner, the determining process of the text feature to be used includes: vectorizing the main source text to obtain a text characterization vector; searching an interval identifier corresponding to the length proportion interval to be used from a preset mapping relation to obtain the interval identifier to be used; splicing the interval identifier to be used with the text characterization vector to obtain the text feature to be used; the preset mapping relation comprises a corresponding relation between the length proportion interval to be used and the interval mark to be used.
In one possible implementation manner, the text feature to be used includes N feature values, wherein N is a positive integer; the determining process of the position feature to be used comprises the following steps: determining a position coding result of the nth feature value according to the position index of the nth feature value in the text feature to be used, the translation length description data and the dimension index of the nth feature value, wherein n is a positive integer and n is less than or equal to N; and determining the position feature to be used according to the position coding results of the 1st feature value to the Nth feature value.
In one possible implementation manner, the determining process of the position coding result of the nth feature value includes: determining the position coding result of the nth feature value according to the difference between the translation expected length and the position index of the nth feature value in the text feature to be used, together with the dimension index of the nth feature value; wherein the translation expected length is determined according to the translation length description data.
In one possible implementation manner, the determining process of the position coding result of the nth feature value includes: determining the position coding result of the nth feature value according to the ratio between the position index of the nth feature value in the text feature to be used and the translation expected length, together with the dimension index of the nth feature value; wherein the translation expected length is determined according to the translation length description data.
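As one concrete reading of these two variants, the sketch below builds a sinusoidal-style position feature whose position signal is either the difference (expected length minus position index) or the ratio (position index divided by expected length). The exact formula is not given here, so the 10000-based angle schedule is an assumption carried over from common practice rather than the patent's definition.

import numpy as np

def length_aware_position_feature(seq_len, dim, expected_len, mode="difference"):
    """Position feature to be used, keyed on the translation expected length.
    mode="difference": signal = expected_len - position index.
    mode="ratio":      signal = position index / expected_len."""
    pe = np.zeros((seq_len, dim))
    for pos in range(seq_len):                      # position index of the nth feature value
        if mode == "difference":
            signal = expected_len - pos
        else:
            signal = pos / max(expected_len, 1)
        for i in range(dim):                        # dimension index of the nth feature value
            angle = signal / (10000 ** (2 * (i // 2) / dim))
            pe[pos, i] = np.sin(angle) if i % 2 == 0 else np.cos(angle)
    return pe

position_feature = length_aware_position_feature(seq_len=6, dim=8, expected_len=3)
print(position_feature.shape)   # (6, 8); added to a text feature of the same shape in practice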
In one possible implementation manner, the determining process of the to-be-used reduced translation includes: determining the to-be-used reduced translation according to the feature encoding result, the translation length description data and the decoder; the decoder is used for decoding the feature encoding result with reference to the translation length description data.
In one possible implementation, the decoder includes at least one first decoding layer; the first decoding layer comprises a first decoding module, an information fusion module and a first normalization module; the input data of the first normalization module comprises the output data of the first decoding module and the output data of the information fusion module; the information fusion module is used for multiplying the input data of the information fusion module by the expected length of the translation; wherein the expected length of the translation is determined according to the translation length description data.
In a possible implementation, the decoder further comprises at least one second decoding layer; the second decoding layer comprises a second decoding module and a second normalization module; wherein the input data of the second normalization module includes output data of the second decoding module.
In one possible implementation, the decoder includes 1 first decoding layer and J second decoding layers; wherein the input data of the 1st second decoding layer includes the output data of the first decoding layer; the input data of the jth second decoding layer includes the output data of the (j-1)th second decoding layer, wherein j is a positive integer, and j is greater than or equal to 2 and less than or equal to J.
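A compact numpy sketch of this decoder layout follows, purely as one possible reading: the first decoding layer feeds its normalization with both the decoded output and an information-fusion branch that multiplies the layer input by the translation expected length, and is followed by J plain second decoding layers. The decoding modules are stubbed with linear maps (a real model would use attention blocks), and treating the layer input as the fusion module's input is an assumption.

import numpy as np

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def first_decoding_layer(x, w, expected_len):
    decoded = x @ w                     # first decoding module (stubbed as a linear map)
    fused = x * expected_len            # information fusion module: multiply by expected length
    return layer_norm(decoded + fused)  # first normalization module sees both outputs

def second_decoding_layer(x, w):
    decoded = x @ w                     # second decoding module (stubbed as a linear map)
    return layer_norm(decoded)          # second normalization module

rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 8))        # toy feature encoding result
weights = rng.normal(size=(8, 8))

hidden = first_decoding_layer(hidden, weights, expected_len=3)  # 1 first decoding layer
for _ in range(3):                                              # J = 3 second decoding layers
    hidden = second_decoding_layer(hidden, weights)
print(hidden.shape)   # (6, 8)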
In a possible implementation manner, the text obtaining unit 601 is specifically configured to: after the current speech segment is obtained, perform speech recognition processing on the current speech segment to obtain the source text to be processed.
In a possible implementation manner, the compression translation unit 603 is specifically configured to: determine the reduced translation to be used according to the backbone source text, the translation length description data, the compression translation model and at least one historical legacy semantic unit; wherein the at least one historical legacy semantic unit is determined from a speech segment preceding the current speech segment.
In one possible implementation, the translation device 600 further includes:
the text dividing unit is used for dividing the reduced translation to be used according to the expected length of the translation to obtain the translation to be used and the translation to be discarded; the text length of the translation to be used is the expected length of the translation; the expected length of the translation is determined according to the translation length description data;
and the history updating unit is used for updating the at least one historical legacy semantic unit according to the translation to be discarded.
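Under toy assumptions, this streaming bookkeeping can be sketched as follows: the reduced translation to be used is cut at the translation expected length, the front part is emitted as the translation to be used, and the discarded tail replaces the historical legacy semantic units carried into the next speech segment. The token-level representation is an assumption for illustration.

def split_and_update(reduced_translation_tokens, expected_len, history):
    to_use = reduced_translation_tokens[:expected_len]      # translation to be used
    to_discard = reduced_translation_tokens[expected_len:]  # translation to be discarded
    history.clear()
    history.extend(to_discard)     # updated historical legacy semantic units
    return to_use

history = []
emitted = split_and_update(["I", "bought", "a", "book", "yesterday"],
                           expected_len=3, history=history)
print(emitted)   # ['I', 'bought', 'a']
print(history)   # ['book', 'yesterday']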
In one possible implementation, the number of the historical legacy semantic units is K; wherein K is a positive integer; the number of semantic units in the to-be-used reduced translation is greater than or equal to K;
the determining process of the kth semantic unit in the to-be-used reduced translation comprises the following steps: determining the model prediction probability in the kth state according to the backbone source text, the translation length description data and the compression translation model, wherein k is a positive integer and k is less than or equal to K; determining a penalty factor value according to the model prediction probability in the kth state and the object prediction probability of the kth historical legacy semantic unit; carrying out weighted summation processing on the model prediction probability in the kth state and the penalty factor value to obtain the prediction correction probability in the kth state; and determining the kth semantic unit according to the prediction correction probability in the kth state.
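A hedged sketch of this correction follows. How the penalty factor value is formed from the two probabilities is not spelled out here, so a simple one-hot boost toward the kth historical legacy semantic unit is assumed, and the weights alpha and beta are illustrative only.

import numpy as np

def corrected_prediction(model_probs, legacy_unit_id, legacy_prob, alpha=0.8, beta=0.2):
    """model_probs: model prediction probability over the vocabulary in the kth state."""
    penalty = np.zeros_like(model_probs)
    penalty[legacy_unit_id] = legacy_prob              # penalty factor value (assumed form)
    corrected = alpha * model_probs + beta * penalty   # weighted summation processing
    return int(np.argmax(corrected))                   # index of the kth semantic unit

probs = np.array([0.30, 0.25, 0.25, 0.20])             # toy vocabulary of four semantic units
print(corrected_prediction(probs, legacy_unit_id=2, legacy_prob=0.9))  # 2: the legacy unit wins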
In one possible implementation, the trunk extraction unit 602 includes:
the syntax analysis subunit is used for performing dependency syntax analysis processing on the source text to be processed to obtain a dependency syntax analysis result;
the part-of-speech tagging subunit is used for performing part-of-speech tagging on the source text to be processed to obtain a part-of-speech tagging result;
an importance characterization subunit, configured to determine vocabulary importance characterization data according to the dependency syntax analysis result and the part-of-speech tagging result;
and the text determining subunit is used for determining the backbone source text according to the vocabulary importance characterization data and the source text to be processed.
In one possible implementation, the vocabulary importance characterization data includes a multi-way tree to be used;
the text determination subunit is specifically configured to: determine a node to be deleted according to the multi-way tree to be used; determine a deletion identification result of the node to be deleted according to the deleted text length corresponding to the node to be deleted and the text length of the source text to be processed; if the deletion identification result of the node to be deleted meets a preset deletion condition, delete the node to be deleted from the multi-way tree to be used and continue executing the step of determining a node to be deleted according to the multi-way tree to be used; if the deletion identification result of the node to be deleted does not meet the preset deletion condition, continue executing the step of determining a node to be deleted according to the multi-way tree to be used; and, when a preset stopping condition is reached, determine the backbone source text according to the multi-way tree to be used and the source text to be processed.
In one possible implementation manner, the determining process of the deletion identification result of the node to be deleted includes: pre-deleting the node to be deleted from the multi-way tree to be used to obtain a pre-deleted multi-way tree; determining the deleted text length corresponding to the node to be deleted according to the pre-deleted multi-way tree and the source text to be processed; determining a length ratio between the deleted text length and the text length of the source text to be processed; comparing the length ratio with a preset ratio threshold to obtain a comparison result to be used; and determining the deletion identification result of the node to be deleted according to the comparison result to be used.
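A simplified sketch of that pruning loop is given below. The multi-way tree is represented as a parent-to-children map over token indices, as a dependency parse would produce; which nodes are proposed for deletion, the interpretation of the deleted text length as the length remaining after pre-deletion, and the stopping condition (exhausting the candidates) are all assumptions for illustration.

def subtree_indices(children, root):
    """All token indices covered by the node to be deleted (its whole subtree)."""
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen

def extract_backbone(tokens, children, candidates, ratio_threshold=0.6):
    kept = set(range(len(tokens)))
    for node in candidates:                                   # node to be deleted, in turn
        pre_deleted = kept - subtree_indices(children, node)  # pre-delete the node's subtree
        deleted_text_len = len(pre_deleted)                   # deleted text length (remaining tokens)
        if deleted_text_len / len(tokens) >= ratio_threshold: # preset deletion condition
            kept = pre_deleted                                # really delete the node
        # otherwise keep the node and move on to the next candidate
    return [tokens[i] for i in sorted(kept)]                  # backbone source text

tokens = ["he", "quickly", "bought", "an", "extremely", "old", "book"]
children = {2: [0, 1, 6], 6: [3, 5], 5: [4]}    # toy dependency tree rooted at "bought"
print(extract_backbone(tokens, children, candidates=[1, 4, 5]))
# ['he', 'bought', 'an', 'old', 'book']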
In one possible implementation manner, the translation device 600 further includes:
the model construction unit is used for acquiring at least one sample original text and an actual translation corresponding to the at least one sample original text; determining translation length description data corresponding to each sample original text according to the text length of the actual translation corresponding to each sample original text; and constructing the compression translation model according to the at least one sample original text, the translation length description data corresponding to the at least one sample original text and the actual translation corresponding to the at least one sample original text.
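On the training side, the construction described above can be sketched as follows: for each sample original text, the translation length description data is derived from the text length of its actual translation (here bucketed into a length proportion interval), and the model is then trained on the resulting (original text, length description, actual translation) triples. The bucket boundaries and the token-level representation are assumptions for illustration.

def build_training_samples(pairs):
    """pairs: list of (sample original text tokens, actual translation tokens)."""
    samples = []
    for source_tokens, target_tokens in pairs:
        ratio = len(target_tokens) / max(len(source_tokens), 1)
        if ratio <= 0.5:                       # assumed length proportion intervals
            length_data = "<len_short>"
        elif ratio <= 0.8:
            length_data = "<len_mid>"
        else:
            length_data = "<len_long>"
        samples.append((source_tokens, length_data, target_tokens))
    return samples

pairs = [(["he", "quickly", "bought", "an", "old", "book"], ["t1", "t2", "t3"])]
print(build_training_samples(pairs))
# [(['he', 'quickly', 'bought', 'an', 'old', 'book'], '<len_short>', ['t1', 't2', 't3'])]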
Further, an embodiment of the present application provides an apparatus, including: a processor, a memory, and a system bus;
the processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform any of the implementations of the translation methods described above.
Further, an embodiment of the present application provides a computer readable storage medium, wherein the computer readable storage medium stores instructions which, when run on a terminal device, cause the terminal device to execute any implementation of the translation method described above.
Further, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the translation method described above.
From the above description of embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above example methods may be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solution of the present application, in essence or the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to perform the methods described in the embodiments or in some parts of the embodiments of the present application.
It should be noted that the embodiments in this description are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts between the embodiments, reference may be made to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method section.
It is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (23)

1. A method of translation, the method comprising:
acquiring a source text to be processed;
extracting a backbone source text from the source text to be processed;
determining a to-be-used reduced translation according to the backbone source text, the translation length description data and a pre-constructed compression translation model; the compression translation model is used for compressing and translating the backbone source text by referring to the translation length description data;
the determination process of the to-be-used reduced translation comprises the following steps:
determining a text feature to be used according to the backbone source text;
determining the position feature to be used according to the text feature to be used and the translation length description data;
determining a feature to be encoded according to the text feature to be used and the position feature to be used;
and processing the feature to be encoded by using the compression translation model to obtain the reduced translation to be used.
2. The method of claim 1, wherein the compression translation model comprises an encoder and a decoder;
the processing the feature to be encoded by using the compression translation model to obtain the reduced translation to be used includes:
inputting the feature to be encoded into the encoder to obtain a feature encoding result output by the encoder;
and determining the to-be-used reduced translation according to the feature encoding result and the decoder.
3. The method of claim 1, wherein the determining a text feature to be used according to the backbone source text comprises:
determining the text feature to be used according to the backbone source text and the translation length description data.
4. The method of claim 3, wherein the determining the text feature to be used according to the backbone source text and the translation length description data comprises:
determining a length proportion interval to be used according to the translation length description data;
and determining the text feature to be used according to the length proportion interval to be used and the backbone source text.
5. The method of claim 4, wherein the determining the text feature to be used according to the length proportion interval to be used and the backbone source text comprises:
splicing the length proportion interval to be used and the backbone source text to obtain a first text; vectorizing the first text to obtain the text feature to be used;
or, alternatively,
the determining the text feature to be used according to the length proportion interval to be used and the backbone source text comprises the following steps:
searching an interval identifier corresponding to the length proportion interval to be used from a preset mapping relation to obtain the interval identifier to be used; splicing the interval identifier to be used and the backbone source text to obtain a second text; vectorizing the second text to obtain the text feature to be used; the preset mapping relation comprises a corresponding relation between the length proportion interval to be used and the interval identifier to be used;
or, alternatively,
the determining the text feature to be used according to the length proportion interval to be used and the backbone source text comprises the following steps:
vectorizing the backbone source text to obtain a text characterization vector; splicing the length proportion interval to be used and the text characterization vector to obtain the text feature to be used;
or, alternatively,
the determining the text feature to be used according to the length proportion interval to be used and the backbone source text comprises the following steps:
vectorizing the backbone source text to obtain a text characterization vector; searching an interval identifier corresponding to the length proportion interval to be used from a preset mapping relation to obtain the interval identifier to be used; splicing the interval identifier to be used with the text characterization vector to obtain the text feature to be used; the preset mapping relation comprises a corresponding relation between the length proportion interval to be used and the interval identifier to be used.
6. The method of claim 1, wherein the text feature to be used comprises N feature values; wherein N is a positive integer;
the determining process of the position feature to be used comprises the following steps:
determining a position coding result of the nth feature value according to the position index of the nth feature value in the text feature to be used, the translation length description data and the dimension index of the nth feature value; wherein n is a positive integer, and n is less than or equal to N;
and determining the position feature to be used according to the position coding results of the 1st feature value to the Nth feature value.
7. The method according to claim 6, wherein determining the position coding result of the nth feature value according to the position index of the nth feature value in the text feature to be used, the translation length description data, and the dimension index of the nth feature value includes:
determining the position coding result of the nth feature value according to the difference between the translation expected length and the position index of the nth feature value in the text feature to be used, together with the dimension index of the nth feature value; wherein the translation expected length is determined according to the translation length description data.
8. The method according to claim 6, wherein determining the position coding result of the nth feature value according to the position index of the nth feature value in the text feature to be used, the translation length description data, and the dimension index of the nth feature value includes:
determining the position coding result of the nth feature value according to the ratio between the position index of the nth feature value in the text feature to be used and the translation expected length, together with the dimension index of the nth feature value; wherein the translation expected length is determined according to the translation length description data.
9. The method of claim 2, wherein the determining the to-be-used reduced translation according to the feature encoding result and the decoder comprises:
determining the to-be-used reduced translation according to the feature encoding result, the translation length description data and the decoder; the decoder is used for decoding the feature encoding result with reference to the translation length description data.
10. The method of claim 9, wherein the decoder comprises at least one first decoding layer; the first decoding layer comprises a first decoding module, an information fusion module and a first normalization module; the input data of the first normalization module comprises the output data of the first decoding module and the output data of the information fusion module; the information fusion module is used for multiplying the input data of the information fusion module by the expected length of the translation; wherein the expected length of the translation is determined according to the translation length description data.
11. The method of claim 10, wherein the decoder further comprises at least one second decoding layer;
The second decoding layer comprises a second decoding module and a second normalization module; wherein the input data of the second normalization module includes output data of the second decoding module.
12. The method of claim 11, wherein the decoder comprises 1 first decoding layer and J second decoding layers; wherein the input data of the 1st second decoding layer comprises the output data of the first decoding layer; the input data of the jth second decoding layer comprises the output data of the (j-1)th second decoding layer, wherein j is a positive integer, and j is greater than or equal to 2 and less than or equal to J.
13. The method of claim 1, wherein the determining a to-be-used reduced translation according to the backbone source text, the translation length description data and a pre-constructed compression translation model comprises:
determining the reduced translation to be used according to the backbone source text, the translation length description data, the compression translation model and at least one historical legacy semantic unit.
14. The method of claim 13, wherein the method further comprises:
dividing the reduced translation to be used according to the expected length of the translation to obtain the translation to be used and the translation to be discarded; the text length of the translation to be used is the expected length of the translation; the expected length of the translation is determined according to the length description data of the translation;
And updating the at least one historical legacy semantic unit according to the translation to be discarded.
15. The method according to claim 13 or 14, wherein the number of the historical legacy semantic units is K; wherein K is a positive integer; the number of semantic units in the to-be-used reduced translation is greater than or equal to K;
the determining process of the kth semantic unit in the to-be-used reduced translation comprises the following steps:
determining the model prediction probability in the kth state according to the backbone source text, the translation length description data and the compression translation model; wherein k is a positive integer, and k is less than or equal to K;
determining a penalty factor value according to the model prediction probability in the kth state and the object prediction probability of the kth historical legacy semantic unit;
carrying out weighted summation processing on the model prediction probability in the kth state and the penalty factor value to obtain the prediction correction probability in the kth state;
and determining the kth semantic unit according to the prediction correction probability in the kth state.
16. The method of claim 1, wherein extracting the backbone source text from the source text to be processed comprises:
performing dependency syntax analysis processing on the source text to be processed to obtain a dependency syntax analysis result;
performing part-of-speech tagging on the source text to be processed to obtain a part-of-speech tagging result;
determining vocabulary importance characterization data according to the dependency syntax analysis result and the part-of-speech tagging result;
and determining the backbone source text according to the vocabulary importance characterization data and the source text to be processed.
17. The method of claim 16, wherein the vocabulary importance characterization data comprises a multi-way tree to be used;
the step of determining the backbone source text according to the vocabulary importance characterization data and the source text to be processed comprises the following steps:
determining nodes to be deleted according to the multi-way tree to be used;
determining a deletion identification result of the node to be deleted according to the deleted text length corresponding to the node to be deleted and the text length of the source text to be processed;
if the deletion identification result of the node to be deleted meets a preset deletion condition, deleting the node to be deleted from the multi-way tree to be used, and continuously executing the step of determining the node to be deleted according to the multi-way tree to be used;
if the deletion identification result of the node to be deleted does not meet the preset deletion condition, continuing to execute the step of determining the node to be deleted according to the multi-way tree to be used;
and, when a preset stopping condition is reached, determining the backbone source text according to the multi-way tree to be used and the source text to be processed.
18. The method according to claim 17, wherein the determining of the deletion identification result of the node to be deleted includes:
pre-deleting the nodes to be deleted from the multi-way tree to be used to obtain a pre-deleted multi-way tree;
determining the deleted text length corresponding to the node to be deleted according to the pre-deleted multi-way tree and the source text to be processed;
determining a length ratio between the deleted text length and the text length of the source text to be processed;
comparing the length ratio with a preset ratio threshold to obtain a comparison result to be used;
and determining a deletion identification result of the node to be deleted according to the comparison result to be used.
19. The method of claim 1, wherein the process of constructing the compressed translation model comprises:
acquiring at least one sample original text and an actual translation corresponding to the at least one sample original text;
determining translation length description data corresponding to each sample original text according to the text length of the actual translation corresponding to each sample original text;
and constructing the compression translation model according to the at least one sample original text, the translation length description data corresponding to the at least one sample original text and the actual translation corresponding to the at least one sample original text.
20. A translation apparatus, comprising:
the text acquisition unit is used for acquiring a source text to be processed;
the trunk extraction unit is used for extracting a backbone source text from the source text to be processed;
the compression translation unit is used for determining a reduced translation to be used according to the backbone source text, the translation length description data and a pre-constructed compression translation model; the compression translation model is used for compressing and translating the backbone source text by referring to the translation length description data;
the compression translation unit is specifically configured to:
determine a text feature to be used according to the backbone source text;
determine a position feature to be used according to the text feature to be used and the translation length description data;
determine a feature to be encoded according to the text feature to be used and the position feature to be used;
and process the feature to be encoded by using the compression translation model to obtain the reduced translation to be used.
21. An apparatus, the apparatus comprising: a processor, memory, system bus;
the processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the method of any of claims 1-19.
22. A computer readable storage medium having instructions stored therein which, when run on a terminal device, cause the terminal device to perform the method of any of claims 1 to 19.
23. A computer program product, characterized in that the computer program product, when run on a terminal device, causes the terminal device to perform the method of any of claims 1 to 19.
CN202111592412.5A 2021-12-23 2021-12-23 Translation method and related equipment thereof Active CN114254657B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111592412.5A CN114254657B (en) 2021-12-23 2021-12-23 Translation method and related equipment thereof
PCT/CN2022/088961 WO2023115770A1 (en) 2021-12-23 2022-04-25 Translation method and related device therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111592412.5A CN114254657B (en) 2021-12-23 2021-12-23 Translation method and related equipment thereof

Publications (2)

Publication Number Publication Date
CN114254657A CN114254657A (en) 2022-03-29
CN114254657B true CN114254657B (en) 2023-05-30

Family

ID=80794781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111592412.5A Active CN114254657B (en) 2021-12-23 2021-12-23 Translation method and related equipment thereof

Country Status (2)

Country Link
CN (1) CN114254657B (en)
WO (1) WO2023115770A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114254657B (en) * 2021-12-23 2023-05-30 中国科学技术大学 Translation method and related equipment thereof

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8060358B2 (en) * 2008-03-24 2011-11-15 Microsoft Corporation HMM alignment for combining translation systems
CN109271643A (en) * 2018-08-08 2019-01-25 北京捷通华声科技股份有限公司 A kind of training method of translation model, interpretation method and device
CN111079449B (en) * 2019-12-19 2023-04-11 北京百度网讯科技有限公司 Method and device for acquiring parallel corpus data, electronic equipment and storage medium
CN113051935A (en) * 2019-12-26 2021-06-29 Tcl集团股份有限公司 Intelligent translation method and device, terminal equipment and computer readable storage medium
CN114254657B (en) * 2021-12-23 2023-05-30 中国科学技术大学 Translation method and related equipment thereof

Also Published As

Publication number Publication date
WO2023115770A1 (en) 2023-06-29
CN114254657A (en) 2022-03-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230506

Address after: 230026 Jinzhai Road, Baohe District, Hefei, Anhui Province, No. 96

Applicant after: University of Science and Technology of China

Applicant after: IFLYTEK Co.,Ltd.

Address before: NO.666, Wangjiang West Road, hi tech Zone, Hefei City, Anhui Province

Applicant before: IFLYTEK Co.,Ltd.

GR01 Patent grant