CN113239707A - Text translation method, text translation device and storage medium - Google Patents

Publication number
CN113239707A
Authority
CN
China
Prior art keywords
text
target entity
translated
translation
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110226769.5A
Other languages
Chinese (zh)
Inventor
孙于惠
李响
刘凯
成亦薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd and Beijing Xiaomi Pinecone Electronic Co Ltd
Priority to CN202110226769.5A
Publication of CN113239707A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation

Abstract

The present disclosure relates to a text translation method, a text translation apparatus, and a storage medium. The text translation method comprises the following steps: acquiring a text to be translated, and identifying a target entity word and an abbreviation corresponding to the target entity word in the text to be translated; replacing all the abbreviations with the target entity words to obtain a first text corresponding to the text to be translated; and determining a translation result of the text to be translated based on the first text. According to the embodiments of the disclosure, the target entity word and its abbreviation in the text to be translated receive the same translation result, which ensures that the target entity word and the abbreviation are translated consistently throughout the text.

Description

Text translation method, text translation device and storage medium
Technical Field
The present disclosure relates to the field of language processing technologies, and in particular, to a text translation method, a text translation apparatus, and a storage medium.
Background
With increasingly frequent international collaboration, the translation industry faces great challenges in translation quality and efficiency, and with the rapid development of artificial intelligence, the great potential of machine translation in the industry is gradually emerging. Machine translation uses a computer to convert one natural language into another. Supported by large-scale training data, machine translation has achieved high quality and major breakthroughs in accuracy, and in some fields it can reach a level comparable to that of human translation.
However, some problems remain in practical translation applications. For text comprising a plurality of sentences, an entity that appears several times and refers to the same object must be translated consistently. At present, most machine translation systems split a text into single sentences and translate them one by one, so the same entity may receive different translations in different sentences, which leads to inconsistent entity translation across the text. The inconsistency is more serious when an entity and its abbreviation appear in the same text, which reduces machine translation efficiency and degrades the translation result.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a text translation method, a text translation apparatus, and a storage medium.
According to an aspect of an embodiment of the present disclosure, there is provided a text translation method including: acquiring a text to be translated, and identifying a target entity word and an abbreviation corresponding to the target entity word in the text to be translated; replacing all the abbreviations with the target entity words to obtain a first text corresponding to the text to be translated; and determining a translation result of the text to be translated based on the first text.
In some embodiments, determining a translation result of the text to be translated based on the first text comprises: replacing the target entity words in the first text by the same replacement symbol to obtain a second text corresponding to the text to be translated; translating other texts in the second text except the replacing symbol to obtain a first translation result, and translating the target entity word to obtain a translation result of the target entity word; and replacing the replacement symbol in the first translation result with the translation result of the target entity word to obtain the final translation result of the text to be translated.
In some embodiments, the identifying a target entity word included in the text to be translated and an abbreviation corresponding to the target entity word includes: determining rules for identifying the target entity words and abbreviations corresponding to the target entity words; and identifying the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated based on the rules.
In some embodiments, the identifying a target entity word included in the text to be translated and an abbreviation corresponding to the target entity word includes: determining the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated based on the reference resolution model, and/or determining the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated based on the corresponding relation between the entity words and the abbreviations.
In some embodiments, the target entity word comprises a person name comprising a first type of person name and a second type of person name, the person name comprising a first portion and a second portion; the identifying of the target entity word and the abbreviation corresponding to the target entity word included in the text to be translated comprises: determining the name of the person included in the text to be translated based on a regular expression for determining the name of the person; if the person name is the first type of person name and the first part of the first type of person name exists in the text to be translated, determining to identify an abbreviation corresponding to the first type of person name; and if the name is the second type name and the second part of the second type name exists in the text to be translated, determining to identify the abbreviation corresponding to the second type name.
According to still another aspect of the embodiments of the present disclosure, there is provided a text translation apparatus including: the acquisition module is used for acquiring a text to be translated; the recognition module is used for recognizing target entity words and abbreviations corresponding to the target entity words in the text to be translated; and the determining module is used for replacing all the abbreviations with the target entity words to obtain a first text corresponding to the text to be translated, and determining a translation result of the text to be translated based on the first text.
In some embodiments, the determining module determines the translation result of the text to be translated based on the first text in the following manner: replacing the target entity words in the first text by the same replacement symbol to obtain a second text corresponding to the text to be translated; translating other texts in the second text except the replacing symbol to obtain a first translation result, and translating the target entity word to obtain a translation result of the target entity word; and replacing the replacement symbol in the first translation result with the translation result of the target entity word to obtain the final translation result of the text to be translated.
In some embodiments, the recognition module recognizes a target entity word included in the text to be translated and an abbreviation corresponding to the target entity word as follows: determining rules for identifying the target entity words and abbreviations corresponding to the target entity words; and identifying the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated based on the rules.
In some embodiments, the recognition module recognizes a target entity word included in the text to be translated and an abbreviation corresponding to the target entity word as follows: determining the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated based on the reference resolution model, and/or determining the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated based on the corresponding relation between the entity words and the abbreviations.
In some embodiments, the target entity word comprises a person name comprising a first type of person name and a second type of person name, the person name comprising a first portion and a second portion; the recognition module recognizes the target entity word included in the text to be translated and the abbreviation corresponding to the target entity word in the following manner: determining the name of the person included in the text to be translated based on a regular expression for determining the name of the person; if the person name is the first type of person name and the first part of the first type of person name exists in the text to be translated, determining to identify an abbreviation corresponding to the first type of person name; and if the name is the second type name and the second part of the second type name exists in the text to be translated, determining to identify the abbreviation corresponding to the second type name.
According to still another aspect of the embodiments of the present disclosure, there is provided a text translation apparatus including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform any one of the text translation methods described above.
According to yet another aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions stored thereon which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform any one of the text translation methods described above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: according to the embodiment of the disclosure, the text to be translated is obtained, the target entity word and the abbreviation corresponding to the target entity word in the text to be translated are identified, all the abbreviations in the text to be translated are replaced by the target entity word, the replaced text to be translated is translated, the target entity word and the abbreviation corresponding to the target entity word in the text to be translated can have the same translation result through the replacement of the abbreviations, and the translation consistency of the target entity word and the abbreviation thereof in the text is ensured.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart illustrating a text translation method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating a method of determining a translation result of a text to be translated according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating a method of recognizing a target entity word and an abbreviation corresponding to the target entity word according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram illustrating a text translation apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram illustrating an apparatus for text translation according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Machine translation, also known as automatic translation, is a method of using a computer to convert one natural language (the source language) into another natural language (the target language). Supported by large-scale training data, machine translation has achieved high quality and major breakthroughs in accuracy, and in some fields it can reach a level comparable to that of human translation. Machine translation research methods are divided into rule-based methods and statistical methods. Because rule-based systems have long development cycles and require substantial capital and manpower, they have progressed slowly. In contrast, statistical methods have the advantages of short development cycles and convenient handling of large-scale corpora. Among the statistical machine translation methods, phrase-based translation methods are well developed.
In some practical translation applications, some problems are still faced. For a text content comprising a plurality of sentences, it is necessary to ensure that the same entity referring to the same object appearing therein is consistent during translation. In the current machine translation, a text is mostly split into single sentences, and translation is carried out sentence by sentence.
For example, in some possible implementations, one piece of data y = [y1, y2, …, yn] is extracted from a large amount of Chinese monolingual chapter data Y, where n is the number of clauses included in y. Based on a Chinese-to-English translation model, the sentence-level translation result of y is x = [x1, x2, …, xn]. Using an English-to-Chinese translation model, the translation result of x is y' = [y1', y2', …, yn']. A translation model is then trained on the pairs (y', y) to ensure entity translation consistency.
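The round trip described above can be sketched as follows. This is a minimal illustration only: the function name build_round_trip_pairs, the two toy lookup translators, and the placeholder sentences are assumptions made for the example and stand in for real sentence-level translation models.

```python
from typing import Callable, List, Tuple


def build_round_trip_pairs(
    document: List[str],
    zh_to_en: Callable[[str], str],
    en_to_zh: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (y', y) pairs for one document y = [y1, ..., yn]."""
    # Sentence-level Chinese-to-English translation: x = [x1, ..., xn].
    x = [zh_to_en(sentence) for sentence in document]
    # Sentence-level translation back into Chinese: y' = [y1', ..., yn'].
    y_prime = [en_to_zh(sentence) for sentence in x]
    # Pair each round-tripped clause with its original clause.
    return list(zip(y_prime, document))


# Hypothetical stand-ins for the two translation models (toy lookups).
ZH_EN = {"句子一": "sentence one", "句子二": "sentence two"}
EN_ZH = {"sentence one": "句子一", "sentence two": "第二句"}

pairs = build_round_trip_pairs(["句子一", "句子二"], ZH_EN.__getitem__, EN_ZH.__getitem__)
```

In the toy data the second clause comes back with different wording, which is the kind of (y', y) mismatch such training pairs are meant to expose.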
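[Table not reproduced (original images RE-GDA0003124573170000041 and RE-GDA0003124573170000051): an English text of two sentences and its sentence-level Chinese translation.]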
The above table shows the translation obtained when the translation model is used to translate an English text into Chinese; the English text to be translated includes two sentences. Reading the text shows that "Chen Wei" in the first sentence and "Chen" in the second sentence refer to the same object, and that "Chen" in the second sentence is an abbreviation of "Chen Wei" in the first sentence. However, under sentence-level translation with the translation model, "Chen Wei" and "Chen" receive different Chinese translations. To ensure consistent translation of person names, modeling may be performed with a chapter translation model, for example, by using the first sentence as the source document context (Source Document Context) and decoding the second sentence, i.e., the source current sentence (Source Current Sentence). When the second sentence is modeled, context (Context) information is introduced to guide the translation model to stay consistent with the already generated part of the translation, including the translation of related entities. However, when the model has not learned sufficient context information, chapter-level modeling cannot ensure that names and their abbreviations are translated consistently.
Therefore, the text translation method provided by the disclosure includes the steps of obtaining a text to be translated, identifying a target entity word and an abbreviation corresponding to the target entity word included in the text to be translated, replacing all the abbreviations in the text to be translated with the target entity word, and enabling the entity word and the abbreviation corresponding to the entity word to have the same translation result through the replacement of the abbreviations.
Fig. 1 is a flowchart illustrating a text translation method according to an exemplary embodiment of the present disclosure, and as shown in fig. 1, the text translation method includes the following steps.
In step S101, a text to be translated is acquired, and a target entity word included in the text to be translated and an abbreviation corresponding to the target entity word are recognized.
In step S102, the abbreviations are all replaced with target entity words, so as to obtain a first text corresponding to the text to be translated.
In step S103, a translation result of the text to be translated is determined based on the first text.
In the embodiment of the present disclosure, the text translation method may be applied to various types of electronic devices, which may be mobile devices or fixed devices; the mobile devices may be mobile phones, tablet computers, and the like, and the fixed devices include personal computers, smart palm assistants, and the like. The text translation method can also be applied to terminal applications or web pages. The text to be translated may be text including a plurality of sentences, such as discourse text and the like; the target entity words are included in the text to be translated and may be names of people, places, or organizations, proper nouns, terms, and the like.
In the embodiment of the disclosure, when a text is translated, the text to be translated is obtained; it includes a target entity word and also an abbreviation corresponding to the target entity word. For example, the text to be translated is an English text that includes two sentences, namely "A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said that when they were told a frame of Chen was aired on a night TV show, he was surprised". The target entity word of the text to be translated is the person name "Chen Wei", and the abbreviation corresponding to the target entity word is "Chen". The abbreviation "Chen" in the text to be translated is replaced with its target entity word "Chen Wei" to obtain a first text corresponding to the text to be translated. It is understood that every "Chen" in the first text is replaced with "Chen Wei". The first text is "A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said that when they were told a frame of Chen Wei was aired on a night TV show, he was surprised". Through the replacement, the target entity word and its abbreviation in the text to be translated are made consistent. The replaced first text may be translated by a translation model, which may be a neural network model, for example, one trained based on a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM), or the like, which is not limited in this disclosure. The translation result of the text to be translated is obtained through this translation.
According to the embodiment of the disclosure, the target entity word and the abbreviation corresponding to the target entity word in the text to be translated are identified by obtaining the text to be translated, the abbreviation in the text to be translated is completely replaced by the target entity word, the replaced text to be translated is translated, and the target entity word and the abbreviation corresponding to the target entity word in the text to be translated can have the same translation result through the replacement of the abbreviation, so that the translation consistency of the target entity word and the abbreviation thereof in the text is ensured.
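As an illustration of step S102, the replacement of abbreviations can be sketched with a single regular-expression substitution. The function name expand_abbreviations and the shortened example text are assumptions made for this sketch; the disclosure does not prescribe a particular implementation. Listing the full entity before the abbreviation in the pattern keeps an existing "Chen Wei" from being expanded into "Chen Wei Wei".

```python
import re


def expand_abbreviations(text: str, entity: str, abbreviation: str) -> str:
    """Replace every standalone abbreviation with the full entity word."""
    # The full entity comes first in the alternation, so occurrences of the
    # entity are matched as a whole and left unchanged by the substitution.
    pattern = re.compile(
        r"\b(?:%s|%s)\b" % (re.escape(entity), re.escape(abbreviation))
    )
    # A function replacement avoids any special handling of backslashes.
    return pattern.sub(lambda match: entity, text)


# Shortened, illustrative version of the example text.
source = (
    "Dr. Chen Wei was ordered to help the country build a subway. "
    "A frame of Chen was aired on a night TV show."
)
first_text = expand_abbreviations(source, "Chen Wei", "Chen")
```

The same call handles abbreviations taken from the end of a name, for example expand_abbreviations(text, "Jim Green", "Green").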
Fig. 2 is a flowchart illustrating a method for determining a translation result of a text to be translated according to an exemplary embodiment of the present disclosure, and as shown in fig. 2, the method for determining the translation result of the text to be translated includes the following steps.
In step S201, the target entity word in the first text is replaced by the same replacement symbol, so as to obtain a second text corresponding to the text to be translated.
In step S202, the text in the second text other than the replacement symbol is translated to obtain a first translation result, and the target entity word is translated to obtain a translation result of the target entity word.
In step S203, the replacement symbol in the first translation result is replaced with the translation result of the target entity word, so as to obtain a final translation result of the text to be translated.
In the embodiment of the disclosure, when a text to be translated is translated, the text to be translated is obtained, wherein the text to be translated includes a target entity word, and the text to be translated also includes an abbreviation corresponding to the target entity word. And replacing all the abbreviations corresponding to the target entity words in the text to be translated with the target entity words to obtain a first text corresponding to the text to be translated. And replacing all target entity words in the first text by the same replacing symbol to obtain a second text corresponding to the text to be translated. The text to be translated, the first text corresponding to the text to be translated, and the second text corresponding to the text to be translated have a corresponding relationship.
Still taking the above text to be translated as an example, the text to be translated is the English text "A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said that when they were told a frame of Chen was aired on a night TV show, he was surprised". The target entity word in the text to be translated is "Chen Wei", and the abbreviation corresponding to "Chen Wei" is "Chen". The first text corresponding to the text to be translated is "A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said that when they were told a frame of Chen Wei was aired on a night TV show, he was surprised". The two occurrences of the target entity word "Chen Wei" in the first text are replaced by the same replacement symbol "$tag" to obtain the second text corresponding to the text to be translated. The second text is "A year later, Dr. $tag was ordered to help the country build a subway. Her husband said that when they were told a frame of $tag was aired on a night TV show, he was surprised". The text in the second text other than the replacement symbol is translated from English into Chinese to obtain the first translation result, which reads "One year later, Dr. $tag was ordered to help the country build a subway. Her husband said that he was surprised when they were told that a frame of $tag was shown on a night TV program". The target entity word "Chen Wei" is translated to obtain the translation result of the target entity word. The replacement symbol "$tag" in the first translation result is then replaced with the translation result of the target entity word to obtain the final translation result of the text to be translated, which reads "One year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said that he was surprised when they were told that a frame of Chen Wei was shown on a night TV program". This ensures that the target entity word and its abbreviation are translated consistently in the text.
According to the embodiment of the disclosure, a target entity word in a first text is replaced by the same replacement symbol to obtain a second text corresponding to a text to be translated, other texts except the replacement symbol in the second text are translated to obtain a first translation result, and the target entity word is translated to obtain a translation result of the target entity word; and replacing the replacement symbol in the first translation result with the translation result of the target entity word to obtain the final translation result of the text to be translated, wherein the target entity word and the abbreviation corresponding to the target entity word in the text to be translated have the same translation result through replacing the abbreviation, and the translation consistency of the target entity word and the abbreviation in the text is ensured.
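Steps S201 to S203 can be sketched as follows. This is a minimal sketch under stated assumptions: translate_with_placeholder, toy_translate_text, and toy_translate_entity are hypothetical names, the toy text translator merely upper-cases words so that the flow is visible, and the glossary entry for "Chen Wei" is an assumed rendering used only for illustration. A real system would call a translation model in both places.

```python
import re
from typing import Callable

PLACEHOLDER = "$tag"  # the replacement symbol used in the example above


def translate_with_placeholder(
    first_text: str,
    entity: str,
    translate_text: Callable[[str], str],
    translate_entity: Callable[[str], str],
) -> str:
    # S201: replace every occurrence of the entity with the same symbol.
    second_text = re.sub(
        r"\b%s\b" % re.escape(entity), lambda match: PLACEHOLDER, first_text
    )
    # S202: translate the surrounding text and the entity separately.
    first_result = translate_text(second_text)
    entity_result = translate_entity(entity)
    # S203: put the single entity translation back at every symbol.
    return first_result.replace(PLACEHOLDER, entity_result)


def toy_translate_text(text: str) -> str:
    # Stand-in for a translation model: upper-case everything except the symbol.
    return " ".join(w if PLACEHOLDER in w else w.upper() for w in text.split())


def toy_translate_entity(entity: str) -> str:
    # Hypothetical glossary lookup; the rendering is assumed for illustration.
    return {"Chen Wei": "陈薇"}[entity]


result = translate_with_placeholder(
    "Dr. Chen Wei built a subway. A frame of Chen Wei was aired.",
    "Chen Wei",
    toy_translate_text,
    toy_translate_entity,
)
```

Because the entity is translated once and inserted at every symbol, all mentions necessarily share one translation, which is the consistency property the embodiment relies on.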
Fig. 3 is a flowchart illustrating a method of recognizing a target entity word and an abbreviation corresponding to the target entity word according to an exemplary embodiment of the present disclosure, and the method of recognizing the target entity word and the abbreviation corresponding to the target entity word, as shown in fig. 3, includes the following steps.
In step S301, a rule for identifying the target entity word and the abbreviation of the target entity word is determined.
In step S302, a target entity word included in the text to be translated and an abbreviation corresponding to the target entity word are identified based on the rule.
In the embodiment of the disclosure, when a text is translated, the text to be translated is obtained; it includes a target entity word and also an abbreviation corresponding to the target entity word. For example, when the target entity word is a person name, the name may be a Chinese name or a foreign name, and both consist of a surname and a given name. When the text to be translated is an English text, the initial letter of each part of the name is capitalized in its English form. Chinese names generally place the surname first and the given name last; for example, in the Chinese name "Li Lei" in English text, "Li" is the surname and "Lei" is the given name. Foreign names generally place the given name first and the surname last; for example, in the foreign name "Jim Green" in English text, "Jim" is the given name and "Green" is the surname. The rule for identifying the target entity word and its abbreviation in the text to be translated may be, for example, a regular expression of the form "[English given name/Chinese surname] [a-Z]+". A regular expression is a filtering logic that combines a number of predefined specific characters into a "rule string" and applies it to character strings. The regular expression in the embodiment of the disclosure extracts the Chinese names or foreign names that match it from the text to be translated. When the target entity word is a person name, the person name may be a first type of person name or a second type of person name; whatever its type, the person name includes two parts, a first part and a second part. It is understood that the first type of name may be a Chinese name and the second type of name may be an English name.
Names in different languages follow different formats. For a Chinese name, the first part is the Chinese surname and the second part is the Chinese given name; for an English name, the first part is the English given name and the second part is the English surname. Therefore, after a person name is recognized, its type is judged first, and the part of the name that serves as its abbreviation is then determined according to the type.
The person names included in the text to be translated are determined based on the regular expression. If a person name is of the first type, such as a Chinese name, and the first part of the Chinese name also exists in the text to be translated, the abbreviation corresponding to the Chinese name is identified. If a person name is of the second type, such as an English name, and the second part of the English name also exists in the text to be translated, the abbreviation corresponding to the English name is identified. In an example, when the text to be translated is an English text, the name "Chen Wei" is recognized in the text based on the regular expression. It is then determined whether a Chinese surname exists in the two parts constituting the recognized name, and thereby whether the recognized name is a Chinese name or a foreign name. The abbreviation of a Chinese name is the first part of the name, and the abbreviation of a foreign name is the second part of the name. For example, the abbreviation of the Chinese name "Li Lei" in an English text to be translated is "Li", and the abbreviation of the foreign name "Jim Green" in an English text to be translated is "Green". For the English text to be translated in the above example, the person name "Chen Wei" is recognized based on the regular expression, and the abbreviation of "Chen Wei" is determined to be "Chen" based on the rule. Every occurrence of the abbreviation "Chen" in the text to be translated is replaced with the name "Chen Wei" to obtain a first text corresponding to the text to be translated.
Through replacement, the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words are kept consistent, and the first text after replacement is translated to obtain a translation result.
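The rule-based recognition and replacement described above can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the surname list `CHINESE_SURNAMES`, the title-based pattern `NAME_PATTERN`, and both function names are assumptions introduced only for illustration.

```python
import re

# Hypothetical, deliberately tiny surname list used to tell Chinese names apart.
CHINESE_SURNAMES = {"Chen", "Li", "Wang", "Zhang", "Liu"}

# Assumed pattern: a title followed by two capitalized words.
NAME_PATTERN = re.compile(
    r"\b(?:Dr|Mr|Mrs|Ms|Prof)\.\s+([A-Z][a-z]+)\s+([A-Z][a-z]+)\b"
)


def find_name_and_abbreviation(text):
    """Return (full_name, abbreviation) for the first matched name, or None."""
    match = NAME_PATTERN.search(text)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Chinese name: the abbreviation is the first part; otherwise the second part.
    abbreviation = first if first in CHINESE_SURNAMES else second
    return f"{first} {second}", abbreviation


def expand_abbreviation(text, full_name, abbreviation):
    """Replace standalone uses of the abbreviation with the full name."""
    first, second = full_name.split(" ", 1)
    if abbreviation == first:
        # Skip "Chen" when it is already followed by "Wei".
        pattern = rf"\b{re.escape(first)}\b(?!\s+{re.escape(second)}\b)"
    else:
        # Skip "Green" when it is already preceded by "Jim ".
        pattern = rf"(?<!{re.escape(first)}\s)\b{re.escape(second)}\b"
    return re.sub(pattern, lambda _m: full_name, text)


text = "A year later, Dr. Chen Wei was ordered to help. A friend of Chen was surprised."
full_name, abbreviation = find_name_and_abbreviation(text)
first_text = expand_abbreviation(text, full_name, abbreviation)
```

The lookahead and lookbehind keep an abbreviation that is already part of the full name from being expanded a second time, which a plain string replacement would not guarantee.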
It is understood that the target entity words included in the text to be translated and their corresponding abbreviations may also be recognized automatically by using a named entity recognition tool. The named entity recognition tool may be trained on sentences that include words of different parts of speech.
According to the embodiments of the present disclosure, a rule for identifying the target entity word and its abbreviation in the text to be translated is determined, and the target entity word and its corresponding abbreviation are identified based on that rule. The target entity word and its abbreviation can thus be identified accurately, which ensures that they are translated consistently in the text and improves the text translation quality.
In the embodiments of the present disclosure, the target entity words included in the text to be translated and their corresponding abbreviations may be determined based on a reference resolution (coreference resolution) model. The reference resolution model analyzes a word sequence, replaces pronouns and implicit references in the word segmentation result with the entities they refer to, and thereby completes the semantic structure of the sentence. When the target entity words are determined based on the reference resolution model, the text to be translated is first segmented into words, and all entity words in the text are then determined by an entity recognition model; there may be multiple entity words. For the recognized entity words, a word pair is formed from every two entity words, and a binary classification model judges whether the two entity words in each pair refer to the same entity. Take the text to be translated "The World Health Organization (WHO) is a specialized agency of the United Nations responsible for international public health" as an example. The text is segmented into words and processed by the entity recognition model, which determines the entities included in the text, namely "World Health Organization", "WHO" and "the United Nations". Every two entities form a word pair, and word pairs such as <World Health Organization, WHO> and <World Health Organization, the United Nations> are input into the binary classification model. The model judges that the two words in <World Health Organization, WHO> refer to the same entity, so that WHO is determined to be the abbreviation of World Health Organization.
According to the embodiments of the present disclosure, the target entity words included in the text to be translated and their corresponding abbreviations are determined based on the reference resolution model. This allows target entity words and their abbreviations to be recognized effectively across various scenarios and categories, ensures that a target entity word and its abbreviation are translated consistently in the text, and improves the text translation quality.
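The pairing step described above can be sketched in Python as follows. The trained binary classification model is not specified in the text, so the function `same_entity` below is only a hypothetical stand-in that uses an initialism heuristic; `find_abbreviation_pairs` is likewise an illustrative name.

```python
from itertools import combinations


def same_entity(a, b):
    """Stand-in for the binary classifier (assumption, not a trained model).

    Returns True when the shorter string is an all-caps initialism formed from
    the capitalized words of the longer string.
    """
    short, long_ = sorted((a, b), key=len)
    initials = "".join(word[0] for word in long_.split() if word[0].isupper())
    return short.isupper() and initials == short


def find_abbreviation_pairs(entities):
    """Form a word pair from every two entities and keep the coreferent pairs."""
    pairs = []
    for a, b in combinations(entities, 2):
        if same_entity(a, b):
            full, abbreviation = (a, b) if len(a) > len(b) else (b, a)
            pairs.append((full, abbreviation))
    return pairs


entities = ["World Health Organization", "WHO", "the United Nations"]
pairs = find_abbreviation_pairs(entities)
```

In this sketch every unordered pair of entities is examined exactly once, and only the pair judged to refer to the same entity is kept as a (target entity word, abbreviation) pair.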
In the embodiments of the present disclosure, a correspondence between target entity words and their abbreviations is created. When the target entity word included in a text to be translated and its corresponding abbreviation are to be determined, the correspondence is used to determine both. For example, creating the correspondence may consist of creating a table that characterizes the correspondence, from which a target entity word and its abbreviation can be obtained. The correspondence may include the full English name Personal Computer and its abbreviation PC. It may also include the full English name World Health Organization and its abbreviation WHO. For the text to be translated "The World Health Organization is a specialized agency of the United Nations responsible for international public health. The WHO Constitution, which establishes the agency's governing structure and principles, states its main objective as "the attainment by all peoples of the highest possible level of health"", the target entity word World Health Organization included in the text to be translated and its corresponding abbreviation WHO are determined based on the correspondence between entity words and abbreviations.
According to the embodiments of the present disclosure, the target entity words included in the text to be translated and their corresponding abbreviations are determined based on the correspondence between entity words and abbreviations. This allows target entity words and their abbreviations to be recognized effectively across various scenarios and categories, ensures that a target entity word and its abbreviation are translated consistently in the text, and improves the text translation quality.
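A table-based lookup of this kind might look like the following Python sketch. The dictionary `ABBREVIATION_TABLE` contains only the two entries named in the text, and the function name `find_known_pairs` is an assumption made for illustration.

```python
import re

# Correspondence table: full name -> abbreviation (entries taken from the text).
ABBREVIATION_TABLE = {
    "Personal Computer": "PC",
    "World Health Organization": "WHO",
}


def find_known_pairs(text, table=ABBREVIATION_TABLE):
    """Return the (full name, abbreviation) pairs whose both forms occur in text."""
    found = []
    for full_name, abbreviation in table.items():
        has_full = full_name in text
        # Word boundaries keep "WHO" from matching inside a longer token.
        has_abbreviation = re.search(rf"\b{re.escape(abbreviation)}\b", text)
        if has_full and has_abbreviation:
            found.append((full_name, abbreviation))
    return found


sample = (
    "The World Health Organization is a specialized agency. "
    "The WHO Constitution states its main objective."
)
known_pairs = find_known_pairs(sample)
```

The match is case-sensitive on purpose, so that the pronoun "who" in running text is not mistaken for the abbreviation WHO.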
In the embodiments of the present disclosure, the target entity word may be a person name. For example, when the text to be translated is an English text that is to be translated from English into Chinese, the target entity word may be a Chinese person name, a foreign person name, or the like included in the text. It is understood that the target entity word may also be a place name, an organization name, a proper noun, or a term.
Figure RE-GDA0003124573170000101
The above table shows the process of translating a text by applying the text translation method of the embodiments of the present disclosure. The text to be translated is an English text that includes two sentences: "A year later, Dr. Chen Wei was ordered to help the country build a subway." and "Her husband said when they were told a frame of Chen was aired on a night TV show, he was surprised." The target entity word included in the text to be translated is identified as "Chen Wei", and the abbreviation corresponding to "Chen Wei" is identified as "Chen". The abbreviation "Chen" in the text to be translated is replaced with its corresponding target entity word "Chen Wei" to obtain a first text corresponding to the text to be translated. The first text is therefore "A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said when they were told a frame of Chen Wei was aired on a night TV show, he was surprised." The two occurrences of the target entity word "Chen Wei" in the first text are replaced with the same replacement symbol "$tag" to obtain a second text corresponding to the text to be translated. The second text is "A year later, Dr. $tag was ordered to help the country build a subway. Her husband said when they were told a frame of $tag was aired on a night TV show, he was surprised." The text in the second text other than the replacement symbol is translated from English into Chinese to obtain a first translation result, in which the replacement symbol is retained; rendered here in English, the first translation result is "One year later, Dr. $tag was ordered to help the country build a subway. Her husband said that he was surprised when they were told that a frame of $tag was shown on a night TV program." The target entity word "Chen Wei" is translated separately to obtain the translation result of the target entity word, that is, the Chinese rendering of the name "Chen Wei".
The replacement symbol in the first translation result is then replaced with the translation result of the target entity word to obtain the final translation result of the text to be translated. Rendered here in English, the final translation result is "One year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said that he was surprised when they were told that a frame of Chen Wei was shown on a night TV program." The target entity word and its abbreviation are thus translated consistently in the text.
According to the embodiments of the present disclosure, the target entity words in the first text are replaced with the same replacement symbol to obtain a second text corresponding to the text to be translated. The text in the second text other than the replacement symbol is translated to obtain a first translation result, and the target entity word is translated to obtain a translation result of the target entity word. The replacement symbol in the first translation result is then replaced with the translation result of the target entity word to obtain the final translation result of the text to be translated. Because the abbreviation has been replaced, the target entity word and its abbreviation in the text to be translated receive the same translation result, which ensures that they are translated consistently in the text.
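The placeholder procedure summarized above can be sketched in Python as follows. The two translation callables are hypothetical stand-ins for a real translation engine, and the sketch assumes that the engine passes the replacement symbol through unchanged; neither assumption comes from the disclosure.

```python
def translate_with_placeholder(first_text, entity, translate_text,
                               translate_entity, placeholder="$tag"):
    """Translate first_text while keeping every mention of entity consistent.

    translate_text and translate_entity are caller-supplied callables
    (assumptions standing in for a real machine translation system).
    """
    # Second text: every occurrence of the entity becomes the same symbol.
    second_text = first_text.replace(entity, placeholder)
    # First translation result: everything except the symbol is translated.
    first_result = translate_text(second_text)
    # The entity itself is translated once, separately.
    entity_result = translate_entity(entity)
    # Final result: the symbol is swapped for the entity's translation.
    return first_result.replace(placeholder, entity_result)


# Toy stand-ins that only mark the text, so the data flow is visible.
def fake_translate_text(text):
    return "<zh>" + text + "</zh>"


def fake_translate_entity(entity):
    return "CHEN_WEI_ZH"


final_result = translate_with_placeholder(
    "Dr. Chen Wei arrived. A friend of Chen Wei left.",
    "Chen Wei",
    fake_translate_text,
    fake_translate_entity,
)
```

Translating the entity once and reinserting it in every position is what makes the full name and its former abbreviation come out identical in the final text.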
Fig. 4 is a block diagram illustrating a text translation apparatus according to an exemplary embodiment of the present disclosure, and as shown in fig. 4, the text translation apparatus 100 includes: an acquisition module 101, a recognition module 102 and a determination module 103.
The acquisition module 101 is configured to obtain a text to be translated.
The recognition module 102 is configured to recognize a target entity word included in the text to be translated and an abbreviation corresponding to the target entity word.
The determining module 103 is configured to replace all the abbreviations with target entity words to obtain a first text corresponding to the text to be translated, and determine a translation result of the text to be translated based on the first text.
In some embodiments, the determining module 103 determines the translation result of the text to be translated based on the first text in the following manner:
replacing the target entity words in the first text with the same replacement symbol to obtain a second text corresponding to the text to be translated;
translating the text in the second text other than the replacement symbol to obtain a first translation result, and translating the target entity word to obtain a translation result of the target entity word;
and replacing the replacement symbol in the first translation result with the translation result of the target entity word to obtain the final translation result of the text to be translated.
In some embodiments, the recognition module 102 recognizes the target entity word and the abbreviation corresponding to the target entity word included in the text to be translated as follows:
determining rules for identifying target entity words and abbreviations corresponding to the target entity words;
and identifying a target entity word and an abbreviation corresponding to the target entity word included in the text to be translated based on the rule.
In some embodiments, the recognition module 102 recognizes the target entity word and the abbreviation corresponding to the target entity word included in the text to be translated as follows:
and determining a target entity word and an abbreviation corresponding to the target entity word included in the text to be translated based on the reference resolution model, and/or determining the target entity word and the abbreviation corresponding to the target entity word included in the text to be translated based on the corresponding relation between the entity word and the abbreviation.
In some embodiments, the target entity word comprises a person name comprising a first type of person name and a second type of person name, the person name comprising a first portion and a second portion; the recognition module 102 recognizes a target entity word and an abbreviation corresponding to the target entity word included in the text to be translated in the following manner: determining the name of a person included in the text to be translated based on a regular expression for determining the name of the person; if the person name is the first type of person name and a first part of the first type of person name exists in the text to be translated, determining to identify an abbreviation corresponding to the first type of person name; and if the person name is the second type of person name and a second part of the second type of person name exists in the text to be translated, determining to identify the abbreviation corresponding to the second type of person name.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
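As a rough illustration of how the three modules of the apparatus 100 might cooperate, the following Python sketch wires an acquisition step, a recognition step and a determination step together. The class name, field names and the toy callables are assumptions for illustration only, and the simple word-boundary replacement assumes the abbreviation does not occur as a word inside the full name.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (target entity word, abbreviation)


@dataclass
class TextTranslationApparatus:
    acquire: Callable[[], str]               # acquisition module
    recognize: Callable[[str], List[Pair]]   # recognition module
    translate: Callable[[str], str]          # translation backend (assumed)

    def determine(self, text: str, pairs: List[Pair]) -> str:
        """Determination module: expand abbreviations, then translate."""
        first_text = text
        for entity, abbreviation in pairs:
            first_text = re.sub(
                rf"\b{re.escape(abbreviation)}\b",
                lambda _m, e=entity: e,
                first_text,
            )
        return self.translate(first_text)

    def run(self) -> str:
        text = self.acquire()
        return self.determine(text, self.recognize(text))


apparatus = TextTranslationApparatus(
    acquire=lambda: "The PC market grew.",
    recognize=lambda text: [("Personal Computer", "PC")],
    translate=lambda text: "[zh] " + text,
)
result = apparatus.run()
```

Passing the modules in as callables keeps the sketch independent of any particular recognition technique, so a rule-based, model-based or table-based recognizer could be substituted without changing the rest.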
Fig. 5 is a block diagram illustrating an apparatus 200 for text translation in accordance with an exemplary embodiment of the present disclosure. For example, the apparatus 200 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the apparatus 200 may include one or more of the following components: a processing component 202, a memory 204, a power component 206, a multimedia component 208, an audio component 210, an input/output (I/O) interface 212, a sensor component 214, and a communication component 216.
The processing component 202 generally controls overall operation of the device 200, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 202 may include one or more processors 220 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 202 can include one or more modules that facilitate interaction between the processing component 202 and other components. For example, the processing component 202 can include a multimedia module to facilitate interaction between the multimedia component 208 and the processing component 202.
The memory 204 is configured to store various types of data to support operations at the apparatus 200. Examples of such data include instructions for any application or method operating on the device 200, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 204 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 206 provides power to the various components of the device 200. The power component 206 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 200.
The multimedia component 208 includes a screen that provides an output interface between the device 200 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 208 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 200 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 210 is configured to output and/or input audio signals. For example, audio component 210 includes a Microphone (MIC) configured to receive external audio signals when apparatus 200 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 204 or transmitted via the communication component 216. In some embodiments, audio component 210 also includes a speaker for outputting audio signals.
The I/O interface 212 provides an interface between the processing component 202 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 214 includes one or more sensors for providing status assessments of various aspects of the device 200. For example, the sensor component 214 may detect an open/closed state of the device 200 and the relative positioning of components, such as the display and keypad of the device 200. The sensor component 214 may also detect a change in the position of the device 200 or of a component of the device 200, the presence or absence of user contact with the device 200, the orientation or acceleration/deceleration of the device 200, and a change in the temperature of the device 200. The sensor component 214 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 214 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 216 is configured to facilitate wired or wireless communication between the apparatus 200 and other devices. The apparatus 200 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 216 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 216 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 200 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as memory 204, comprising instructions executable by processor 220 of device 200 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It is understood that "a plurality" in this disclosure means two or more, and other quantifiers are analogous. "And/or" describes the association relationship of the associated objects and means that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It will be further understood that the terms "first," "second," and the like are used to describe various information and that such information should not be limited by these terms. These terms are only used to distinguish one type of information from another and do not denote a particular order or importance. Indeed, the terms "first," "second," and the like are fully interchangeable. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure.
It will be further understood that, unless otherwise specified, "connected" includes direct connections between the two without the presence of other elements, as well as indirect connections between the two with the presence of other elements.
It is further to be understood that while operations are depicted in the drawings in a particular order, this is not to be understood as requiring that such operations be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain environments, multitasking and parallel processing may be advantageous.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. A text translation method, characterized in that the text translation method comprises:
acquiring a text to be translated, and identifying a target entity word and an abbreviation corresponding to the target entity word in the text to be translated;
replacing all the abbreviations with the target entity words to obtain a first text corresponding to the text to be translated;
and determining a translation result of the text to be translated based on the first text.
2. The text translation method according to claim 1, wherein determining a translation result of the text to be translated based on the first text comprises:
replacing the target entity words in the first text by the same replacement symbol to obtain a second text corresponding to the text to be translated;
translating other texts in the second text except the replacement symbol to obtain a first translation result, and translating the target entity word to obtain a translation result of the target entity word;
and replacing the replacement symbol in the first translation result with the translation result of the target entity word to obtain the final translation result of the text to be translated.
3. The text translation method according to claim 1 or 2, wherein the identifying a target entity word included in the text to be translated and an abbreviation corresponding to the target entity word comprises:
determining rules for identifying the target entity words and abbreviations corresponding to the target entity words;
and identifying the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated based on the rules.
4. The text translation method according to claim 1 or 2, wherein the identifying a target entity word included in the text to be translated and an abbreviation corresponding to the target entity word comprises:
determining the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated based on the reference resolution model, and/or
determining the target entity words and the abbreviations corresponding to the target entity words in the text to be translated based on the corresponding relation between the entity words and the abbreviations.
5. The text translation method according to claim 1, wherein the target entity word includes a person name, the person name includes a first type of person name and a second type of person name, the person name includes a first part and a second part;
the identifying of the target entity word and the abbreviation corresponding to the target entity word included in the text to be translated comprises:
determining the name of the person included in the text to be translated based on a regular expression for determining the name of the person;
if the person name is the first type of person name and the first part of the first type of person name exists in the text to be translated, determining to identify an abbreviation corresponding to the first type of person name;
and if the person name is the second type of person name and the second part of the second type of person name exists in the text to be translated, determining to identify the abbreviation corresponding to the second type of person name.
6. A text translation apparatus, characterized in that the text translation apparatus comprises:
the acquisition module is used for acquiring a text to be translated;
the recognition module is used for recognizing target entity words and abbreviations corresponding to the target entity words in the text to be translated;
and the determining module is used for replacing all the abbreviations with the target entity words to obtain a first text corresponding to the text to be translated, and determining a translation result of the text to be translated based on the first text.
7. The text translation apparatus according to claim 6, wherein the determining module determines the translation result of the text to be translated based on the first text in the following manner:
replacing the target entity words in the first text by the same replacement symbol to obtain a second text corresponding to the text to be translated;
translating other texts in the second text except the replacement symbol to obtain a first translation result, and translating the target entity word to obtain a translation result of the target entity word;
and replacing the replacement symbol in the first translation result with the translation result of the target entity word to obtain the final translation result of the text to be translated.
8. The text translation apparatus according to claim 6 or 7, wherein the recognition module recognizes the target entity word included in the text to be translated and the abbreviation corresponding to the target entity word by:
determining rules for identifying the target entity words and abbreviations corresponding to the target entity words;
and identifying the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated based on the rules.
9. The text translation apparatus according to claim 6 or 7, wherein the recognition module recognizes the target entity word included in the text to be translated and the abbreviation corresponding to the target entity word by:
determining the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated based on the reference resolution model, and/or
determining the target entity words and the abbreviations corresponding to the target entity words in the text to be translated based on the corresponding relation between the entity words and the abbreviations.
10. The text translation apparatus according to claim 6, wherein the target entity word includes a person name including a first type of person name and a second type of person name, the person name including a first part and a second part, wherein,
the recognition module recognizes a target entity word and an abbreviation corresponding to the target entity word included in the text to be translated in the following way:
determining the name of the person included in the text to be translated based on a regular expression for determining the name of the person;
if the person name is the first type of person name and the first part of the first type of person name exists in the text to be translated, determining to identify an abbreviation corresponding to the first type of person name;
and if the person name is the second type of person name and the second part of the second type of person name exists in the text to be translated, determining to identify the abbreviation corresponding to the second type of person name.
11. A text translation apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the text translation method of any one of claims 1 to 5.
12. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the text translation method of any one of claims 1 to 5.
CN202110226769.5A 2021-03-01 2021-03-01 Text translation method, text translation device and storage medium Pending CN113239707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110226769.5A CN113239707A (en) 2021-03-01 2021-03-01 Text translation method, text translation device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110226769.5A CN113239707A (en) 2021-03-01 2021-03-01 Text translation method, text translation device and storage medium

Publications (1)

Publication Number Publication Date
CN113239707A true CN113239707A (en) 2021-08-10

Family

ID=77130293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110226769.5A Pending CN113239707A (en) 2021-03-01 2021-03-01 Text translation method, text translation device and storage medium

Country Status (1)

Country Link
CN (1) CN113239707A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108862A (en) * 2023-04-07 2023-05-12 北京澜舟科技有限公司 Chapter-level machine translation model construction method, chapter-level machine translation model construction system and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1625739A (en) * 2001-12-21 2005-06-08 埃里·阿博 Content conversion method and apparatus
US20190251174A1 (en) * 2018-02-12 2019-08-15 Samsung Electronics Co., Ltd. Machine translation method and apparatus
CN111368531A (en) * 2020-03-09 2020-07-03 腾讯科技(深圳)有限公司 Translation text processing method and device, computer equipment and storage medium
US10839164B1 (en) * 2018-10-01 2020-11-17 Iqvia Inc. Automated translation of clinical trial documents
CN112084796A (en) * 2020-09-15 2020-12-15 南京文图景信息科技有限公司 Multi-language place name root Chinese translation method based on Transformer deep learning model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
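Yu Xiawei; Yuan Junpeng: "Research on automatic Chinese-English translation of paper authors' names incorporating corpora", Technology Intelligence Engineering (情报工程), no. 01 *
Liu Ying; Cao Xiang: "English-Chinese person name translation based on web search", Journal of Chinese Information Processing (中文信息学报), no. 02 *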

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108862A (en) * 2023-04-07 2023-05-12 北京澜舟科技有限公司 Chapter-level machine translation model construction method, chapter-level machine translation model construction system and storage medium
CN116108862B (en) * 2023-04-07 2023-07-25 北京澜舟科技有限公司 Chapter-level machine translation model construction method, chapter-level machine translation model construction system and storage medium

Similar Documents

Publication Publication Date Title
CN111128183B (en) Speech recognition method, apparatus and medium
CN111368541A (en) Named entity identification method and device
CN107564526B (en) Processing method, apparatus and machine-readable medium
EP3734472A1 (en) Method and device for text processing
CN114154459A (en) Speech recognition text processing method and device, electronic equipment and storage medium
CN113761888A (en) Text translation method and device, computer equipment and storage medium
CN111160047A (en) Data processing method and device and data processing device
CN112036174A (en) Punctuation marking method and device
CN113673261A (en) Data generation method and device and readable storage medium
CN113239707A (en) Text translation method, text translation device and storage medium
CN111382748B (en) Image translation method, device and storage medium
CN112036195A (en) Machine translation method, device and storage medium
CN108241614B (en) Information processing method and device, and device for information processing
CN110781689B (en) Information processing method, device and storage medium
Warhade et al. English-to-Sanskrit statistical machine translation with ubiquitous application
CN112149432A (en) Method and device for translating chapters by machine and storage medium
CN111414766B (en) Translation method and device
CN113971218A (en) Position coding method, position coding device and storage medium
CN113343720A (en) Subtitle translation method and device for subtitle translation
CN112199963A (en) Text processing method and device and text processing device
CN112613327A (en) Information processing method and device
CN112926343A (en) Data processing method and device and electronic equipment
CN113589948A (en) Data processing method and device and electronic equipment
CN113221581A (en) Text translation method, device and storage medium
CN115409200A (en) Database operation method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination