CN114861683A - Method for separating translation text and editing Tag - Google Patents
Method for separating translation text and editing Tag
- Publication number: CN114861683A
- Authority: CN (China)
- Prior art keywords: tag, sentence, text, translated, original
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/109—Font handling; Temporal or kinetic typography
Abstract
The invention relates to the field of computer-aided translation (CAT), and in particular to a method for separating translation text and editing Tags. The method converts Tag symbol information into font color information, assigns that color to the corresponding characters, and replaces the conventional Tag symbols with color identifiers. Tag information is thereby retained while redundant Tag symbols are removed from the source data, so the translator can read and understand the text more easily without the semantics of the original characters being affected. The method also lets translators define their own identification colors, which simplifies Tag editing and reduces the learning cost for translators.
Description
Technical Field
The invention relates to the field of computer-aided translation (CAT), and in particular to a method for separating translation text and editing Tags.
Background
A Tag is a classification identifier widely used in computer software. In computer-aided translation, Tags are mainly used to handle typesetting formats across various file formats.
However, the way CAT software uses Tag symbols still raises several problems. First, CAT software directly uses the sentence data from the source, and because that data is interspersed with Tag symbols, runs of continuous text in the translation file are interrupted by Tag symbols, which hinders the translator's understanding of the meaning. Second, because Tags are a markup language of computer software and are defined in advance by the CAT software, a translator must study the various Tags systematically before using them, which imposes a considerable learning cost.
Disclosure of Invention
To solve at least one of the problems described in the Background above, the present invention proposes a method for separating translation text and editing Tags.
The method records Tag information and the original and translated text by creating a Tag information table and a text information table. The format of the Tag information table is:
{ sentence ID, Tag number, start Tag, end Tag, start Tag position, text length }
The sentence ID is the number assigned to the sentence when it is read in; the Tag number is the number assigned to each Tag symbol; the start Tag is the Tag symbol marking the beginning of the sentence; the end Tag is the Tag symbol marking the end of the sentence; the start Tag position is the position of the start Tag in the sentence; and the text length is the number of characters between the start Tag and the end Tag. A text length of 0 indicates that the Tag exists independently.
The format of the text information table is:
{ sentence ID, original text, translated text }
The sentence ID is the number assigned to the sentence when it is read in, the original text is the text read in for translation, and the translated text is the text after translation.
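As an illustration only, the two tables could be modeled as simple records. This is a minimal sketch, not the patent's implementation: the class and field names (`TagRecord`, `TextRecord`, and so on) are hypothetical, and it assumes that the start Tag position is a character offset into the tag-free text.

```python
from dataclasses import dataclass


@dataclass
class TagRecord:
    """One row of the Tag information table (field names are hypothetical)."""
    sentence_id: int     # number assigned to the sentence when it is read in
    tag_number: int      # number assigned to this Tag symbol
    start_tag: str       # start Tag symbol
    end_tag: str         # end Tag symbol; empty when the Tag stands alone
    start_position: int  # assumed: offset of the start Tag in the tag-free text
    text_length: int     # characters between start and end Tag; 0 = standalone


@dataclass
class TextRecord:
    """One row of the text information table (field names are hypothetical)."""
    sentence_id: int
    original_text: str
    translated_text: str = ""  # filled in once the sentence is translated


# Example rows for a sentence in which the word "Save" was wrapped in a Tag.
tag_row = TagRecord(1, 1, "<b>", "</b>", 6, 4)
text_row = TextRecord(1, "Press Save now")
```

With these rows, slicing `text_row.original_text` from `start_position` for `text_length` characters recovers the tagged text.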
The method specifically comprises the following steps:
Step S1, reading the sentence source code of the original sentence into the program.
Step S2, processing the read sentence source code, as follows:
Step S201, reading the text information and Tag information of the displayed sentence source code, where Tags appear in two display formats:
<Tag>text<Tag>
<Tag>text
Step S202, recording the original-text Tag information read from the sentence source code in the Tag information table;
Step S203, deleting the Tag symbols from the sentence source code while retaining the text, and saving the retained text in the text information table.
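Steps S201 to S203 can be sketched as a single pass over the sentence source code. The sketch below rests on assumptions that the patent does not state: Tags use an HTML-like syntax (`<b>…</b>` for a paired Tag, `<br/>` for a standalone Tag), and positions are offsets into the tag-free text. The function name `split_sentence` and the dictionary keys are hypothetical.

```python
import re

# Assumed Tag syntax: <name>, </name>, or <name/> (not specified by the patent).
TAG_RE = re.compile(r"<(/?)(\w+)(/?)>")


def split_sentence(sentence_id, source):
    """Return (plain_text, tag_rows) for one sentence source string."""
    plain = []       # pieces of tag-free text
    rows = []        # rows of the Tag information table, as dicts
    open_rows = []   # rows whose end Tag has not been seen yet
    pos = 0          # read position in the source string
    length = 0       # length of the tag-free text collected so far
    for match in TAG_RE.finditer(source):
        chunk = source[pos:match.start()]
        plain.append(chunk)
        length += len(chunk)
        pos = match.end()
        closing, _name, standalone = match.groups()
        if closing:
            if open_rows:  # a stray end Tag is simply dropped
                row = open_rows.pop()
                row["end_tag"] = match.group(0)
                row["text_length"] = length - row["start_position"]
            continue
        row = {"sentence_id": sentence_id, "tag_number": len(rows) + 1,
               "start_tag": match.group(0), "end_tag": "",
               "start_position": length, "text_length": 0}
        rows.append(row)
        if not standalone:
            open_rows.append(row)
    plain.append(source[pos:])
    return "".join(plain), rows


text, rows = split_sentence(1, "Press <b>Save</b> now<br/>")
```

Here `text` holds the tag-free sentence for the text information table, and `rows` holds one entry per Tag; the standalone Tag keeps a text length of 0.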
Step S3, customizing the color identification rules and establishing the original-text Tag number color table, as follows:
Step S301, formulating a text color identification rule. The color identification rule comprises a font color rule and a font background color rule: the font color rule marks the text font in different colors, and the font background color rule marks the text background in different colors;
Step S302, establishing an original-text Tag number color table that defines the identification color corresponding to each Tag number. The format of the original-text Tag number color table is:
{ Tag number, identification color }
Here, the Tag number is the number of each Tag symbol in the Tag information table of the original text, and the identification color is the color defined for that Tag symbol.
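The Tag number color table can be pictured as a mapping from Tag number to identification color, with room for the translator's own choices. The following is a minimal sketch under assumptions: the palette, the function name `build_color_table`, and the `overrides` parameter are hypothetical, and the distinction between the font color rule and the font background color rule is left out for brevity.

```python
from itertools import cycle

# Hypothetical default palette; a translator could supply a different one.
DEFAULT_PALETTE = ("red", "blue", "green", "orange")


def build_color_table(tag_numbers, palette=DEFAULT_PALETTE, overrides=None):
    """Map each Tag number to an identification color.

    Colors are taken from the palette in order, repeating if there are more
    Tags than colors; entries in `overrides` replace the defaults.
    """
    colors = cycle(palette)
    table = {number: next(colors) for number in tag_numbers}
    table.update(overrides or {})
    return table


color_table = build_color_table([1, 2, 3], overrides={2: "purple"})
```

Note that a palette shorter than the number of Tags reuses colors, so a real tool would probably need enough distinct colors for one sentence.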
Step S4, replacing the Tag symbols in the sentence source code of the original text according to the customized color rule, and displaying the processed original text.
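One way to picture step S4 is to turn the tag-free text into a list of colored runs that a user interface could then render. This sketch is an assumption-laden illustration: it handles only paired Tags that do not overlap, it skips standalone Tags (the patent does not say how those are displayed), and the function name `color_runs` and the row keys are hypothetical.

```python
def color_runs(plain_text, tag_rows, color_table):
    """Split plain_text into (text, color) runs; color is None where no Tag applies."""
    runs = []
    pos = 0
    # Only paired Tags carry text; standalone Tags (length 0) are skipped here.
    paired = [row for row in tag_rows if row["text_length"] > 0]
    for row in sorted(paired, key=lambda r: r["start_position"]):
        start = row["start_position"]
        end = start + row["text_length"]
        if start > pos:
            runs.append((plain_text[pos:start], None))
        runs.append((plain_text[start:end], color_table[row["tag_number"]]))
        pos = end
    if pos < len(plain_text):
        runs.append((plain_text[pos:], None))
    return runs


# Hypothetical sample data: "Save" was wrapped in Tag number 1.
sample_rows = [{"tag_number": 1, "start_position": 6, "text_length": 4}]
runs = color_runs("Press Save now", sample_rows, {1: "red"})
```

The translator then sees plain text in which only the color of "Save" signals that a Tag applies to it.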
Step S5, translating the original sentence into a translated sentence, marking the translated sentence with the corresponding colors, and writing the source code of the translated sentence, as follows:
Step S501, selecting the text in the translated sentence that needs color marking, and marking it with the color that the Tag number color table assigns to the corresponding original text;
Step S502, analyzing the Tag symbols and the text position information in the source code of the original sentence, and comparing them with the Tag information table of the original sentence to obtain the Tag number comparison table between the original sentence and the translated sentence.
The format of the Tag number comparison table is:
{ translated text Tag number, original text Tag number }
Here, the translated-text Tag number is the Tag number in the translated text, and the original-text Tag number is the Tag number in the original text;
Step S503, building the Tag information table of the translated sentence from the Tag number comparison table, the Tag information table of the original sentence, and the color and text position information of the translated sentence;
Step S504, writing the source code of the translated sentence according to the Tag information table and the text information table of the translated sentence.
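Steps S502 to S504 can be sketched as the reverse of the display step: the colors the translator applied are mapped back to the original Tags, and the translated source code is written from them. The sketch assumes that every Tag in a sentence has a distinct color (so the color table can be inverted) and that the translated sentence is given as `(text, color)` runs. The function name `write_translated_source` and the row keys are hypothetical.

```python
def write_translated_source(runs, original_rows, color_table):
    """Return (source, translated_rows, comparison) for a color-marked translation."""
    number_by_color = {color: number for number, color in color_table.items()}
    row_by_number = {row["tag_number"]: row for row in original_rows}
    comparison = {}       # translated Tag number -> original Tag number
    translated_rows = []  # Tag information table of the translated sentence
    parts = []            # pieces of the translated sentence source code
    position = 0          # offset in the tag-free translated text
    for text, color in runs:
        if color is None:
            parts.append(text)
        else:
            original = row_by_number[number_by_color[color]]
            number = len(translated_rows) + 1
            comparison[number] = original["tag_number"]
            translated_rows.append({**original, "tag_number": number,
                                    "start_position": position,
                                    "text_length": len(text)})
            parts.append(original["start_tag"] + text + original["end_tag"])
        position += len(text)
    return "".join(parts), translated_rows, comparison


# Hypothetical sample: the original "Save" was Tag number 1, shown in red.
original_rows = [{"sentence_id": 1, "tag_number": 1, "start_tag": "<b>",
                  "end_tag": "</b>", "start_position": 6, "text_length": 4}]
marked = [("Cliquez sur ", None), ("Enregistrer", "red")]
source, translated_rows, comparison = write_translated_source(
    marked, original_rows, {1: "red"})
```

Because the lookup goes through the color, the Tagged text may sit at a different position in the translation than in the original and still receive the right Tag symbols.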
Compared with the prior art, the method for separating translation text and editing Tags provided by the invention has the following beneficial effects:
The method converts Tag symbol information into font color information, assigns that color to the corresponding characters, and replaces the conventional Tag symbols with color identifiers. Tag information is thereby retained while redundant Tag symbols are removed from the source data, so the translator can read and understand the text more easily without the semantics of the original characters being affected.
The method also lets translators define their own identification colors, which simplifies Tag editing and reduces the learning cost for translators.
Drawings
FIG. 1 is a flow chart of the method of the present invention for separating translated text and editing Tag.
FIG. 2 is an explanatory diagram of a Tag information table in the embodiment of the present invention.
FIG. 3 is an explanatory diagram of a text information table in the embodiment of the present invention.
FIG. 4 is an explanatory diagram of the original-text Tag number color table in the embodiment of the present invention.
FIG. 5 is an explanatory diagram of the Tag number comparison table in the embodiment of the present invention.
Detailed Description
In order to make the objects and features of the present invention more apparent and understandable, the present invention will be described in detail below with reference to embodiments and the accompanying drawings.
The method for separating translation text and editing Tags records Tag information and the original and translated text by creating a Tag information table and a text information table. As shown in FIG. 2, the format of the Tag information table is:
{ sentence ID, Tag number, start Tag, end Tag, start Tag position, text length }
The sentence ID is the number assigned to the sentence when it is read in; the Tag number is the number assigned to each Tag symbol; the start Tag is the Tag symbol marking the beginning of the sentence; the end Tag is the Tag symbol marking the end of the sentence; the start Tag position is the position of the start Tag in the sentence; and the text length is the number of characters between the start Tag and the end Tag. A text length of 0 indicates that the Tag exists independently.
As shown in FIG. 3, the format of the text information table is:
{ sentence ID, original text, translated text }
The sentence ID is the number assigned to the sentence when it is read in, the original text is the text read in for translation, and the translated text is the text after translation.
As shown in FIG. 1, the method for separating translation text and editing Tags provided by the present invention comprises the following steps:
Step S1, reading the sentence source code of the original sentence into the program.
Step S2, processing the read sentence source code, as follows:
First, the text information and Tag information of the displayed sentence source code are read. Tags appear in two display formats:
<Tag>text<Tag>
<Tag>text
Then, the original-text Tag information read from the sentence source code is recorded in the Tag information table.
Finally, the Tag symbols are deleted from the sentence source code while the text is retained, and the retained text is saved in the text information table.
Step S3, customizing the color identification rules and establishing the original-text Tag number color table, as follows:
First, a text color identification rule is formulated. The color identification rule comprises a font color rule and a font background color rule: the font color rule marks the text font in different colors, and the font background color rule marks the text background in different colors.
Then, an original-text Tag number color table is established to define the identification color corresponding to each Tag number. As shown in FIG. 4, the format of the original-text Tag number color table is:
{ Tag number, identification color }
Here, the Tag number is the number of each Tag symbol in the Tag information table of the original text, and the identification color is the color defined for that Tag symbol.
Step S4, replacing the Tag symbols in the sentence source code of the original text according to the customized color rule, and displaying the processed original text.
Step S5, translating the original sentence into a translated sentence, marking the translated sentence with the corresponding colors, and writing the source code of the translated sentence, as follows:
First, the text in the translated sentence that needs color marking is selected and marked with the color that the Tag number color table assigns to the corresponding original text.
Then, the Tag symbols and the text position information in the source code of the original sentence are analyzed and compared with the Tag information table of the original sentence to obtain the Tag number comparison table between the original sentence and the translated sentence.
As shown in FIG. 5, the format of the Tag number comparison table is:
{ translated text Tag number, original text Tag number }
Here, the translated-text Tag number is the Tag number in the translated text, and the original-text Tag number is the Tag number in the original text.
Next, the Tag information table of the translated sentence is built from the Tag number comparison table, the Tag information table of the original sentence, and the color and text position information of the translated sentence.
Finally, the source code of the translated sentence is written according to the Tag information table and the text information table of the translated sentence.
This completes one pass of the method disclosed herein.
While the invention has been described with reference to a specific embodiment, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention.
Claims (7)
1. A method for separating translation text and editing Tag, characterized in that Tag information and original-text and translated-text information are recorded by creating a Tag information table and a text information table, the method comprising the following steps:
step S1, reading the sentence source code of the original sentence into the program;
step S2, processing the read sentence source code;
step S3, customizing color identification rules and establishing an original-text Tag number color table;
step S4, replacing the Tag symbols in the sentence source code of the original text according to the customized color rule, and displaying the processed original text;
and step S5, translating the original sentence into a translated sentence, marking the translated sentence with the corresponding colors, and writing the source code of the translated sentence.
2. The method for separating translation text and editing Tag of claim 1, wherein the format of the Tag information table is:
{ sentence ID, Tag number, start Tag, end Tag, start Tag position, text length }
wherein the sentence ID is the number assigned to the sentence when it is read in, the Tag number is the number assigned to each Tag symbol, the start Tag is the Tag symbol marking the beginning of the sentence, the end Tag is the Tag symbol marking the end of the sentence, the start Tag position is the position of the start Tag in the sentence, and the text length is the number of characters between the start Tag and the end Tag, a text length of 0 indicating that the Tag exists independently,
and the format of the text information table is:
{ sentence ID, original text, translated text }
wherein the sentence ID is the number assigned to the sentence when it is read in, the original text is the text read in for translation, and the translated text is the text after translation.
3. The method for separating translation text and editing Tag of claim 1, wherein step S2 comprises the following steps:
step S201, reading the text information and Tag information of the displayed sentence source code, where Tags appear in two display formats:
<Tag>text<Tag>
<Tag>text
step S202, recording the original-text Tag information read from the sentence source code in the Tag information table;
step S203, deleting the Tag symbols from the sentence source code while retaining the text, and saving the retained text in the text information table.
4. The method for separating translation text and editing Tag of claim 1, wherein step S3 comprises the following steps:
step S301, formulating a text color identification rule;
step S302, establishing an original-text Tag number color table that defines the identification color corresponding to each Tag number.
5. The method of claim 4, wherein the text color identification rule formulated in step S301 comprises a font color rule and a font background color rule, the font color rule marking the text font in different colors and the font background color rule marking the text background in different colors.
6. The method of claim 4, wherein the original-text Tag number color table established in step S302 has the following format:
{ Tag number, identification color }
wherein the Tag number is the number of each Tag symbol in the Tag information table of the original text, and the identification color is the color defined for that Tag symbol.
7. The method for separating translation text and editing Tag of claim 1, wherein step S5 comprises the following steps:
step S501, selecting the text in the translated sentence that needs color marking, and marking it with the color that the Tag number color table assigns to the corresponding original text;
step S502, analyzing the Tag symbols and the text position information in the source code of the original sentence, and comparing them with the Tag information table of the original sentence to obtain the Tag number comparison table between the original sentence and the translated sentence,
the format of the Tag number comparison table being:
{ translated text Tag number, original text Tag number }
wherein the translated-text Tag number is the Tag number in the translated text, and the original-text Tag number is the Tag number in the original text;
step S503, building the Tag information table of the translated sentence from the Tag number comparison table, the Tag information table of the original sentence, and the color and text position information of the translated sentence;
step S504, writing the source code of the translated sentence according to the Tag information table and the text information table of the translated sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210430364.8A CN114861683A (en) | 2022-04-22 | 2022-04-22 | Method for separating translation text and editing Tag |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114861683A (en) | 2022-08-05 |
Family ID: 82633675
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210430364.8A Pending CN114861683A (en) | 2022-04-22 | 2022-04-22 | Method for separating translation text and editing Tag |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114861683A (en) |
- 2022-04-22: CN application CN202210430364.8A, published as CN114861683A (en), status Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication |