CN114861683A - Method for separating translation text and editing Tag - Google Patents

Info

Publication number
CN114861683A
Authority
CN
China
Prior art keywords
tag
sentence
text
translated
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210430364.8A
Other languages
Chinese (zh)
Inventor
吴志武
何其恬
胡洌波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yiyou Network Technology Co ltd
Original Assignee
Hangzhou Yiyou Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yiyou Network Technology Co ltd
Priority to CN202210430364.8A
Publication of CN114861683A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography

Abstract

The invention relates to the field of computer-aided translation (CAT), and in particular to a method for separating translation text and editing Tag. The method converts Tag symbol information into font color information, assigns that color information to the corresponding characters, and replaces the traditional Tag symbols with color identifiers. The Tag information is thereby retained while the redundant Tag symbols are removed from the source data, so that a translator can read and understand the translation text more easily without the semantics of the original characters being affected. The method also lets the translator customize the identification colors, which simplifies Tag editing and greatly reduces the translator's learning cost.

Description

Method for separating translation text and editing Tag
Technical Field
The invention relates to the field of computer-aided translation, in particular to a method for separating a translation text and editing Tag.
Background
A Tag is a classification identifier widely used in computer software. In the field of computer-aided translation (CAT), Tags are mainly used to handle the typesetting formats found in various file formats.
However, the way CAT software uses Tag symbols for identification still leaves several problems to be solved. First, CAT software uses the sentence data from the source program directly. Because that data is interspersed with Tag symbols, continuous runs of characters in the translation file are interrupted by Tag symbols, which hinders the translator's understanding of the meaning. Second, because Tags are a markup language of the computer software and are defined in advance by the CAT software, a translator has to study the various Tags systematically and specifically before using them for identification, which imposes a considerable learning cost.
Disclosure of Invention
In order to solve at least one of the problems mentioned in the background above, the present invention proposes a method for separating translation text and editing Tag.
The method records the Tag information and the original and translated text by creating a Tag information table and a text information table. The format of the Tag information table is:
{ sentence ID, Tag number, start Tag, end Tag, start Tag position, text length }
Here, the sentence ID is the number assigned to the sentence that is read in; the Tag number is the number corresponding to each Tag symbol; the start Tag is the Tag symbol that marks the beginning of the sentence; the end Tag is the Tag symbol that marks the end of the sentence; the start Tag position is the position of the start Tag within the sentence; and the text length is the number of characters between the start Tag and the end Tag. A text length of 0 indicates that the Tag stands alone.
The format of the text information table is:
{ sentence ID, original text, translated text }
Here, the sentence ID is the number assigned to the sentence that is read in, the original text is the text read in for translation, and the translated text is the text after translation.
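As a rough illustration only, the two tables could be modeled as plain records. The sketch below is not taken from the patent: the Python class and field names are hypothetical, and it assumes that the start Tag position is a zero-based character offset into the Tag-free text and that a standalone Tag has an empty end Tag.

```python
from dataclasses import dataclass


@dataclass
class TagRecord:
    """One row of the Tag information table (field names are hypothetical)."""
    sentence_id: int
    tag_number: int
    start_tag: str
    end_tag: str       # assumed to be "" for a standalone Tag
    start_pos: int     # assumed: zero-based offset into the Tag-free text
    text_length: int   # number of characters between start Tag and end Tag

    @property
    def is_standalone(self) -> bool:
        # Per the description, a text length of 0 means the Tag stands alone.
        return self.text_length == 0


@dataclass
class TextRecord:
    """One row of the text information table (field names are hypothetical)."""
    sentence_id: int
    original_text: str
    translated_text: str = ""  # filled in once the sentence is translated


paired = TagRecord(1, 1, "<b>", "</b>", start_pos=6, text_length=4)
alone = TagRecord(1, 2, "<br/>", "", start_pos=23, text_length=0)
sentence = TextRecord(1, "Click Save to continue.")
```

Both tables share the sentence ID, so the rows for one sentence can be looked up together when the translated source code is written later.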
The method comprises the following steps:
Step S1: read the sentence source code of the original sentence into the program.
Step S2: process the sentence source code that was read in, as follows:
Step S201: read the text information and Tag information from the sentence source code. Two Tag display formats are covered:
<Tag> text <Tag>
<Tag> text
Step S202: record the Tag information of the original text, obtained from the sentence source code, in the Tag information table.
Step S203: delete the Tag symbols from the sentence source code, keep the text, and save the retained text in the text information table.
Step S3: customize the color identification rules and establish the original-text Tag number color table, as follows:
Step S301: formulate the text color identification rules. These comprise a font color rule and a font background color rule: the font color rule marks the font of the text in different colors, and the font background color rule marks the background of the text in different colors.
Step S302: establish the original-text Tag number color table, which defines a distinct identification color for each numbered Tag symbol. The format of the original-text Tag number color table is:
{ Tag number, identification color }
Here, the Tag number is the number corresponding to each Tag symbol in the Tag information table of the original text, and the identification color is the color defined for that Tag symbol.
Step S4: replace the Tag symbols in the source code of the original sentence with colors according to the customized color rules, and display the processed original text.
Step S5: translate the original sentence into a translated sentence, mark the translated sentence with the corresponding colors, and write the source code of the translated sentence, as follows:
Step S501: select the characters in the translated sentence that need color marking, and mark them with the colors that correspond to the original characters in the Tag number color table.
Step S502: analyze the Tag symbols and the character position information in the source code of the original sentence, and compare them with the Tag information table of the original sentence to obtain the Tag number comparison table between the original sentence and the translated sentence. The format of the Tag number comparison table is:
{ translated text Tag number, original text Tag number }
Here, the translated text Tag number is the Tag number in the translated text, and the original text Tag number is the Tag number in the original text.
Step S503: convert the Tag information of the translated sentence into a Tag information table for the translated sentence, based on the Tag number comparison table, the Tag information table of the original sentence, and the color and character position information of the translated sentence.
Step S504: write the source code of the translated sentence from the Tag information table and the text information table of the translated sentence.
Compared with the prior art, the method for separating translation text and editing Tag provided by the invention has the following beneficial effects:
The method converts Tag symbol information into font color information, assigns that color information to the corresponding characters, and replaces the traditional Tag symbols with color identifiers. The Tag information is thereby retained while the redundant Tag symbols are removed from the source data, so that a translator can read and understand the translation text more easily without the semantics of the original characters being affected.
The method also lets the translator customize the identification colors, which simplifies Tag editing and greatly reduces the translator's learning cost.
Drawings
FIG. 1 is a flow chart of the method of the present invention for separating translated text and editing Tag.
FIG. 2 is an explanatory diagram of a Tag information table in the embodiment of the present invention.
FIG. 3 is an explanatory diagram of a text information table in the embodiment of the present invention.
FIG. 4 is an explanatory diagram of the original-text Tag number color table in the embodiment of the present invention.
FIG. 5 is an explanatory diagram of the Tag number comparison table in the embodiment of the present invention.
Detailed Description
In order to make the objects and features of the present invention more apparent and understandable, the present invention will be described in detail below with reference to embodiments and the accompanying drawings.
The method for separating translation text and editing Tag records the Tag information and the original and translated text by creating a Tag information table and a text information table. As shown in FIG. 2, the format of the Tag information table is:
{ sentence ID, Tag number, start Tag, end Tag, start Tag position, text length }
Here, the sentence ID is the number assigned to the sentence that is read in; the Tag number is the number corresponding to each Tag symbol; the start Tag is the Tag symbol that marks the beginning of the sentence; the end Tag is the Tag symbol that marks the end of the sentence; the start Tag position is the position of the start Tag within the sentence; and the text length is the number of characters between the start Tag and the end Tag. A text length of 0 indicates that the Tag stands alone.
As shown in FIG. 3, the format of the text information table is:
{ sentence ID, original text, translated text }
Here, the sentence ID is the number assigned to the sentence that is read in, the original text is the text read in for translation, and the translated text is the text after translation.
As shown in FIG. 1, the method for separating translation text and editing Tag provided by the present invention comprises the following steps:
Step S1: read the sentence source code of the original sentence into the program.
Step S2: process the sentence source code that was read in, as follows:
First, read the text information and Tag information from the sentence source code. Two Tag display formats are covered:
<Tag> text <Tag>
<Tag> text
Then, record the Tag information of the original text, obtained from the sentence source code, in the Tag information table.
Finally, delete the Tag symbols from the sentence source code, keep the text, and save the retained text in the text information table.
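A minimal sketch of step S2 follows. The patent does not fix a concrete Tag syntax, so this assumes XML-like Tags (`<b>`, `</b>`, `<br/>`). The function name `split_sentence`, the dictionary keys, and the use of zero-based offsets into the Tag-free text are likewise assumptions made for illustration, not the patented implementation.

```python
import re

# Assumed Tag syntax (not specified by the patent): <name>, </name>, <name/>
TAG_RE = re.compile(r"<(/?)(\w+)(/?)>")


def split_sentence(sentence_id, source):
    """Split sentence source code into Tag-free text and Tag table rows."""
    rows = []       # rows of the Tag information table
    open_rows = []  # indexes of rows whose end Tag has not been seen yet
    pieces = []     # Tag-free text fragments
    pos = 0         # length of the Tag-free text collected so far
    last = 0        # end of the previous Tag in the raw source
    for match in TAG_RE.finditer(source):
        piece = source[last:match.start()]
        pieces.append(piece)
        pos += len(piece)
        last = match.end()
        closing, _name, standalone = match.groups()
        if closing and not standalone:
            # End Tag: complete the most recently opened row.
            if not open_rows:
                raise ValueError("end Tag without a start Tag")
            row = rows[open_rows.pop()]
            row["end_tag"] = match.group(0)
            row["text_length"] = pos - row["start_pos"]
            continue
        rows.append({
            "sentence_id": sentence_id,
            "tag_number": len(rows) + 1,
            "start_tag": match.group(0),
            "end_tag": "",
            "start_pos": pos,
            "text_length": 0,  # stays 0 for a standalone Tag
        })
        if not standalone:
            open_rows.append(len(rows) - 1)
    pieces.append(source[last:])
    return "".join(pieces), rows


plain, rows = split_sentence(1, "Click <b>Save</b> to continue.<br/>")
```

Here `plain` holds the text that would go into the text information table and `rows` holds the Tag information table entries for the sentence. The sketch does not check that an end Tag has the same name as its start Tag.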
Step S3: customize the color identification rules and establish the original-text Tag number color table, as follows:
First, formulate the text color identification rules. These comprise a font color rule and a font background color rule: the font color rule marks the font of the text in different colors, and the font background color rule marks the background of the text in different colors.
Then, establish the original-text Tag number color table, which defines a distinct identification color for each numbered Tag symbol. As shown in FIG. 4, the format of the original-text Tag number color table is:
{ Tag number, identification color }
Here, the Tag number is the number corresponding to each Tag symbol in the Tag information table of the original text, and the identification color is the color defined for that Tag symbol.
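One way the Tag number color table might be built is sketched below. The default palette, the `overrides` parameter for translator-chosen colors, and the function name are all hypothetical. The sketch treats an identification color as a plain string and leaves the choice between font color and background color to the display layer.

```python
# Hypothetical default palette; the patent leaves the colors to the translator.
DEFAULT_PALETTE = ["red", "blue", "green", "orange", "purple"]


def build_color_table(tag_numbers, overrides=None, palette=DEFAULT_PALETTE):
    """Map each Tag number to its own identification color."""
    overrides = overrides or {}
    numbers = sorted(set(tag_numbers))
    # Colors the translator picked explicitly are taken out of the free pool.
    used = {overrides[n] for n in numbers if n in overrides}
    free = [color for color in palette if color not in used]
    table = {}
    for number in numbers:
        if number in overrides:
            table[number] = overrides[number]
        elif free:
            table[number] = free.pop(0)
        else:
            raise ValueError("palette has too few colors for the Tags")
    # Two Tags sharing a color could not be told apart later.
    if len(set(table.values())) != len(table):
        raise ValueError("each Tag number needs its own color")
    return table


default_table = build_color_table([1, 2])
custom_table = build_color_table([1, 2, 3], overrides={2: "red"})
```

Requiring one color per Tag number keeps the mapping reversible, which matters when the colors marked on the translated text are turned back into Tag symbols.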
Step S4: replace the Tag symbols in the source code of the original sentence with colors according to the customized color rules, and display the processed original text.
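Step S4 can be pictured as turning the Tag-free text into colored runs that a display layer then renders. The sketch below is only an assumption about how that might look: the row keys, the function name `color_runs`, and the rule that a later (inner) Tag overrides an earlier one are not from the patent, and standalone Tags, which cover no characters, are simply not shown.

```python
from itertools import groupby


def color_runs(plain, tag_rows, color_table):
    """Return [(text, color_or_None), ...] for display of the original text."""
    colors = [None] * len(plain)
    for row in tag_rows:
        start = row["start_pos"]
        # Rows are visited in Tag-number order, so a nested Tag that opens
        # later overrides the color of the Tag that encloses it.
        for i in range(start, start + row["text_length"]):
            colors[i] = color_table[row["tag_number"]]
    runs = []
    # Merge neighboring characters that share a color into one run.
    for color, group in groupby(zip(plain, colors), key=lambda pair: pair[1]):
        runs.append(("".join(ch for ch, _ in group), color))
    return runs


runs = color_runs(
    "Click Save to continue.",
    [{"tag_number": 1, "start_pos": 6, "text_length": 4}],
    {1: "red"},
)
```

A user interface could render each run with the given font color or background color, so the translator sees continuous text with colored spans instead of Tag symbols.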
Step S5: translate the original sentence into a translated sentence, mark the translated sentence with the corresponding colors, and write the source code of the translated sentence, as follows:
First, select the characters in the translated sentence that need color marking, and mark them with the colors that correspond to the original characters in the Tag number color table.
Then, analyze the Tag symbols and the character position information in the source code of the original sentence, and compare them with the Tag information table of the original sentence to obtain the Tag number comparison table between the original sentence and the translated sentence. As shown in FIG. 5, the format of the Tag number comparison table is:
{ translated text Tag number, original text Tag number }
Here, the translated text Tag number is the Tag number in the translated text, and the original text Tag number is the Tag number in the original text.
Next, convert the Tag information of the translated sentence into a Tag information table for the translated sentence, based on the Tag number comparison table, the Tag information table of the original sentence, and the color and character position information of the translated sentence.
Finally, write the source code of the translated sentence from the Tag information table and the text information table of the translated sentence.
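The last part of step S5 can be sketched as follows. The sketch assumes that the translator's color marking arrives as a flat list of `(text, color)` runs, that only paired Tags are involved, and that translated Tag numbers are assigned in order of appearance. These assumptions, the function name, and the row keys are illustrative and not prescribed by the patent.

```python
def write_translated_source(runs, color_table, original_rows):
    """Rebuild translated source code from color-marked runs.

    Returns the source code and the Tag number comparison table
    {translated text Tag number: original text Tag number}.
    """
    color_to_number = {color: number for number, color in color_table.items()}
    by_number = {row["tag_number"]: row for row in original_rows}
    comparison = {}
    parts = []
    for text, color in runs:
        if color is None:
            parts.append(text)  # unmarked text is copied through unchanged
            continue
        original_number = color_to_number[color]
        # Translated Tags are numbered in the order they appear.
        comparison[len(comparison) + 1] = original_number
        row = by_number[original_number]
        parts.append(row["start_tag"] + text + row["end_tag"])
    return "".join(parts), comparison


original_rows = [
    {"tag_number": 1, "start_tag": "<b>", "end_tag": "</b>"},
    {"tag_number": 2, "start_tag": "<i>", "end_tag": "</i>"},
]
translated_runs = [
    ("Choisissez ", None),
    ("Annuler", "blue"),
    (" ou ", None),
    ("Enregistrer", "red"),
    (".", None),
]
source, comparison = write_translated_source(
    translated_runs, {1: "red", 2: "blue"}, original_rows
)
```

In this example the translation reverses the order of the two marked spans, so the comparison table maps translated Tag 1 to original Tag 2 and translated Tag 2 to original Tag 1, while each span still receives the Tag symbols of its original counterpart.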
This completes one pass of the working process of the present invention according to the method disclosed herein.
While the invention has been described with reference to a specific embodiment, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention.

Claims (7)

1. A method for separating translation text and editing Tag, characterized in that Tag information and the original and translated text are recorded by creating a Tag information table and a text information table, the method comprising the following steps:
step S1, reading the sentence source code of the original sentence into the program;
step S2, processing the sentence source code that was read in;
step S3, customizing color identification rules and establishing an original-text Tag number color table;
step S4, replacing the Tag symbols in the source code of the original sentence with colors according to the customized color rules, and displaying the processed original text;
and step S5, translating the original sentence into a translated sentence, marking the translated sentence with the corresponding colors, and writing the source code of the translated sentence.
2. The method for separating translation text and editing Tag of claim 1, wherein the format of the Tag information table is:
{ sentence ID, Tag number, start Tag, end Tag, start Tag position, text length }
wherein the sentence ID is the number assigned to the sentence that is read in, the Tag number is the number corresponding to each Tag symbol, the start Tag is the Tag symbol that marks the beginning of the sentence, the end Tag is the Tag symbol that marks the end of the sentence, the start Tag position is the position of the start Tag within the sentence, and the text length is the number of characters between the start Tag and the end Tag, a text length of 0 indicating that the Tag stands alone,
and the format of the text information table is:
{ sentence ID, original text, translated text }
wherein the sentence ID is the number assigned to the sentence that is read in, the original text is the text read in for translation, and the translated text is the text after translation.
3. The method for separating translation text and editing Tag of claim 1, wherein the step S2 comprises the following steps:
step S201, reading the text information and Tag information from the sentence source code, two Tag display formats being covered:
<Tag> text <Tag>
<Tag> text
step S202, recording the Tag information of the original text, obtained from the sentence source code, in the Tag information table;
step S203, deleting the Tag symbols from the sentence source code, keeping the text, and saving the retained text in the text information table.
4. The method for separating translation text and editing Tag of claim 1, wherein the step S3 comprises the following steps:
step S301, formulating text color identification rules;
step S302, establishing an original-text Tag number color table that defines a distinct identification color for each numbered Tag symbol.
5. The method of claim 4, wherein the text color identification rules formulated in step S301 comprise a font color rule and a font background color rule, the font color rule marking the font of the text in different colors and the font background color rule marking the background of the text in different colors.
6. The method of claim 4, wherein the original-text Tag number color table established in step S302 has the following format:
{ Tag number, identification color }
wherein the Tag number is the number corresponding to each Tag symbol in the Tag information table of the original text, and the identification color is the color defined for that Tag symbol.
7. The method for separating translation text and editing Tag of claim 1, wherein the step S5 comprises the following steps:
step S501, selecting the characters in the translated sentence that need color marking, and marking them with the colors that correspond to the original characters in the Tag number color table;
step S502, analyzing the Tag symbols and the character position information in the source code of the original sentence, and comparing them with the Tag information table of the original sentence to obtain the Tag number comparison table between the original sentence and the translated sentence,
the format of the Tag number comparison table being:
{ translated text Tag number, original text Tag number }
wherein the translated text Tag number is the Tag number in the translated text, and the original text Tag number is the Tag number in the original text;
step S503, converting the Tag information of the translated sentence into a Tag information table for the translated sentence, based on the Tag number comparison table, the Tag information table of the original sentence, and the color and character position information of the translated sentence;
step S504, writing the source code of the translated sentence from the Tag information table and the text information table of the translated sentence.
CN202210430364.8A 2022-04-22 2022-04-22 Method for separating translation text and editing Tag Pending CN114861683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210430364.8A CN114861683A (en) 2022-04-22 2022-04-22 Method for separating translation text and editing Tag

Publications (1)

Publication Number Publication Date
CN114861683A true CN114861683A (en) 2022-08-05

Family

ID=82633675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210430364.8A Pending CN114861683A (en) 2022-04-22 2022-04-22 Method for separating translation text and editing Tag

Country Status (1)

Country Link
CN (1) CN114861683A (en)

Legal Events

Date Code Title Description
PB01 Publication