CN112287652A - Method, system and device for translating formatted pictures and texts - Google Patents
- Publication number
- CN112287652A (application CN202011493135.8A)
- Authority
- CN
- China
- Prior art keywords
- translation
- text
- template
- format
- translating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/137—Hierarchical processing, e.g. outlines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention provides a method, a system and a device for translating formatted pictures and texts. The translation method comprises the following steps: acquiring a formatted image-text file; recognizing the characters and the language type in the image-text file and restoring the characters into an original text; identifying the format type of the image-text file and determining a matching translation template according to the format type; translating the original text according to the language type to obtain a translated text; and filling the translated text into the selected translation template to obtain a final translated draft. The invention has the following beneficial effects: the translation workload on the user side and the time spent editing the format of the translated file are greatly reduced, so that the user only needs to attend to the translation result; the user-side process of editing the file format is simplified, and translation quality is improved.
Description
Technical Field
The invention belongs to the technical field of intelligent translation, and particularly relates to a method, a system and a device for translating formatted pictures and texts.
Background
Existing translation software can only translate text sentences. For image-text files, it cannot extract the original text from pictures and cannot automatically generate a formatted translation after translating. The user therefore has to extract the original text manually, translate it, and finally fill the translated text into a translation template by hand and edit the document format, which seriously reduces work efficiency.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method, a system and a device for translating formatted pictures and texts that greatly reduce the translation workload on the user side and the time spent editing the format of the translated file.
The invention is realized as follows: a method for translating formatted pictures and texts comprises the following steps:
acquiring a formatted image-text file;
recognizing the characters and the language type in the image-text file and restoring the characters into an original text;
identifying the format type of the image-text file and determining a matching translation template according to the format type;
translating the original text according to the language type to obtain a translated text;
and filling the translated text into the selected translation template to obtain a final translated draft.
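As an illustration only, the five steps above can be sketched as one function that chains pluggable stages. The names `OcrResult`, `translate_formatted_file` and the callable parameters are hypothetical and do not come from the patent; a real system would place an OCR engine and translation software behind them. The `{body}` placeholder is likewise an assumed template convention.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OcrResult:
    text: str      # recognized characters restored into original text
    language: str  # recognized language type, e.g. "zh"


def translate_formatted_file(
    image_bytes: bytes,
    recognize: Callable[[bytes], OcrResult],
    detect_format: Callable[[bytes], str],
    pick_template: Callable[[str], str],
    translate: Callable[[str, str], str],
) -> str:
    """Chain the steps: recognize, match a template, translate, fill."""
    ocr = recognize(image_bytes)                    # characters + language type
    format_type = detect_format(image_bytes)        # format type of the file
    template = pick_template(format_type)           # matching translation template
    translated = translate(ocr.text, ocr.language)  # translated text
    return template.format(body=translated)         # final translated draft


# Stub stages stand in for real OCR and translation software.
final = translate_formatted_file(
    b"fake-image",
    recognize=lambda _: OcrResult("你好", "zh"),
    detect_format=lambda _: "letter",
    pick_template=lambda fmt: "Dear reader,\n{body}\n",
    translate=lambda text, lang: "Hello" if lang == "zh" else text,
)
```

Passing the stages in as callables keeps the sketch independent of any particular OCR or translation product, which the patent itself leaves open.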
Further, the step of recognizing the characters in the image-text file and restoring them into the original text comprises the following steps:
performing OCR character recognition on the image-text file to recognize the characters and their language type;
extracting the recognized characters;
and dividing the extracted characters into sentences and paragraphs to obtain the original text.
Further, the step of dividing the extracted characters into sentences and paragraphs to obtain the original text proceeds as follows:
the extracted characters are assembled into sentences and paragraphs in top-to-bottom, left-to-right order.
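A minimal sketch of the top-to-bottom, left-to-right ordering, assuming the OCR stage returns one bounding box per text fragment. The `Box` type, the `tol` and `gap` thresholds, and both function names are assumptions made for illustration; the patent does not specify how lines or paragraph breaks are detected.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x: float  # left edge
    y: float  # top edge
    h: float  # height


def group_lines(boxes, tol=0.5):
    """Group boxes into lines top to bottom; order each line left to right."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        # A box joins the current line if its top is close to the line's top.
        if lines and abs(box.y - lines[-1][0].y) <= tol * lines[-1][0].h:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [sorted(line, key=lambda b: b.x) for line in lines]


def group_paragraphs(lines, gap=1.5):
    """Start a new paragraph when the vertical step between lines is large."""
    paragraphs = []
    prev = None
    for line in lines:
        first = line[0]
        text = " ".join(b.text for b in line)
        if prev is not None and first.y - prev.y <= gap * prev.h:
            paragraphs[-1].append(text)
        else:
            paragraphs.append([text])
        prev = first
    return [" ".join(p) for p in paragraphs]


boxes = [
    Box("world", 50, 1, 10),
    Box("Hello", 0, 0, 10),
    Box("again", 0, 12, 10),
    Box("New", 0, 40, 10),
]
paragraphs = group_paragraphs(group_lines(boxes))
```

Sorting by the top edge first and the left edge second is what gives the top-to-bottom, left-to-right reading order; multi-column layouts would need extra handling that this sketch omits.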
Further, the method also comprises the following step:
correcting the original text and taking the corrected result as the original text to be translated.
Further, the step of identifying the format type of the image-text file and determining a matching translation template according to the format type comprises the following steps:
identifying the format type of the image-text file;
selecting the latest template in a template library according to the format type;
and identifying the translation file format corresponding to the latest template and using it as the translation template of the image-text file.
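The template selection can be sketched as a lookup over a template library. This assumes that "latest" means the most recently updated template of the matching format type, which is one plausible reading; the `Template` record, its fields and `latest_template` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Template:
    format_type: str  # e.g. "invoice", "contract"
    updated: date     # assumed notion of "latest"
    body: str         # layout of the translation file


def latest_template(library: Iterable[Template], format_type: str) -> Optional[Template]:
    """Return the most recently updated template of the given format type."""
    candidates = [t for t in library if t.format_type == format_type]
    return max(candidates, key=lambda t: t.updated, default=None)


library = [
    Template("invoice", date(2019, 5, 1), "v1"),
    Template("invoice", date(2020, 3, 9), "v2"),
    Template("contract", date(2020, 6, 1), "c1"),
]
chosen = latest_template(library, "invoice")
```

Returning `None` for an unknown format type leaves the fallback policy (for example, asking the user to pick a template) to the caller.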
Further, the method also comprises the following step:
proofreading the translated text and taking the proofreading result as the translated text to be filled into the translation template.
Based on the same inventive concept, the invention also provides a system for translating formatted pictures and texts, which comprises the following modules:
the acquisition module is used for acquiring a formatted image-text file;
the original text generation module is used for recognizing the characters and the language type in the image-text file and restoring the characters into an original text;
the template generation module is used for identifying the format type of the image-text file and determining a matching translation template according to the format type;
the translation module is used for translating the original text according to the language type to obtain a translated text;
and the translation generation module is used for filling the translated text into the selected translation template to obtain a final translated draft.
Further, the original text generation module includes:
the character recognition submodule is used for carrying out OCR character recognition on the image-text file and recognizing characters and language types thereof;
the character extraction submodule is used for extracting the identified characters;
and the original text generation submodule is used for carrying out sentence and paragraph division on the extracted characters to obtain an original text.
Further, the system also includes:
the original text correction module, which is used for correcting the original text and taking the correction result as input data of the translation module.
Further, the template generation module includes:
the format identification submodule is used for identifying the format type of the image-text file;
the template selection submodule is used for selecting the latest template in the template library according to the format type;
and the template generation submodule is used for identifying the translation file format corresponding to the latest template and using that format as the translation template of the image-text file.
Further, the system also includes:
the translation proofreading module, which is used for proofreading the translated text and taking the proofreading result as input data of the translation generation module.
Based on the same inventive concept, the invention also provides a device for translating formatted pictures and texts, which comprises a memory and a processor, wherein the memory stores a computer program and the processor implements the steps of the above method when executing the computer program.
The invention has the following beneficial effects: the translation workload on the user side and the time spent editing the format of the translated file are greatly reduced, so that the user only needs to attend to the translation result; the user-side process of editing the file format is simplified, and translation quality is improved.
Drawings
FIG. 1 is a flowchart of a method according to a first embodiment of the present invention;
FIG. 2 is a sub-flowchart of step S02 in the first embodiment shown in FIG. 1;
FIG. 3 is a sub-flowchart of step S04 in the first embodiment shown in FIG. 1;
FIG. 4 is a flowchart of a method according to a second embodiment of the present invention;
FIG. 5 is a sub-flowchart of step S02 in the second embodiment shown in FIG. 4;
FIG. 6 is a sub-flowchart of step S03 in the second embodiment shown in FIG. 4;
FIG. 7 is a block diagram of the system of the present invention;
FIG. 8 is a block diagram of the components of the original text generation module in the system of FIG. 7;
FIG. 9 is a block diagram of the components of the template generation module in the system of FIG. 7;
FIG. 10 is a schematic view of the structure of the apparatus of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example one
As shown in fig. 1 to 3, a method for translating formatted pictures and texts includes the following steps:
S01, acquiring a formatted image-text file.
S02, recognizing the characters and the language type in the image-text file and restoring the characters into an original text.
Specifically, as shown in fig. 2, step S02 further includes the following process:
S021, performing OCR character recognition on the image-text file, recognizing the characters and their language type, and recording both.
S022, extracting the recognized characters.
S023, dividing the extracted characters into sentences and paragraphs to obtain the original text. Specifically, the extracted characters are assembled into sentences and paragraphs in top-to-bottom, left-to-right order.
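The patent does not say how the language type is determined during OCR. As one rough illustration, the dominant writing system of the recognized characters can be guessed from Unicode character names. This heuristic identifies a script rather than a language (Latin text could be English, French and so on), and real OCR engines usually report the language themselves; `guess_script` and its labels are assumptions.

```python
import unicodedata
from collections import Counter


def guess_script(text: str) -> str:
    """Guess the dominant script of the text from Unicode character names."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation and whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED"):
            counts["han"] += 1
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            counts["kana"] += 1
        elif name.startswith("HANGUL"):
            counts["hangul"] += 1
        elif name.startswith("CYRILLIC"):
            counts["cyrillic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
    return counts.most_common(1)[0][0] if counts else "unknown"
```

The recorded result would then tell the translation step which source language to assume.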
S03, correcting the original text and taking the corrected result as the original text to be translated. Specifically, the OCR recognition result is displayed to the user, who can adjust the sentence and paragraph division of the original text, correct misrecognized characters, and submit the corrected text as the final original text to be translated.
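One way to represent the user's corrections, sketched under the assumption that the OCR result is held as a list of sentences and that the user interface reports edits by sentence index. The function name and the convention that `None` drops a sentence (for example after merging it into its neighbour) are hypothetical.

```python
from typing import Dict, List, Optional


def apply_corrections(
    sentences: List[str], corrections: Dict[int, Optional[str]]
) -> List[str]:
    """Return a new sentence list with the user's edits applied.

    corrections maps a sentence index to its replacement text;
    a value of None removes that sentence.
    """
    result = []
    for index, sentence in enumerate(sentences):
        if index not in corrections:
            result.append(sentence)          # untouched by the user
        elif corrections[index] is not None:
            result.append(corrections[index])  # replaced by the user
    return result


ocr_sentences = ["Tne contract is", "valid.", "Signed 2020"]
corrected = apply_corrections(
    ocr_sentences, {0: "The contract is valid.", 1: None}
)
```

The example fixes a misrecognized character and repairs a sentence that the OCR stage split in two, which are the two kinds of adjustment the step describes.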
S04, identifying the format type of the image-text file and determining a matching translation template according to the format type.
Specifically, as shown in fig. 3, step S04 further includes the following process:
S041, identifying the format type of the image-text file.
S042, selecting the latest template in the template library according to the format type.
S043, identifying the translation file format corresponding to the latest template and using it as the translation template of the image-text file.
S05, translating the original text according to the language type to obtain a translated text. In this step, translation is performed by translation software, which belongs to the prior art and is not described in detail.
S06, proofreading the translated text and taking the proofreading result as the translated text to be filled into the translation template. Specifically, the sentences of the original text are placed in one-to-one correspondence with the sentences of the translated text and the paragraphs are aligned; the translated text is proofread and corrected at the same time, and the user saves and submits the adjusted translated text.
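The one-to-one sentence correspondence used for proofreading can be sketched as follows. The punctuation-based splitter is a deliberately naive assumption (it would also split at abbreviations and decimal points), and both function names are hypothetical.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Split after Western or CJK sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text.strip())
    return [p for p in parts if p]


def align_sentences(source: str, target: str) -> List[Tuple[str, str]]:
    """Pair each source sentence with the translated sentence at the same position."""
    src, tgt = split_sentences(source), split_sentences(target)
    if len(src) != len(tgt):
        # A count mismatch means the user must adjust the division first.
        raise ValueError(f"{len(src)} source vs {len(tgt)} target sentences")
    return list(zip(src, tgt))


pairs = align_sentences("你好。谢谢！", "Hello. Thank you!")
```

Each pair can then be shown side by side so the user can check and correct the translated sentence against its source.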
S07, filling the translated text into the selected translation template to obtain a final translated draft. After the template is filled, a translation file is automatically generated and displayed to the user, who can perform a final proofreading of its format; submitting the proofread result produces the final translation file.
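Filling a template can be illustrated with Python's standard `string.Template`, assuming a template is plain text with named `$placeholders`. The patent does not define the template format, so the placeholder names `title` and `body` and the function `fill_template` are illustrative only.

```python
from string import Template
from typing import Mapping


def fill_template(template_text: str, fields: Mapping[str, str]) -> str:
    """Substitute translated fields into the template.

    substitute() raises KeyError if the template names a field that
    was not supplied, so an incomplete draft is never produced silently.
    """
    return Template(template_text).substitute(fields)


draft = fill_template(
    "$title\n\n$body\n",
    {"title": "Purchase Contract", "body": "The parties agree as follows."},
)
```

Real document formats such as word-processor files would need a format-specific library in place of plain-text substitution.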
It should be noted that steps S02 to S05 may be performed in other orders, provided that the input data of step S03 in the first embodiment is the output of step S02 and, similarly, the input data of step S05 is the output of step S03.
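The ordering constraint can be made explicit as a small dependency graph. The sketch below uses descriptive step names rather than step numbers; the dependency table is an interpretation of the text (correction needs the recognized text, translation needs the corrected text, filling needs both the proofread translation and the template), not something the patent states in this form. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it consumes.
DEPENDS_ON = {
    "recognize": {"acquire"},
    "correct": {"recognize"},
    "select_template": {"acquire"},
    "translate": {"correct"},
    "proofread": {"translate"},
    "fill": {"proofread", "select_template"},
}
ALL_STEPS = set(DEPENDS_ON) | {"acquire"}


def is_valid_order(order):
    """True if every step runs after all of the steps it depends on."""
    seen = set()
    for step in order:
        if not DEPENDS_ON.get(step, set()) <= seen:
            return False
        seen.add(step)
    return seen == ALL_STEPS


# Template selection after correction (first embodiment) ...
order_one = ["acquire", "recognize", "correct", "select_template",
             "translate", "proofread", "fill"]
# ... or directly after acquisition (second embodiment).
order_two = ["acquire", "select_template", "recognize", "correct",
             "translate", "proofread", "fill"]
# Any topological order of the graph is also acceptable.
some_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Because template selection depends only on the acquired file, it can be moved freely relative to recognition, correction and translation, which is the freedom the paragraph above describes.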
Example two
As shown in fig. 4 to 6, a method for translating formatted pictures and texts includes the following steps:
S01, acquiring a formatted image-text file.
S02, identifying the format type of the image-text file and determining a matching translation template according to the format type.
Specifically, as shown in fig. 5, step S02 further includes the following process:
S021, identifying the format type of the image-text file.
S022, selecting the latest template in the template library according to the format type.
S023, identifying the translation file format corresponding to the latest template and using it as the translation template of the image-text file.
S03, recognizing the characters and the language type in the image-text file and restoring the characters into an original text.
Specifically, as shown in fig. 6, step S03 further includes the following process:
S031, performing OCR character recognition on the image-text file and recognizing the characters and their language type.
S032, extracting the recognized characters.
S033, dividing the extracted characters into sentences and paragraphs to obtain the original text. Specifically, the extracted characters are assembled into sentences and paragraphs in top-to-bottom, left-to-right order.
S04, correcting the original text and taking the corrected result as the original text to be translated. Specifically, the OCR recognition result is displayed to the user, who can adjust the sentence and paragraph division of the original text, correct misrecognized characters, and submit the corrected text as the final original text to be translated.
S05, translating the original text according to the language type to obtain a translated text. In this step, translation is performed by translation software, which belongs to the prior art and is not described in detail.
S06, proofreading the translated text and taking the proofreading result as the translated text to be filled into the translation template. Specifically, the sentences of the original text are placed in one-to-one correspondence with the sentences of the translated text and the paragraphs are aligned; the translated text is proofread and corrected at the same time, and the user saves and submits the adjusted translated text.
S07, filling the translated text into the selected translation template to obtain a final translated draft. After the template is filled, a translation file is automatically generated and displayed to the user, who can perform a final proofreading of its format; submitting the proofread result produces the final translation file.
As shown in fig. 7, based on the same inventive concept, the invention further provides a system for translating formatted pictures and texts, which comprises an acquisition module 1, an original text generation module 2, an original text correction module 3, a template generation module 6, a translation module 4, a translation proofreading module 5 and a translation generation module 7.
The acquisition module 1 is used for acquiring a formatted image-text file; the original text generation module 2 is used for recognizing the characters and the language type in the image-text file and restoring the characters into an original text; the original text correction module 3 is used for correcting the original text and taking the corrected result as the input data of the translation module 4; the template generation module 6 is used for identifying the format type of the image-text file and determining a matching translation template according to the format type; the translation module 4 is used for translating the original text according to the language type to obtain a translated text; the translation proofreading module 5 is used for proofreading the translated text and taking the proofreading result as the input data of the translation generation module 7; and the translation generation module 7 is used for filling the translated text into the selected translation template to obtain a final translated draft.
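The module structure can be sketched as a small class that wires the modules together, with the correction and proofreading modules defaulting to pass-through since the claims present them as optional additions. All names, the callable signatures and the `{{body}}` placeholder are assumptions for illustration, not the patent's interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


def _identity(text: str) -> str:
    return text


@dataclass
class TranslationSystem:
    acquire: Callable[[str], bytes]                        # acquisition module 1
    generate_original: Callable[[bytes], Tuple[str, str]]  # module 2: (text, language)
    generate_template: Callable[[bytes], str]              # template generation module 6
    translate: Callable[[str, str], str]                   # translation module 4
    correct_original: Callable[[str], str] = _identity     # correction module 3
    proofread: Callable[[str], str] = _identity            # proofreading module 5

    def run(self, path: str) -> str:
        data = self.acquire(path)
        text, language = self.generate_original(data)
        text = self.correct_original(text)     # feeds the translation module
        template = self.generate_template(data)
        translated = self.proofread(self.translate(text, language))
        # Translation generation module 7: fill the template.
        return template.replace("{{body}}", translated)


system = TranslationSystem(
    acquire=lambda path: b"img",
    generate_original=lambda data: ("Tne text", "en"),
    generate_template=lambda data: "<doc>{{body}}</doc>",
    translate=lambda text, language: text.upper(),
    correct_original=lambda text: text.replace("Tne", "The"),
)
result = system.run("scan.png")
```

The stub lambdas stand in for real OCR, template lookup and translation software.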
As a preferred example, as shown in fig. 8, the original text generation module 2 includes a character recognition sub-module 21, a character extraction sub-module 22, and an original text generation sub-module 23.
The character recognition submodule 21 is configured to perform OCR character recognition on the image-text file, and recognize characters and language types thereof; the character extraction submodule 22 is used for extracting the identified characters; the original text generation submodule 23 is configured to perform sentence and paragraph division on the extracted characters to obtain an original text.
As a preferred example, as shown in fig. 9, the template generating module 6 includes a format recognition submodule 61, a template selection submodule 62 and a template generating submodule 63.
The format identification submodule 61 is used for identifying the format type of the image-text file; the template selection submodule 62 is used for selecting the latest template in the template library according to the format type; the template generation submodule 63 is used for identifying the translation file format corresponding to the latest template and using that format as the translation template of the image-text file.
As shown in fig. 10, based on the same inventive concept, the present invention further provides a device for translating formatted graphics and text, which includes a memory 100 and a processor 200, where the memory 100 stores a computer program, and the processor 200 implements the steps of the methods in the first embodiment and the second embodiment when executing the computer program.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.
Claims (12)
1. A method for translating formatted pictures and texts, characterized by comprising the following steps:
acquiring a formatted image-text file;
recognizing the characters and the language type in the image-text file and restoring the characters into an original text;
identifying the format type of the image-text file and determining a matching translation template according to the format type;
translating the original text according to the language type to obtain a translated text;
and filling the translated text into the selected translation template to obtain a final translated draft.
2. The method for translating formatted pictures and texts according to claim 1, wherein the step of recognizing the characters in the image-text file and restoring them into the original text comprises the following steps:
performing OCR character recognition on the image-text file to recognize the characters and their language type;
extracting the recognized characters;
and dividing the extracted characters into sentences and paragraphs to obtain the original text.
3. The method for translating formatted pictures and texts according to claim 2, wherein the step of dividing the extracted characters into sentences and paragraphs to obtain the original text specifically comprises:
assembling the extracted characters into sentences and paragraphs in top-to-bottom, left-to-right order.
4. The method for translating formatted pictures and texts according to claim 1, further comprising the following step:
correcting the original text and taking the corrected result as the original text to be translated.
5. The method for translating formatted pictures and texts according to claim 1, wherein the step of identifying the format type of the image-text file and determining a matching translation template according to the format type comprises the following steps:
identifying the format type of the image-text file;
selecting the latest template in a template library according to the format type;
and identifying the translation file format corresponding to the latest template and using it as the translation template of the image-text file.
6. The method for translating formatted pictures and texts according to claim 1, further comprising the following step:
proofreading the translated text and taking the proofreading result as the translated text to be filled into the translation template.
7. A system for translating formatted pictures and texts, comprising:
the acquisition module, which is used for acquiring a formatted image-text file;
the original text generation module, which is used for recognizing the characters and the language type in the image-text file and restoring the characters into an original text;
the template generation module, which is used for identifying the format type of the image-text file and determining a matching translation template according to the format type;
the translation module, which is used for translating the original text according to the language type to obtain a translated text;
and the translation generation module, which is used for filling the translated text into the selected translation template to obtain a final translated draft.
8. The system for translating formatted pictures and texts according to claim 7, wherein the original text generation module comprises:
the character recognition submodule, which is used for performing OCR character recognition on the image-text file and recognizing the characters and their language type;
the character extraction submodule, which is used for extracting the recognized characters;
and the original text generation submodule, which is used for dividing the extracted characters into sentences and paragraphs to obtain the original text.
9. The system for translating formatted pictures and texts according to claim 7, further comprising:
the original text correction module, which is used for correcting the original text and taking the correction result as input data of the translation module.
10. The system for translating formatted pictures and texts according to claim 7, wherein the template generation module comprises:
the format identification submodule, which is used for identifying the format type of the image-text file;
the template selection submodule, which is used for selecting the latest template in the template library according to the format type;
and the template generation submodule, which is used for identifying the translation file format corresponding to the latest template and using that format as the translation template of the image-text file.
11. The system for translating formatted pictures and texts according to claim 7, further comprising:
the translation proofreading module, which is used for proofreading the translated text and taking the proofreading result as input data of the translation generation module.
12. A device for translating formatted pictures and texts, comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010605610 | 2020-06-29 | | |
CN202010605610X | 2020-06-29 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112287652A (en) | 2021-01-29 |
Family
ID=74426917
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011493135.8A (pending; published as CN112287652A) | 2020-06-29 | 2020-12-17 | Method, system and device for translating formatted pictures and texts |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112287652A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104346319A (en) * | 2013-08-05 | 2015-02-11 | 北大方正集团有限公司 | Method and system for inspecting document style |
CN107122337A (en) * | 2016-02-24 | 2017-09-01 | 阿里巴巴集团控股有限公司 | One kind translation official documents and correspondence generation method and device |
CN108038095A (en) * | 2017-12-15 | 2018-05-15 | 四川汉科计算机信息技术有限公司 | A kind of document automatic creation method |
CN109299445A (en) * | 2018-08-01 | 2019-02-01 | 政采云有限公司 | It obtains the method, apparatus of file template, calculate equipment and storage medium |
CN109783826A (en) * | 2019-01-15 | 2019-05-21 | 四川译讯信息科技有限公司 | A kind of document automatic translating method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8627203B2 (en) | Method and apparatus for capturing, analyzing, and converting scripts | |
CA2174258A1 (en) | Method and System for Automatic Transcription Correction | |
CN103093252B (en) | Information output apparatus and information output method | |
JPH07121664A (en) | Automatic decision apparatus of european language | |
CN112329447B (en) | Training method of Chinese error correction model, chinese error correction method and device | |
CN112579466B (en) | Method and device for generating test cases and computer readable storage medium | |
CN112766000A (en) | Machine translation method and system based on pre-training model | |
CN116402067B (en) | Cross-language self-supervision generation method for multi-language character style retention | |
CN107066438A (en) | A kind of method for editing text and device, electronic equipment | |
CN112861864A (en) | Topic entry method, topic entry device, electronic device and computer-readable storage medium | |
US20240338535A1 (en) | Webtoon content multilingual translation method | |
CN112287652A (en) | Method, system and device for translating formatted pictures and texts | |
CN117829101A (en) | Method, apparatus, device and medium for converting text style | |
CN112965772A (en) | Web page display method and device and electronic equipment | |
CN109657244B (en) | English long sentence automatic segmentation method and system | |
CN116320622B (en) | Broadcast television news video-to-picture manuscript manufacturing system and manufacturing method | |
CN116932712A (en) | Multi-mode input interactive information generation method, device, equipment and medium | |
CN117235546A (en) | Multi-version file comparison method, device, system and storage medium | |
CN117436415A (en) | Presentation generation method and device, electronic equipment and storage medium | |
CN116341525A (en) | Text examination and correction system based on natural language processing | |
CN115292349A (en) | Method, system and device for generating SQL | |
CN111310457B (en) | Word mismatching recognition method and device, electronic equipment and storage medium | |
CN112668581A (en) | Document title identification method and device | |
CN112133309A (en) | Audio and text synchronization method, computing device and storage medium | |
CN114638241B (en) | Data matching method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||