CN112287652A

CN112287652A - Method, system and device for translating formatted pictures and texts

Info

Publication number: CN112287652A
Application number: CN202011493135.8A
Authority: CN
Inventors: 毕伟; 杨星月
Original assignee: Nanjing Yijiezhi Information Technology Co ltd
Current assignee: Nanjing Yijiezhi Information Technology Co ltd
Priority date: 2020-06-29
Filing date: 2020-12-17
Publication date: 2021-01-29

Abstract

The invention provides a method, a system and a device for translating formatted pictures and texts, wherein the translation method comprises the following processes: acquiring a formatted image-text file; identifying the characters in the image-text file and their language type, and restoring the characters into an original text; identifying the format type of the image-text file and determining a matching translation template according to the format type; translating the original text according to the language type to obtain a translation; and filling the translation into the selected translation template to obtain a final translated draft. The invention has the following beneficial effects: the translation workload on the user side and the time spent editing the format of the translated file can be greatly reduced, so that the user only needs to focus on the translation result without investing excessive effort and time; the process of editing the file format on the user side is simplified, and the translation quality is improved.

Description

Method, system and device for translating formatted pictures and texts

Technical Field

The invention belongs to the technical field of intelligent translation, and particularly relates to a method, a system and a device for translating formatted pictures and texts.

Background

Existing translation software can only translate text sentences. For image-text files, it can neither extract the original text from pictures nor automatically generate a formatted translation. The user therefore has to extract the original text manually, translate it, and finally fill the translation into a translation template by hand and edit the document format, which seriously reduces work efficiency.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method, a system and a device for translating formatted pictures and texts that greatly reduce the translation workload on the user side and the time spent editing the format of the translated file.

The invention is implemented through the following technical solution: a method for translating formatted pictures and texts, comprising the following processes:

acquiring a formatted image-text file;

identifying the characters in the image-text file and their language type, and restoring the characters into an original text;

identifying the format type of the image-text file and determining a matching translation template according to the format type;

translating the original text according to the language type to obtain a translation;

and filling the translation into the selected translation template to obtain a final translated draft.
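The five processes above can be read as a pipeline whose stages are interchangeable. The following is a minimal Python sketch under stated assumptions: `translate_document`, `OcrResult` and all stage callables are hypothetical names not taken from the patent, and the toy lambdas merely stand in for a real OCR engine, format classifier, template library and translation software.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OcrResult:
    text: str      # original text restored from the image-text file
    language: str  # recognized language type of that text


def translate_document(
    image_file: bytes,
    recognize: Callable[[bytes], OcrResult],   # hypothetical OCR stage
    detect_format: Callable[[bytes], str],     # hypothetical format classifier
    pick_template: Callable[[str], str],       # hypothetical template lookup
    translate: Callable[[str, str], str],      # hypothetical MT stage
    fill: Callable[[str, str], str],           # hypothetical template filler
) -> str:
    source = recognize(image_file)                       # characters + language -> original text
    template = pick_template(detect_format(image_file))  # format type -> matching template
    translation = translate(source.text, source.language)
    return fill(template, translation)                   # final translated draft


# Toy stubs only; a real system would plug in actual engines here.
draft = translate_document(
    b"fake-image",
    recognize=lambda f: OcrResult("你好", "zh"),
    detect_format=lambda f: "invoice",
    pick_template=lambda fmt: f"[{fmt}] {{body}}",
    translate=lambda text, lang: {"你好": "Hello"}[text],
    fill=lambda tpl, body: tpl.format(body=body),
)
print(draft)  # [invoice] Hello
```

Passing each stage in as a callable keeps the ordering of the processes explicit while leaving the engines themselves unspecified, as the patent does.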

Further, the step of identifying the characters in the image-text file and restoring them into the original text comprises the following processes:

performing OCR character recognition on the image-text file to recognize characters and language types thereof;

extracting the recognized characters;

and carrying out sentence and paragraph division on the extracted characters to obtain the original text.

Further, the specific process of the step of obtaining the original text by performing sentence and paragraph division on the extracted characters is as follows:

the extracted characters are assembled into sentences and paragraphs in reading order, from top to bottom and from left to right.
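One way to realize this top-to-bottom, left-to-right division is to cluster OCR fragments into visual lines by vertical position, order each line horizontally, and start a new paragraph at a large vertical gap. This is only a sketch: the `Box` fragment type, the `line_tol` and `para_gap` thresholds and the punctuation-based sentence splitter are assumptions, since the patent does not specify them. Lines are joined with a space, which suits space-delimited languages but would need changing for Chinese or Japanese.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    x: float   # left edge of a recognized text fragment (hypothetical OCR output)
    y: float   # top edge
    text: str


# Split after Western or CJK sentence-ending punctuation (an assumption).
SENTENCE_END = re.compile(r"(?<=[.!?。！？])\s*")


def to_paragraphs(boxes: List[Box], line_tol: float = 5.0,
                  para_gap: float = 30.0) -> List[List[str]]:
    """Return paragraphs, each a list of sentences, in reading order."""
    # 1. Top to bottom: a fragment whose top lies within line_tol of the
    #    current line's first fragment joins that line.
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if lines and abs(box.y - lines[-1][0].y) <= line_tol:
            lines[-1].append(box)
        else:
            lines.append([box])

    # 2. Left to right within each line; a large vertical gap opens a new paragraph.
    paragraphs: List[List[str]] = []
    current: List[str] = []
    prev_top = None
    for line in lines:
        top = min(b.y for b in line)
        if prev_top is not None and top - prev_top > para_gap:
            paragraphs.append(current)
            current = []
        current.append(" ".join(b.text for b in sorted(line, key=lambda b: b.x)))
        prev_top = top
    if current:
        paragraphs.append(current)

    # 3. Split each paragraph's text into sentences.
    return [[s.strip() for s in SENTENCE_END.split(" ".join(p)) if s.strip()]
            for p in paragraphs]


boxes = [
    Box(60, 0, "world."),
    Box(0, 2, "Hello"),
    Box(0, 20, "Second line."),
    Box(0, 80, "New paragraph! Done."),
]
print(to_paragraphs(boxes))
# [['Hello world.', 'Second line.'], ['New paragraph!', 'Done.']]
```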

Further, the method also comprises the following steps:

and correcting the original text and taking the corrected result as the original text to be translated.

Further, the step of identifying the format type of the image-text file and determining a matching translation template according to the format type comprises the following processes:

identifying the format type of the image-text file;

selecting the latest template in a template library according to the format type;

and identifying the format of the translation file corresponding to the latest template and using the format as a translation template of the image-text file.
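Selecting "the latest template" implies that each library entry carries a format type and some notion of recency. A minimal sketch, assuming a hypothetical `TemplateRecord` with an `updated` date and a `translation_layout` field naming the associated translation-file format; none of these fields are defined in the patent, and the file names are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TemplateRecord:
    format_type: str         # hypothetical label such as "id_card"
    updated: date            # when the template entered the library
    translation_layout: str  # translation-file format tied to this template


def select_template(library: List[TemplateRecord], format_type: str) -> Optional[str]:
    """Return the translation-file format of the newest template for format_type."""
    candidates = [t for t in library if t.format_type == format_type]
    if not candidates:
        return None  # no matching template; fallback behaviour is not covered here
    return max(candidates, key=lambda t: t.updated).translation_layout


library = [
    TemplateRecord("id_card", date(2019, 5, 1), "id_card_v1.docx"),
    TemplateRecord("id_card", date(2020, 3, 1), "id_card_v2.docx"),
    TemplateRecord("diploma", date(2020, 1, 1), "diploma_v1.docx"),
]
print(select_template(library, "id_card"))  # id_card_v2.docx
```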

Further, the method also comprises the following steps:

and checking the translation and taking the checking result as the translation to be filled into the translation template.

Based on the same inventive concept, the invention also provides a system for translating the formatted pictures and texts, which comprises the following components:

the acquisition module is used for acquiring the formatted image-text file;

the original text generation module is used for identifying characters and language types in the image-text file and restoring the characters into original text;

the template generation module is used for identifying the format type of the image-text file and determining a matched translation template according to the format type;

the translation module is used for translating the original text according to the language type to obtain a translated text;

and the translation generation module is used for filling the translation into the selected translation template to obtain a final translated draft.

Further, the original text generation module includes:

the character recognition submodule is used for carrying out OCR character recognition on the image-text file and recognizing characters and language types thereof;

the character extraction submodule is used for extracting the identified characters;

and the original text generation submodule is used for carrying out sentence and paragraph division on the extracted characters to obtain an original text.

Further, the system also comprises:

and the original text correction module is used for correcting the original text and taking a correction result as input data of the translation module.

Further, the template generation module includes:

the format identification submodule is used for identifying the format type of the image-text file;

the template selection submodule is used for selecting the latest template in the template library according to the format type;

and the template generation submodule is used for identifying the format of the translation file corresponding to the latest template and using that format as the translation template of the image-text file.

Further, the system also comprises:

and the translation proofreading module is used for proofreading the translation and taking the proofreading result as input data of the translation generation module.

Based on the same inventive concept, the invention also provides a device for translating formatted pictures and texts, comprising a memory storing a computer program and a processor that implements the steps of the above method when executing the computer program.

The invention has the following beneficial effects: the translation workload on the user side and the time spent editing the format of the translated file can be greatly reduced, so that the user only needs to focus on the translation result without investing excessive effort and time; the process of editing the file format on the user side is simplified, and the translation quality is improved.

Drawings

FIG. 1 is a flowchart of a method according to a first embodiment of the present invention;

FIG. 2 is a sub-flowchart of step S02 in the first embodiment shown in FIG. 1;

FIG. 3 is a sub-flowchart of step S04 in the first embodiment shown in FIG. 1;

FIG. 4 is a flowchart of a method according to a second embodiment of the present invention;

FIG. 5 is a sub-flowchart of step S02 in the second embodiment shown in FIG. 4;

FIG. 6 is a sub-flowchart of step S03 in the second embodiment shown in FIG. 4;

FIG. 7 is a block diagram of the system of the present invention;

FIG. 8 is a block diagram of the components of the original text generation module in the system of FIG. 7;

FIG. 9 is a block diagram of the components of the template generation module in the system of FIG. 7;

FIG. 10 is a schematic view of the structure of the apparatus of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

Example one

As shown in fig. 1 to 3, a method for translating formatted graphics and text includes the following steps:

S01, acquiring the formatted image-text file.

S02, identifying the characters in the image-text file and their language types, and restoring the characters into the original text.

Specifically, as shown in fig. 2, step S02 further includes the following process:

S021, performing OCR character recognition on the image-text file to recognize the characters and their language types, and recording both.

S022, extracting the identified characters.

S023, performing sentence and paragraph division on the extracted characters to obtain original texts. Specifically, the dividing process is as follows: the extracted characters form sentences and paragraphs from top to bottom and from left to right.
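The patent leaves language-type recognition in S021 to the OCR stage. As a stand-in, a language type can be roughly guessed from Unicode script ranges. The hypothetical `guess_language` below is a crude heuristic, not a language-identification model: it treats every ASCII letter as English and decides Japanese whenever any kana is present.

```python
from collections import Counter

# Hypothetical script-to-language mapping used only for this sketch.
SCRIPT_RANGES = [
    ("zh", 0x4E00, 0x9FFF),  # CJK Unified Ideographs
    ("ja", 0x3040, 0x30FF),  # Hiragana and Katakana
    ("ko", 0xAC00, 0xD7AF),  # Hangul syllables
]


def guess_language(text: str) -> str:
    """Guess a language code from the scripts of the recognized characters."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isascii() and ch.isalpha():
            counts["en"] += 1  # Latin letters treated as English (assumption)
            continue
        cp = ord(ch)
        for lang, lo, hi in SCRIPT_RANGES:
            if lo <= cp <= hi:
                counts[lang] += 1
                break
    if counts["ja"]:
        return "ja"  # kana signals Japanese even when kanji outnumber it
    return counts.most_common(1)[0][0] if counts else "unknown"


print(guess_language("营业执照"))          # zh
print(guess_language("Business license"))  # en
```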

S03, correcting the original text and taking the corrected result as the original text to be translated. Specifically, the OCR recognition result is displayed to the user, who can refine the sentence and paragraph division of the original text, correct misrecognized characters, and submit the corrections to form the final original text to be translated.
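The user corrections described for this step can be modelled as two kinds of edits: replacing a misrecognized sentence and merging a paragraph that OCR split in two. The sketch below assumes that representation (`fixes` keyed by paragraph and sentence index, `merge_with_next` as a set of paragraph indices); the function name and both parameters are hypothetical, and the patent does not prescribe any data format for the user's edits.

```python
from typing import Dict, List, Set, Tuple

Paragraphs = List[List[str]]


def apply_user_corrections(
    paragraphs: Paragraphs,
    fixes: Dict[Tuple[int, int], str],  # (paragraph, sentence) -> corrected sentence
    merge_with_next: Set[int],          # paragraphs the user joined to the following one
) -> Paragraphs:
    # Replace misrecognized sentences first, addressed by their OCR positions.
    fixed = [[fixes.get((p, s), sent) for s, sent in enumerate(para)]
             for p, para in enumerate(paragraphs)]
    # Then rebuild the paragraph boundaries the user adjusted.
    result: Paragraphs = []
    merging = False
    for p, para in enumerate(fixed):
        if merging:
            result[-1].extend(para)
        else:
            result.append(list(para))
        merging = p in merge_with_next
    return result


ocr = [["He1lo world."], ["Second line."], ["Done."]]
corrected = apply_user_corrections(ocr, {(0, 0): "Hello world."}, merge_with_next={0})
print(corrected)  # [['Hello world.', 'Second line.'], ['Done.']]
```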

S04, identifying the format type of the image-text file and determining a matching translation template according to the format type.

Specifically, as shown in fig. 3, step S04 further includes the following process:

S041, identifying the format type of the image-text file.

S042, selecting the latest template in the template library according to the format type.

S043, identifying the format of the translation file corresponding to the latest template and taking that format as the translation template of the image-text file.

S05, translating the original text according to the language type to obtain a translation. This step is performed by existing translation software, which belongs to the prior art and is not described in detail here.

S06, checking the translation and taking the checked result as the translation to be filled into the translation template. Specifically, the sentences of the original text are matched one-to-one with the sentences of the translation and the paragraphs are aligned; at the same time, the translation is proofread and corrected, and the user saves and submits the adjusted translation.
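The one-to-one sentence correspondence and paragraph alignment can be checked mechanically before the user proofreads. A minimal sketch, assuming both texts are already divided into paragraphs of sentences (lists of lists); `align` and `unmatched` are hypothetical helpers, and real alignment of sentences that a translator merged or split would need more than positional pairing.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

Row = Tuple[int, int, Optional[str], Optional[str]]


def align(source: List[List[str]], target: List[List[str]]) -> List[Row]:
    """Pair sentences by position within each paragraph; None marks a missing partner."""
    rows: List[Row] = []
    for p, (src, tgt) in enumerate(zip_longest(source, target, fillvalue=[])):
        for s, (orig, trans) in enumerate(zip_longest(src, tgt)):
            rows.append((p, s, orig, trans))
    return rows


def unmatched(rows: List[Row]) -> List[Row]:
    """Rows the user still has to fix before the translation is filled in."""
    return [r for r in rows if r[2] is None or r[3] is None]


source = [["A.", "B."], ["C."]]
target = [["甲。", "乙。"], []]
rows = align(source, target)
print(unmatched(rows))  # [(1, 0, 'C.', None)]
```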

S07, filling the translation into the selected translation template to obtain a final translated draft. After the template is filled, a translation file is automatically generated and displayed to the user; the user can carry out a final proofreading of its format, and the translation file is generated once the proofreading is submitted.
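Filling a template can be as simple as placeholder substitution. This sketch assumes the template is plain text with `$field` placeholders and that the checked translation has already been split into named fields; both assumptions go beyond the patent, which does not define a template syntax. Python's standard `string.Template.substitute` raises `KeyError` when a placeholder has no value, so an incomplete draft is not produced silently.

```python
from string import Template
from typing import Dict


def fill_template(template_text: str, fields: Dict[str, str]) -> str:
    """Substitute translated fields into a $placeholder template (hypothetical format)."""
    # substitute() raises KeyError for any placeholder without a translated value.
    return Template(template_text).substitute(fields)


layout = "Name: $name\nAddress: $address"
draft = fill_template(layout, {"name": "Zhang San", "address": "Nanjing"})
print(draft)
```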

It should be noted that steps S02 to S05 may be performed in other orders, provided that the input data of step S03 in the first embodiment is the output of step S02 and, similarly, the input data of step S05 is the output of step S03.
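The ordering constraint can be expressed as a dependency graph in which any topological order is an admissible execution sequence. In the sketch below, only the edges S02 to S03 and S03 to S05 come from the text; the remaining edges, and the labels S06 (checking the translation) and S07 (filling the template), are assumptions. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose output it consumes.
DEPENDENCIES = {
    "S02": {"S01"},         # OCR and restoration of the original text
    "S03": {"S02"},         # correction of the original text (stated)
    "S04": {"S01"},         # template matching needs only the image-text file (assumed)
    "S05": {"S03"},         # translation of the corrected original text (stated)
    "S06": {"S05"},         # checking the translation (assumed label)
    "S07": {"S04", "S06"},  # filling needs template and checked translation (assumed)
}


def is_valid_order(order, deps=DEPENDENCIES):
    """True if every step appears after all of the steps it depends on."""
    pos = {step: i for i, step in enumerate(order)}
    return all(pos[d] < pos[s] for s, ds in deps.items() for d in ds)


print(list(TopologicalSorter(DEPENDENCIES).static_order()))
# Template matched before OCR, as Example two does:
print(is_valid_order(["S01", "S04", "S02", "S03", "S05", "S06", "S07"]))  # True
# Correction before OCR violates the stated dependency:
print(is_valid_order(["S01", "S03", "S02", "S04", "S05", "S06", "S07"]))  # False
```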

Example two

As shown in fig. 4 to 6, a method for translating formatted graphics and text includes the following steps:

S01, acquiring the formatted image-text file.

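S02, identifying the format type of the image-text file and determining a matching translation template according to the format type.

Specifically, as shown in fig. 5, step S02 further includes the following process: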

S021, identifying the format type of the image-text file.

S022, selecting the latest template in the template library according to the format type.

S023, identifying a format of the translation file corresponding to the latest template and taking the format as a translation template of the image-text file.

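S03, identifying the characters in the image-text file and their language types, and restoring the characters into the original text.

Specifically, as shown in fig. 6, step S03 further includes the following process: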

S031, performing OCR character recognition on the image-text file to recognize the characters and their language types.

S032, extracting the recognized characters.

S033, performing sentence and paragraph division on the extracted characters to obtain the original text. Specifically, the dividing process is as follows: the extracted characters form sentences and paragraphs from top to bottom and from left to right.

As shown in fig. 7, based on the same inventive concept, the invention further provides a system for translating formatted pictures and texts, which comprises an acquisition module 1, an original text generation module 2, an original text correction module 3, a template generation module 6, a translation module 4, a translation proofreading module 5 and a translation generation module 7.

The acquisition module 1 is used for acquiring a formatted image-text file; the original text generation module 2 is used for identifying the characters in the image-text file and their language types and restoring the characters into the original text; the original text correction module 3 is used for correcting the original text and taking the corrected result as the input data of the translation module 4; the template generation module 6 is used for identifying the format type of the image-text file and determining a matching translation template according to the format type; the translation module 4 is used for translating the original text according to the language type to obtain a translation; the translation proofreading module 5 is used for proofreading the translation and taking the proofreading result as the input data of the translation generation module 7; and the translation generation module 7 is used for filling the translation into the selected translation template to obtain a final translated draft.

As a preferred example, as shown in fig. 8, the original text generation module 2 includes a character recognition sub-module 21, a character extraction sub-module 22, and an original text generation sub-module 23.

The character recognition submodule 21 is configured to perform OCR character recognition on the image-text file, and recognize characters and language types thereof; the character extraction submodule 22 is used for extracting the identified characters; the original text generation submodule 23 is configured to perform sentence and paragraph division on the extracted characters to obtain an original text.

As a preferred example, as shown in fig. 9, the template generating module 6 includes a format recognition submodule 61, a template selection submodule 62 and a template generating submodule 63.

The format identification submodule 61 is used for identifying the format type of the image-text file; the template selection submodule 62 is used for selecting the latest template in the template library according to the format type; the template generation submodule 63 is used for identifying the format of the translation file corresponding to the latest template and using that format as the translation template of the image-text file.

As shown in fig. 10, based on the same inventive concept, the present invention further provides a device for translating formatted graphics and text, which includes a memory 100 and a processor 200, where the memory 100 stores a computer program, and the processor 200 implements the steps of the methods in the first embodiment and the second embodiment when executing the computer program.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and variations without departing from the technical principle of the present invention, and these improvements and variations should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for translating formatted pictures and texts is characterized by comprising the following processes:

acquiring a picture-text file with a format;

2. The method for translating formatted pictures and texts according to claim 1, wherein the step of identifying the characters in the image-text file and restoring them into the original text comprises the following steps:

extracting the recognized characters;

3. The method for translating formatted pictures and texts according to claim 2, wherein the step of dividing the extracted words into sentences and paragraphs to obtain the original text specifically comprises the following steps:

4. The method for translating formatted pictures and texts according to claim 1, further comprising the following process:

5. The method for translating formatted pictures and texts according to claim 1, wherein said step of identifying the format type of said picture and text file and determining the matching translation template according to said format type comprises the following steps:

identifying the format type of the image-text file;

6. The method for translating formatted pictures and texts according to claim 1, further comprising the following process:

7. A system for translating formatted pictures and texts, comprising:

8. The system for translating formatted pictures and texts according to claim 7, wherein said original text generation module comprises:

9. The system for translating formatted pictures and texts according to claim 7, further comprising:

10. The system for translating formatted pictures and texts according to claim 7, wherein said template generation module comprises:

11. The system for translating formatted pictures and texts according to claim 7, further comprising:

12. A device for translating formatted pictures and texts, comprising: a memory storing a computer program and a processor implementing the steps of the method of any one of claims 1 to 6 when the processor executes the computer program.