CN109657619A - Drawing translation method, device and storage medium - Google Patents

Drawing translation method, device and storage medium

Publication number
CN109657619A
Authority
CN
China
Prior art keywords
character
pel
translated
attached drawing
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811564424.5A
Other languages
Chinese (zh)
Inventor
单杰
董柏雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Shun Yu Information Technology Co Ltd
Original Assignee
Jiangsu Shun Yu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Shun Yu Information Technology Co Ltd
Priority to CN201811564424.5A
Publication of CN109657619A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition

Abstract

The invention discloses a drawing translation method, device and storage medium. The method includes: establishing a correspondence between drawing parameters and characters to be translated; translating the characters in the text to be translated that are identical to the characters to be translated, to obtain the corresponding translated characters; establishing, based on the correspondence between the drawing parameters and the characters to be translated, a correspondence between the translated characters and the drawing parameters; and, based on the correspondence between the translated characters and the drawing parameters, replacing each character pel with a translation pel containing the translated characters. The invention can improve drawing translation efficiency, save labor costs and meet the demands of high-volume translation, while keeping the translation of the text and of the drawings consistent and improving the reading experience.

Description

A drawing translation method, device and storage medium
Technical field
Embodiments of the present invention relate to image processing technology, and in particular to a drawing translation method, device and storage medium.
Background
In the translation field, if a document to be translated contains drawings and the drawings contain text, the body text and the text in the drawings usually need to be translated separately in order to improve the reading experience.
Translating the text in drawings is a major difficulty. The method commonly used at present is for a translator to compile the text in the drawings and its translation into a table, which is then handed to an image-processing operator who adds the translated content at the corresponding positions in the drawings.
Existing drawing translation methods are inefficient, consume a large amount of labor and cannot meet the demands of high-volume translation. In addition, because the text and the drawings are translated separately, it is difficult to keep the two consistent, which harms the reading experience.
Summary of the invention
The present invention provides a drawing translation method, device and storage medium, so as to improve drawing translation efficiency, save labor costs and meet the demands of high-volume translation, while keeping the translation of the text and of the drawings consistent and improving the reading experience.
In a first aspect, an embodiment of the present invention provides a drawing translation method, comprising:
establishing a correspondence between drawing parameters and characters to be translated, wherein the drawing parameters include the character pel that contains the characters to be translated in the corresponding drawing and the pel location information of that character pel in the corresponding drawing, and the characters to be translated are the characters identified from the drawing that need to be translated;
translating the characters in the text to be translated that are identical to the characters to be translated, to obtain the corresponding translated characters;
establishing, based on the correspondence between the drawing parameters and the characters to be translated, a correspondence between the translated characters and the drawing parameters;
replacing, based on the correspondence between the translated characters and the drawing parameters, the character pel with a translation pel containing the translated characters.
Optionally, establishing the correspondence between drawing parameters and characters to be translated comprises:
determining the character pels that contain characters to be translated;
extracting the character pels and their pel location information to generate a first data set, wherein the first data set includes the correspondence between character pels and pel location information;
performing character recognition on the character pels in the first data set to identify the characters to be translated;
generating a second data set based on the first data set and the identified characters to be translated, wherein the second data set includes the correspondence among the characters to be translated, the character pels and the pel location information.
Optionally, after the first data set is generated, the method further comprises:
displaying the character pels distinctively in the drawing based on the first data set;
updating, based on an operator's adjustment of character pel positions, the character pels that need adjustment in the first data set and their corresponding pel location information.
Optionally, after the first data set is generated, the method further comprises:
highlighting the candidate drawing region outside the character pels;
adding the character pels selected by an operator from the candidate drawing region, together with their corresponding pel location information, to the first data set.
Optionally, the method further comprises:
generating a visual list based on the second data set, wherein the visual list includes a pel column and an editable identification information column, the pel column shows the character pels, and the identification information column shows the characters to be translated corresponding to each character pel;
generating a third data set based on the visual list, as edited in the identification information column, and the second data set, wherein the third data set includes the correspondence among the characters to be translated, the character pels and the pel location information.
Optionally, generating the visual list based on the second data set comprises:
generating the pel column and the identification information column based on the character pels and the characters to be translated in the second data set;
determining the position of each character pel in the drawing based on the pel location information in the second data set;
listing the character pels and the characters to be translated in order in the pel column and the identification information column, according to a set positional order of the character pels.
Optionally, the visual list further includes a confirmation column, used to confirm whether the characters to be translated in the identification information column are correct.
Optionally, the visual list further includes a recommendation information column that can be called up. The recommendation information column shows recommended characters that match the characters to be translated, and the recommended characters are obtained from the text to be translated.
Optionally, the method further comprises:
generating, based on the second data set, an editable text box near each character pel in a manner that does not cover the character pel, wherein the text box shows the characters to be translated corresponding to the character pel;
generating a third data set based on the edited text boxes and the second data set, wherein the third data set includes the correspondence among the characters to be translated, the character pels and the pel location information.
Optionally, determining the character pels that contain characters to be translated comprises:
locating all text content appearing in the drawing based on the ORB algorithm;
treating consecutive characters that satisfy a preset spacing as characters to be translated;
cropping the rectangular pel containing the characters to be translated as a character pel.
Optionally, the drawing is a drawing of the specification of a patent document, and the drawing translation method further comprises:
extracting the reference numerals in the specification to be translated;
using the characters of the reference numerals as features, excluding the character pels that contain reference numeral characters when determining the character pels that contain characters to be translated.
Optionally, replacing the character pel with a translation pel containing the translated characters comprises:
filling the character pel with the background color;
displaying the translated characters in the character pel in a preset size and font.
Optionally, the method further comprises:
when no characters identical to the characters to be translated can be retrieved in the text to be translated, extracting the technical terms in the characters to be translated using a word segmentation algorithm;
searching the text to be translated for matches based on the technical terms;
if identical technical terms are matched in the text to be translated, associating the characters in the text to be translated that correspond to the technical terms with the characters to be translated.
In a second aspect, an embodiment of the present invention further provides a drawing translation device, comprising:
a first relationship establishing unit, configured to establish a correspondence between drawing parameters and characters to be translated, wherein the drawing parameters include the character pel that contains the characters to be translated in the corresponding drawing and the pel location information of that character pel in the corresponding drawing, and the characters to be translated are the characters identified from the drawing that need to be translated;
a translation unit, configured to translate the characters in the text to be translated that are identical to the characters to be translated, to obtain the corresponding translated characters;
a second relationship establishing unit, configured to establish, based on the correspondence between the drawing parameters and the characters to be translated, a correspondence between the translated characters and the drawing parameters;
a display unit, configured to replace, based on the correspondence between the translated characters and the drawing parameters, the character pel with a translation pel containing the translated characters.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the drawing translation method according to any implementation of the first aspect of the present invention.
In the embodiments of the present invention, a correspondence between drawing parameters and characters to be translated is established; the characters in the text to be translated that are identical to the characters to be translated are translated to obtain the corresponding translated characters; a correspondence between the translated characters and the drawing parameters is established based on the correspondence between the drawing parameters and the characters to be translated; and, based on the correspondence between the translated characters and the drawing parameters, the character pel is replaced with a translation pel containing the translated characters. This improves drawing translation efficiency, saves labor costs and meets the demands of high-volume translation, while keeping the translation of the text and of the drawings consistent and improving the reading experience.
Brief description of the drawings
Fig. 1 is a flowchart of a drawing translation method provided by an embodiment of the present invention;
Fig. 2 is a drawing to be translated in an embodiment of the present invention;
Fig. 3 is a schematic diagram after the characters to be translated in wire frame A of Fig. 2 have been translated;
Fig. 4 is a flowchart of another drawing translation method provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a visual list in an embodiment of the present invention;
Fig. 6 is a schematic diagram after the recommendation information column of character to be translated 3 in Fig. 5 has been called up;
Fig. 7 is a structural schematic diagram of a drawing translation device provided by an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of the first relationship establishing unit in an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
An embodiment of the present invention provides a drawing translation method suitable for translating the drawings of foreign-language documents that contain drawings. Fig. 1 is a flowchart of a drawing translation method provided by an embodiment of the present invention. As shown in Fig. 1, the drawing translation method specifically includes the following steps:
S110: Establish a correspondence between drawing parameters and characters to be translated.
The drawing parameters include the character pel that contains the characters to be translated in the corresponding drawing and the pel location information of that character pel in the corresponding drawing. The characters to be translated are the characters identified from the drawing that need to be translated. Fig. 2 is a drawing to be translated in an embodiment of the present invention. As shown in Fig. 2, the characters to be translated are specifically a word, a phrase, or one or more lines of text in the drawing, such as the phrase "Memory Device" selected by wire frame A in Fig. 2. They do not include the reference numerals in the drawing, such as single characters and numbers. A character pel is a pel in the drawing that contains characters to be translated. The size of the character pel can adapt to the size of the characters to be translated, and its shape can be a rectangle that contains the corresponding characters, such as the pel shown in wire frame A in Fig. 2. The pel location information refers to the specific position of the character pel in the drawing. For example, a two-dimensional coordinate system can be established on the drawing, and the pel location information of a character pel is determined from its coordinates in that coordinate system. It should be noted that the above method of determining the pel location information is only one embodiment of the present invention. Any method that can determine the pel location information of the pel may be used, and the present invention places no limitation on this.
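A minimal sketch of how a drawing parameter might be represented, assuming a pixel coordinate system with its origin at the top-left corner of the drawing. The class names `PelLocation` and `DrawingParameter` and the helper `crop_pel` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PelLocation:
    """Axis-aligned rectangle in drawing coordinates (assumed origin: top-left, unit: pixels)."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class DrawingParameter:
    """One drawing parameter: the cropped character pel plus its location in the drawing."""
    drawing_id: str
    pel_pixels: list  # rows of grey values cropped from the drawing
    location: PelLocation


def crop_pel(drawing, location):
    """Cut the rectangle described by `location` out of a row-major pixel grid."""
    rows = drawing[location.y:location.y + location.height]
    return [row[location.x:location.x + location.width] for row in rows]


# A tiny 4x6 "drawing"; 0 is ink, 255 is background.
drawing = [
    [255, 255, 255, 255, 255, 255],
    [255,   0,   0,   0, 255, 255],
    [255,   0, 255,   0, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
loc = PelLocation(x=1, y=1, width=3, height=2)
param = DrawingParameter("fig2", crop_pel(drawing, loc), loc)

# Correspondence between the characters to be translated and their drawing parameter.
correspondence = {"Memory Device": param}
```

Storing the rectangle alongside the cropped pixels keeps everything needed for the later replacement step in one record.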
S120: Translate the characters in the text to be translated that are identical to the characters to be translated, to obtain the corresponding translated characters.
Specifically, the text to be translated is the text portion of the foreign-language document. When the text to be translated is translated, the characters identical to the identified characters to be translated are first retrieved from the text to be translated and a correspondence is established. The full text to be translated is then translated, which yields the translated characters corresponding to the characters to be translated.
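The retrieval in S120 can be sketched as an exact substring search followed by a lookup. The function names and the `term_translations` table are hypothetical: the table stands in for whatever the translation of the full text produces, and the Chinese strings are illustrative values only.

```python
def find_in_text(characters_to_translate, source_text):
    """For each drawing string, list the offsets where it occurs verbatim in the text."""
    hits = {}
    for chars in characters_to_translate:
        positions = []
        start = source_text.find(chars)
        while start != -1:
            positions.append(start)
            start = source_text.find(chars, start + 1)
        hits[chars] = positions
    return hits


def translate_matches(hits, term_translations):
    """Keep only drawing strings found in the text and pair them with the text's translation."""
    return {
        chars: term_translations[chars]
        for chars, positions in hits.items()
        if positions and chars in term_translations
    }


source_text = "The Memory Device stores data. The Memory Device is coupled to a Host."
hits = find_in_text(["Memory Device", "Host", "Cache"], source_text)

# Hypothetical result of translating the full text: source term -> term used in the translation.
term_translations = {"Memory Device": "存储器", "Host": "主机"}
translated = translate_matches(hits, term_translations)
```

Because the drawing reuses the translation chosen for the body text, a term cannot end up rendered one way in the text and another way in the figure.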
S130: Based on the correspondence between the drawing parameters and the characters to be translated, establish a correspondence between the translated characters and the drawing parameters.
Specifically, the correspondence between the translated characters and the drawing parameters is established from two existing correspondences: the one between the drawing parameters and the characters to be translated, and the one between the characters to be translated and the translated characters.
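S130 amounts to composing two mappings. The sketch below assumes both are plain dictionaries keyed by the characters to be translated; `link_translations` and the sample values are hypothetical.

```python
def link_translations(params_by_source, translation_by_source):
    """Compose source->parameters with source->translation into translation->parameters."""
    links = {}
    for source, params in params_by_source.items():
        translation = translation_by_source.get(source)
        if translation is None:
            continue  # no translation available; this pel is left untouched
        links.setdefault(translation, []).extend(params)
    return links


# Each drawing parameter is simplified to (drawing id, (x, y, width, height)).
params_by_source = {
    "Memory Device": [("fig2", (40, 12, 120, 18))],
    "Host": [("fig2", (40, 60, 48, 18)), ("fig3", (10, 10, 48, 18))],
    "Cache": [("fig2", (200, 12, 60, 18))],
}
translation_by_source = {"Memory Device": "存储器", "Host": "主机"}  # illustrative values
links = link_translations(params_by_source, translation_by_source)
```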
S140: Based on the correspondence between the translated characters and the drawing parameters, replace the character pel with a translation pel containing the translated characters.
Based on the correspondence between the translated characters and the drawing parameters, the character pel of the characters to be translated that correspond to the translated characters, and the location information of that pel, can be found. The character pel in the drawing is then replaced with a translation pel containing the translated characters. Fig. 3 is a schematic diagram after the characters to be translated in wire frame A of Fig. 2 have been translated. As shown in Fig. 3, the pel containing the characters to be translated "Memory Device" is replaced by a translation pel containing the translated characters "memory". Because the translated characters come from the characters in the translated text that correspond to the characters to be translated, the content of the translated text and of the translated drawing is kept consistent, which improves the reading experience.
In the embodiments of the present invention, a correspondence between drawing parameters and characters to be translated is established; the characters in the text to be translated that are identical to the characters to be translated are translated to obtain the corresponding translated characters; a correspondence between the translated characters and the drawing parameters is established based on the correspondence between the drawing parameters and the characters to be translated; and, based on the correspondence between the translated characters and the drawing parameters, the character pel is replaced with a translation pel containing the translated characters. This improves drawing translation efficiency, saves labor costs and meets the demands of high-volume translation, while keeping the translation of the text and of the drawings consistent and improving the reading experience.
Optionally, establishing the correspondence between drawing parameters and characters to be translated comprises:
determining the character pels that contain characters to be translated;
extracting the character pels and their pel location information to generate a first data set, wherein the first data set includes the correspondence between character pels and pel location information;
performing character recognition on the character pels in the first data set to identify the characters to be translated;
generating a second data set based on the first data set and the identified characters to be translated, wherein the second data set includes the correspondence among the characters to be translated, the character pels and the pel location information.
Correspondingly, Fig. 4 is a flowchart of another drawing translation method provided by an embodiment of the present invention. As shown in Fig. 4, the method includes:
S111: Determine the character pels that contain characters to be translated.
Optionally, based on the ORB (Oriented FAST and Rotated BRIEF) algorithm and using the characters of the language of the document to be translated as features, all text content appearing in the drawing is located. A preset spacing between single characters is set, consecutive characters that satisfy this preset spacing are treated as characters to be translated, and the rectangular pel containing the characters to be translated is cropped as a character pel. Further, if the drawing contains multiple lines of characters to be translated, the corresponding character pels are cropped line by line in such a way that the character pels do not overlap. It should be noted that other algorithms can also be used to locate the text content appearing in the drawing, as long as they can give a preliminary location of the content to be translated in the picture. The present invention places no particular limitation on this.
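The spacing rule can be illustrated independently of how single characters are located. The sketch below assumes the bounding boxes of the individual characters on one text line are already known, for example from a detector, and merges neighbours whose horizontal gap does not exceed a preset spacing. `group_characters` and `max_gap` are hypothetical names.

```python
def group_characters(char_boxes, max_gap):
    """Merge single-character boxes (x, y, w, h) on one text line into character pels.

    Two neighbouring boxes join the same pel when the horizontal gap between them
    is at most `max_gap` pixels; a larger gap starts a new pel.
    """
    pels = []
    for x, y, w, h in sorted(char_boxes):
        if pels and x - (pels[-1][0] + pels[-1][2]) <= max_gap:
            px, py, pw, ph = pels[-1]
            top = min(py, y)
            bottom = max(py + ph, y + h)
            right = max(px + pw, x + w)
            pels[-1] = (px, top, right - px, bottom - top)
        else:
            pels.append((x, y, w, h))
    return pels


# Three characters close together, then one far away on the same line.
boxes = [(30, 6, 8, 11), (10, 5, 8, 12), (20, 5, 8, 12), (80, 5, 8, 12)]
character_pels = group_characters(boxes, max_gap=4)
```

For a drawing with several lines of text, the same function would be applied once per line so that pels from different lines do not overlap.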
Optionally, if the drawing is a drawing of the specification of a patent document, the reference numerals in the specification to be translated can be extracted first. Then, using the characters of the reference numerals as features, the character pels that contain reference numeral characters are excluded when the character pels containing characters to be translated are determined. This improves the efficiency of determining the character pels that contain characters to be translated, avoids interference from the reference numerals, and improves the accuracy of the subsequent recognition of characters to be translated.
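One possible reading of this exclusion step, as a sketch only: collect the reference numerals from the specification with a regular expression, then drop candidate pels whose text consists solely of such a numeral. The pattern (digits with an optional trailing letter) and the assumption that a preliminary text guess is available for each candidate are both hypothetical.

```python
import re


def extract_reference_numerals(spec_text):
    """Collect tokens that look like reference numerals, e.g. '100' or '110a'."""
    return set(re.findall(r"\b\d+[A-Za-z]?\b", spec_text))


def drop_reference_numeral_pels(candidates, numerals):
    """Keep only candidate pels whose preliminary text is not a reference numeral."""
    return [c for c in candidates if c["text"].strip() not in numerals]


spec_text = "The memory device 100 includes a controller 110a and a memory array 120."
numerals = extract_reference_numerals(spec_text)

candidates = [
    {"text": "Memory Device", "location": (40, 12, 120, 18)},
    {"text": "100", "location": (170, 12, 24, 18)},
    {"text": "110a", "location": (170, 60, 32, 18)},
    {"text": "Controller", "location": (40, 60, 90, 18)},
]
kept = drop_reference_numeral_pels(candidates, numerals)
```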
S112: Extract the character pels and their pel location information to generate a first data set.
The first data set includes the correspondence between character pels and pel location information. Specifically, after the character pels containing characters to be translated have been determined, the character pels and their corresponding pel location information are extracted and the first data set is established.
S113: Perform character recognition on the character pels in the first data set to identify the characters to be translated.
Optionally, a neural network algorithm can be used to identify the characters to be translated from the character pels. Specifically, a convolutional neural network (Convolutional Neural Networks, CNN), a deep residual network (Deep Residual Learning, DRN), the Visual Geometry Group network (VGG) or the GoogLeNet deep learning architecture can be used. CNN is taken as an example below:
(1) Prepare the training data set: based on historical data, take a large number of foreign-language documents whose characters to be translated have already been checked, and extract the characters to be translated and the corresponding character pels from them as the training data set.
(2) Process the training data set: process the training data set with image processing software (OpenCV), scale the pels to a unified size (a unified size is not required when GoogLeNet is used), extract a single channel and convert the pels to grayscale images.
(3) Train and test: build a convolutional neural network with a common neural network development framework or tool (such as TensorFlow, Caffe, Keras or Python), then train and test it.
(4) Use the trained model to identify characters to be translated: locate the pels that may contain characters to be translated and recognize their content.
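The preprocessing in step (2) can be sketched without any image library. The source names OpenCV for this step; the pure-Python functions below are a stand-in that shows only the two operations involved, grayscale conversion and scaling to a unified size, using the common luma weights and nearest-neighbour sampling as assumptions.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to grey values with integer luma weights."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for r, g, b in row] for row in rgb_image]


def resize_nearest(gray, out_w, out_h):
    """Scale a grey image to out_w x out_h with nearest-neighbour sampling."""
    in_h, in_w = len(gray), len(gray[0])
    return [
        [gray[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# A 2x2 colour pel scaled to the unified 4x4 training size.
pel = [
    [(255, 255, 255), (0, 0, 0)],
    [(0, 0, 0), (255, 255, 255)],
]
sample = resize_nearest(to_gray(pel), out_w=4, out_h=4)
```

In practice an image library would do both operations with better interpolation; the point here is only that every pel reaches the network with the same shape and a single channel.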
If optical character recognition (Optical Character Recognition, OCR) is used to identify the characters to be translated in a drawing, it is difficult to cope with the large number of interfering characters in the picture. Translators still have to spend considerable effort checking the results one by one, the recognition accuracy is low, and the method offers little help for translation work with high accuracy requirements. The embodiment of the present invention uses a neural network algorithm to identify the characters to be translated in the drawing, which improves recognition efficiency and accuracy.
S114: Generate a second data set based on the first data set and the identified characters to be translated.
Specifically, the identified characters to be translated can be added to the corresponding entries of the first data set to generate the second data set. The second data set includes the correspondence among the identified characters to be translated, the character pels and the pel location information.
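As a sketch of S114, the second data set can be built by attaching the recognized text to every entry of the first data set. The dictionary layout, `build_second_dataset` and the stub `fake_recognize` (which replaces the trained model) are hypothetical.

```python
def build_second_dataset(first_dataset, recognize):
    """Attach the recognized characters to each (pel, location) entry."""
    second = []
    for entry in first_dataset:
        second.append({
            "pel": entry["pel"],
            "location": entry["location"],
            "text": recognize(entry["pel"]),
        })
    return second


def fake_recognize(pel):
    """Stand-in for the trained recognition model; returns canned results."""
    return {"pel-A": "Memory Device", "pel-B": "Host"}.get(pel, "")


first_dataset = [
    {"pel": "pel-A", "location": (40, 12, 120, 18)},
    {"pel": "pel-B", "location": (40, 60, 48, 18)},
]
second_dataset = build_second_dataset(first_dataset, fake_recognize)
```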
S120: Translate the characters in the text to be translated that are identical to the characters to be translated, to obtain the corresponding translated characters.
S130: Based on the correspondence between the drawing parameters and the characters to be translated, establish a correspondence between the translated characters and the drawing parameters.
S140: Based on the correspondence between the translated characters and the drawing parameters, replace the character pel with a translation pel containing the translated characters.
Optionally, after the first data set is generated, the method further includes checking the character pels of the first data set, which specifically includes:
Displaying the character pels distinctively in the drawing based on the first data set. Specifically, according to the pel location information of the character pels in the first data set, the character pels are displayed distinctively in the drawing, for example with highlighted edges, to set them apart from the other parts of the drawing.
After the character pels have been displayed distinctively in the drawing, the character pels of the first data set are checked in the drawing. The check can be performed manually. When a character pel contains characters that are not to be translated, when the characters to be translated in a character pel are incomplete, or when a character pel contains no characters to be translated, the pel position in the drawing can be adjusted or the pel deleted manually. Based on the operator's adjustment of the character pel position, the machine updates the character pels that need adjustment in the first data set and their corresponding pel location information.
Optionally, after the first data set is generated, the method further includes checking the regions of the drawing outside the character pels for omissions, which specifically includes:
Highlighting the candidate drawing region outside the character pels. Specifically, the character pels in the first data set can be dimmed or hidden so that the candidate drawing region outside the character pels stands out.
Adding the character pels selected by an operator from the candidate drawing region, together with their corresponding pel location information, to the first data set. Specifically, after the candidate drawing region outside the character pels has been highlighted, that region can be checked for omissions. When the drawing contains a character pel that was not extracted in step S112, the operator manually selects the character pel containing the characters to be translated, and the machine adds the selected character pel and its corresponding pel location information to the first data set.
Optionally, the method further includes checking the identified characters to be translated, which specifically includes:
Generating a visual list in a new window based on the character pels and the characters to be translated in the second data set. The visual list includes a pel column and an editable identification information column. The pel column shows the character pels, and the identification information column shows the characters to be translated corresponding to each character pel.
Generating a third data set based on the visual list, as edited in the identification information column, and the second data set, wherein the third data set includes the correspondence among the characters to be translated, the character pels and the pel location information. Further, the third data set can be used as sample data for the neural network algorithm to train the picture recognition neural network model. The trained model is then used in step S111 to determine the character pels containing characters to be translated and in step S113 to perform character recognition on the character pels in the first data set, which improves the accuracy of machine recognition.
Specifically, the position of each character pel in the drawing is determined based on the pel location information in the second data set. The character pels in the second data set and the corresponding characters to be translated are then listed in order in the pel column and the identification information column, according to a set positional order of the character pels, for example from top to bottom and from left to right in the picture. Fig. 5 is a schematic diagram of a visual list in an embodiment of the present invention. As shown in Fig. 5, the pel column shows the character pels containing characters to be translated, and the identification information column shows the characters to be translated corresponding to each character pel. The identification information column is editable, so when the identified characters to be translated are found to be inconsistent with the characters in the character pel, they can be edited manually.
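The top-to-bottom, left-to-right ordering of the list rows might look like the sketch below. It assumes each entry carries an `(x, y, width, height)` location and buckets the y coordinate with a hypothetical `row_tolerance`, so that pels at slightly different heights still count as one row.

```python
def reading_order(entries, row_tolerance=10):
    """Sort entries top to bottom, then left to right within a row."""
    def key(entry):
        x, y, _w, _h = entry["location"]
        return (y // row_tolerance, x)
    return sorted(entries, key=key)


entries = [
    {"text": "Host", "location": (50, 10, 40, 12)},
    {"text": "Memory Device", "location": (10, 12, 30, 12)},
    {"text": "Controller", "location": (10, 40, 60, 12)},
]

# Numbered rows for the list: (serial number, characters to be translated).
rows = [(i + 1, e["text"]) for i, e in enumerate(reading_order(entries))]
```

Fixed-size bucketing is a simplification: two pels just either side of a bucket boundary land in different rows, so a production version would more likely cluster by overlapping vertical extents.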
Optionally, the visual list further includes a confirmation column, in which the operator can confirm, for example by ticking a box, that the characters to be translated identified in the identification information column are correct. In addition, the visual list may also include a serial number column that numbers each character pel and its corresponding characters to be translated.
Optionally, the visual list further includes a recommendation information column that can be called up. The recommendation information column shows recommended characters that match the characters to be translated, and the recommended characters are obtained from the text to be translated. Specifically, the recommended characters are obtained by searching the text to be translated for characters that are highly similar to the identified characters to be translated. The similarity matching can use methods common in the industry, or the Bilingual Evaluation Understudy (BLEU) algorithm for evaluating translation quality. Multiple recommended characters are sorted by score from high to low, and the top-scoring recommended characters are shown in the recommendation information column. Fig. 6 is a schematic diagram after the recommendation information column of character to be translated 3 in Fig. 5 has been called up. As shown in Fig. 5, the recommendation information column is normally hidden. When an operator finds that the characters in a character pel in the pel column are inconsistent with the characters to be translated in the identification information column, for example character pel 3 and character to be translated 3 in Fig. 6, the operator can click the call-up icon at the lower right of the identification information column to call up the recommendation information column and select the recommended characters that are consistent with the characters in the character pel. The machine then generates the third data set based on the visual list, as edited in the identification information column, and the second data set.
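The ranking of recommended characters can be sketched with the standard library. Note that this swaps in `difflib.SequenceMatcher` as the similarity measure, which is not BLEU and is not stated in the source; `recommend`, `top_n` and the candidate phrases are hypothetical.

```python
from difflib import SequenceMatcher


def recommend(recognized, candidates, top_n=3):
    """Rank candidate phrases from the text by similarity to the recognized string."""
    scored = [
        (SequenceMatcher(None, recognized.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _score, c in scored[:top_n]]


# The recognizer misread two letters as digits.
recognized = "Mem0ry Dev1ce"
candidates_from_text = ["Host Interface", "Memory Device", "Memory Controller"]
suggestions = recommend(recognized, candidates_from_text, top_n=2)
```

Showing only the top few suggestions keeps the column short enough for the operator to pick the correct string with one click.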
Optionally, in another embodiment of the invention, the recognized characters to be translated may also be checked as follows:
Based on the character pels and corresponding pel location information in the second data set, an editable text box is generated in the blank area near each character pel, without covering the character pel. The text box shows the character to be translated corresponding to that character pel.
A reviewer can compare the character in the character pel with the character in the text box and, if the character to be translated was recognized incorrectly, correct it in the text box. The machine generates a third data set based on the edited text boxes and the second data set, where the third data set includes the correspondence among characters to be translated, character pels, and pel location information.
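One way to choose a text box position that does not cover any character pel is sketched below. The patent does not specify a placement rule, so everything here is an assumption: boxes are `(x, y, width, height)` tuples, "blank" is approximated as "not overlapping any known character pel", and the names `overlaps` and `place_text_box` are hypothetical.

```python
def overlaps(a, b):
    """True if axis-aligned boxes a and b, given as (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_text_box(pel, occupied, page_size, gap=4):
    """Return a box beside `pel` that stays on the page and covers no occupied box.

    Tries right, below, left, above in that order; returns None if all fail.
    """
    x, y, w, h = pel
    page_w, page_h = page_size
    candidates = [
        (x + w + gap, y, w, h),  # right of the pel
        (x, y + h + gap, w, h),  # below the pel
        (x - w - gap, y, w, h),  # left of the pel
        (x, y - h - gap, w, h),  # above the pel
    ]
    for box in candidates:
        cx, cy, cw, ch = box
        inside = cx >= 0 and cy >= 0 and cx + cw <= page_w and cy + ch <= page_h
        if inside and not any(overlaps(box, o) for o in occupied):
            return box
    return None


pel = (10, 10, 50, 12)
neighbour = (64, 10, 50, 12)  # another character pel directly to the right
box = place_text_box(pel, [pel, neighbour], page_size=(200, 100))
print(box)
```

A real drawing also contains lines and shapes, so a fuller version would test candidate boxes against the drawing's non-background pixels as well.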
Optionally, replacing the character pel with a translation pel containing the translation character includes:
Based on the character pels and corresponding pel location information in the third data set, filling each character pel in the drawing with the background color and, according to the correspondence between translation characters and attached drawing parameters, displaying the translation character in the character pel in a preset size and font.
In another embodiment of the invention, the pixels of the character to be translated are located within the character pel based on the third data set, and the black pixels are filled with the background color or white, producing a character pel in the background color or white. The translation character in the third data set is then placed at a suitable position in the pel in a preset size and font.
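The pixel-filling variant can be illustrated with a small sketch on a grayscale pel stored as nested lists. The threshold value, the function name `erase_characters`, and the representation of the pel are assumptions for the example; drawing the translation character afterwards would need an imaging library and is not shown.

```python
def erase_characters(pel_pixels, background=255, dark_threshold=128):
    """Return a copy of a grayscale pel with dark (character) pixels set to the background.

    pel_pixels: list of rows, each a list of 0-255 intensities.
    """
    return [
        [background if value < dark_threshold else value for value in row]
        for row in pel_pixels
    ]


pel = [
    [255, 0, 255],
    [255, 0, 255],
    [200, 30, 255],
]
cleaned = erase_characters(pel)
print(cleaned)
```

Note that this simple threshold would also erase any drawing line that happens to cross the pel, which is one reason the pel boundaries are reviewed before this step.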
In addition, to keep drawings concise and visually clean, the text in drawings is often abbreviated or simplified, for example by omitting prepositions and attributives. As a result, a character exactly identical to the character to be translated may not be found in the text to be translated, although key nouns such as technical terms are not omitted. Optionally, for this case, in an embodiment of the invention, when no character identical to the character to be translated is found in the text to be translated, the technical terms in the character to be translated are extracted using a word segmentation algorithm, and the text to be translated is searched based on those technical terms. If an identical technical term is found in the text to be translated, the character corresponding to the technical term in the text to be translated is associated with the character to be translated, so that in subsequent translation the technical term is translated consistently in the text and in the drawing.
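The fallback described above might look like the following sketch. It does not implement a word segmentation algorithm: a hypothetical glossary of known technical terms stands in for the segmenter, and the names `extract_terms` and `associate` are illustrative only.

```python
def extract_terms(character, glossary):
    """Glossary terms that occur in the drawing label, longest first."""
    lowered = character.lower()
    return sorted((t for t in glossary if t in lowered), key=len, reverse=True)


def associate(character, text, glossary):
    """Return the phrase in `text` to associate with a drawing label, or None."""
    lowered_text = text.lower()
    if character.lower() in lowered_text:
        return character  # exact match, the normal path
    # Fallback: match on a technical term contained in the label.
    for term in extract_terms(character, glossary):
        if term in lowered_text:
            return term
    return None


text = "The output of the signal processing unit is fed to the display."
glossary = {"signal processing unit", "display"}
match = associate("signal processing unit output", text, glossary)
print(match)
```

Here the drawing label omits the preposition used in the text, so the exact search fails and the association is made through the shared term.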
An embodiment of the invention further provides an attached drawing translating device. Fig. 7 is a structural schematic diagram of an attached drawing translating device provided by an embodiment of the invention. As shown in Fig. 7, the attached drawing translating device includes:
The first relationship establishing unit 100 is configured to establish the correspondence between attached drawing parameters and characters to be translated, where the attached drawing parameters include the character pel of each character to be translated in the corresponding drawing and the pel location information of that character pel in the corresponding drawing, and the characters to be translated are the characters recognized from the drawing that need to be translated. Specifically, a character to be translated is a word, a phrase, or one or more lines of text in the drawing; it does not include the reference numerals (appended drawing references) in the drawing, such as single letters and numbers. A character pel is a pel in the drawing that contains a character to be translated. The size of the character pel adapts to the size of the character to be translated, and its shape may be a rectangle, with the corresponding appended drawing reference enclosed in the rectangle. Pel location information refers to the specific position of the character pel in the drawing. For example, a two-dimensional coordinate system may be established on the drawing, and the pel location information of a character pel is determined from its coordinates in that coordinate system. It should be noted that the above method of determining the pel location information of a reference numeral pel is only one embodiment of the invention; any method that can determine the pel location information of a reference numeral pel may be used, and the invention is not limited in this respect.
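A possible in-memory form of the attached drawing parameters is sketched below. The patent only requires that a pel's position be expressed in a two-dimensional coordinate system on the drawing; the pixel units, the top-left origin, and the class names `PelLocation` and `DrawingParameter` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PelLocation:
    """Axis-aligned rectangle in drawing coordinates (assumed: pixels, origin top-left)."""
    x: int
    y: int
    width: int
    height: int

    @classmethod
    def from_corners(cls, left, top, right, bottom):
        """Build a location from two opposite corners."""
        return cls(left, top, right - left, bottom - top)

    def contains(self, px, py):
        """True if the point (px, py) lies inside the rectangle."""
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


@dataclass
class DrawingParameter:
    """One correspondence entry: a character pel, its location, and its recognized text."""
    pel_id: int
    location: PelLocation
    character: str = ""  # filled in after character recognition


loc = PelLocation.from_corners(40, 20, 120, 36)
entry = DrawingParameter(pel_id=1, location=loc, character="control valve")
print(entry)
```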
The translation unit 200 is configured to translate the characters in the text to be translated that are identical to the characters to be translated, obtaining the corresponding translation characters. Specifically, the text to be translated is the text portion of a foreign-language document. When the text to be translated is translated, the character retrieval module 201 in the translation unit 200 first searches the text to be translated for characters identical to the recognized characters to be translated, the relating module 202 establishes the correspondence, and the translation module 203 translates the full text to be translated, obtaining the translation character corresponding to each character to be translated.
The second relationship establishing unit 300 is configured to establish the correspondence between translation characters and attached drawing parameters, based on the correspondence between attached drawing parameters and characters to be translated.
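The retrieval, association, and mapping steps performed by units 200 and 300 can be sketched as a simple join. How the translation of a phrase is recovered from the translated full text is not specified in the source, so this sketch assumes a hypothetical `term_translations` table that records the rendering used when the text was translated; the function name and data layout are likewise assumptions.

```python
def map_translations_to_pels(second_data_set, text, term_translations):
    """Return {pel_id: (location, translation)} for labels that also occur in the text.

    second_data_set: {pel_id: (location, character_to_be_translated)}
    term_translations: hypothetical table of renderings taken from the translated text.
    """
    mapping = {}
    for pel_id, (location, character) in second_data_set.items():
        # Retrieval: the label must appear in the text to be translated.
        if character in text and character in term_translations:
            # Association: reuse the rendering chosen for the text.
            mapping[pel_id] = (location, term_translations[character])
    return mapping


text = "The pump drives the valve."
second_data_set = {
    1: ((40, 20, 80, 16), "pump"),
    2: ((40, 60, 80, 16), "valve"),
    3: ((40, 90, 80, 16), "motor"),  # appears in the drawing only
}
term_translations = {"pump": "泵", "valve": "阀", "motor": "电机"}
mapping = map_translations_to_pels(second_data_set, text, term_translations)
print(mapping)
```

Labels that are not found in the text, such as "motor" here, are left unmapped and would be handled by the technical-term fallback or by a reviewer.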
The display unit 400 is configured to replace each character pel with a translation pel containing the translation character, based on the correspondence between translation characters and attached drawing parameters. From this correspondence, the character pel of the character to be translated that corresponds to a translation character, and the location information of that pel, can be found, and the character pel in the drawing is then replaced with a translation pel containing the translation character. Because the translation character is obtained by translating the character in the text to be translated that corresponds to the character to be translated, the content of the translated text and the translated drawing is consistent, which improves the reading experience.
In the embodiment of the invention, the first relationship establishing unit establishes the correspondence between attached drawing parameters and characters to be translated; the translation unit translates the characters in the text to be translated that are identical to the characters to be translated, obtaining the corresponding translation characters; the second relationship establishing unit establishes the correspondence between translation characters and attached drawing parameters based on the correspondence between attached drawing parameters and characters to be translated; and the display unit replaces each character pel with a translation pel containing the translation character based on the correspondence between translation characters and attached drawing parameters. This improves drawing translation efficiency, saves labor cost, meets the needs of high-volume translation, and keeps the translation of the text and the drawings consistent, improving the reading experience.
Fig. 8 is a structural schematic diagram of the first relationship establishing unit in an embodiment of the invention. As shown in Fig. 8, optionally, the first relationship establishing unit 100 includes a character pel determining module 101, an attached drawing parameter extraction unit 102, a first data set generation module 103, a character recognition module 104, and a second data set generation module 105. The character pel determining module 101 is configured to determine the character pels containing characters to be translated. Specifically, based on the ORB (Oriented FAST and Rotated BRIEF) algorithm and using the characters of the language of the document to be translated as features, it locates all text content appearing in the drawing, sets a preset spacing between single characters, treats consecutive characters that satisfy the preset spacing as one character to be translated, and crops the rectangular region containing the character to be translated as a character pel. Further, if the drawing contains multiple lines of characters to be translated, the corresponding character pels are cropped line by line so that the character pels do not overlap. It should be noted that other algorithms may also be used to locate the text content appearing in the drawing, as long as the content to be translated in the picture can be preliminarily located; the invention is not specifically limited in this respect. After the character pel determining module 101 determines the character pels containing characters to be translated, the attached drawing parameter extraction unit 102 extracts the character pels and the corresponding pel location information. The first data set generation module 103 establishes the first data set from the character pels and corresponding pel location information extracted by the attached drawing parameter extraction unit 102; the first data set includes the correspondence between character pels and pel location information.
The character recognition module 104 uses a neural network algorithm to recognize the characters to be translated from the character pels. Specifically, a convolutional neural network (Convolutional Neural Networks, CNN), a deep residual network (Deep Residual Learning, DRN), a Visual Geometry Group (VGG) network, or the GoogLeNet deep learning architecture may be used. The second data set generation module 105 generates the second data set based on the first data set and the characters to be translated recognized by the character recognition module 104. The second data set includes the correspondence among the recognized characters to be translated, character pels, and pel location information.
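The spacing rule used by the character pel determining module 101 (consecutive characters within a preset spacing form one character to be translated) can be sketched as a box-merging pass. Text detection itself, for which the patent names ORB, is not shown; the sketch assumes single-character boxes on one text line are already available as `(x, y, width, height)` tuples, and `group_characters` and `max_gap` are hypothetical names.

```python
def group_characters(char_boxes, max_gap=6):
    """Merge single-character boxes on one text line into phrase-level boxes.

    Boxes whose horizontal gap to the previous group is at most `max_gap`
    are merged into that group; a larger gap starts a new group.
    """
    groups = []
    for x, y, w, h in sorted(char_boxes):
        if groups and x - (groups[-1][0] + groups[-1][2]) <= max_gap:
            gx, gy, gw, gh = groups[-1]
            top = min(gy, y)
            bottom = max(gy + gh, y + h)
            right = max(gx + gw, x + w)
            groups[-1] = (gx, top, right - gx, bottom - top)
        else:
            groups.append((x, y, w, h))
    return groups


boxes = [(20, 1, 8, 10), (0, 0, 8, 10), (10, 0, 8, 10), (60, 0, 8, 10)]
pels = group_characters(boxes, max_gap=6)
print(pels)
```

Each resulting box is a candidate rectangular character pel; running the function once per text line keeps the pels of different lines from overlapping.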
Optionally, the first relationship establishing unit 100 further includes a character pel verification module 106, configured to check the character pels of the first data set after the first data set is generated. Specifically, based on the first data set, the character pel verification module 106 sends an instruction to the display unit 400; the instruction includes the character pels and the corresponding pel location information. According to the pel location information, the display unit 400 displays the character pels distinctively in the drawing, for example by highlighting their edges, to set them apart from the rest of the drawing. After the character pels are displayed distinctively, the character pels of the first data set are checked in the drawing. This check may be performed manually. When a character pel contains content that is not a character to be translated, contains an incomplete character to be translated, or contains no character to be translated, the reviewer can adjust the position of the pel in the drawing or delete it. Based on the reviewer's adjustment of the character pel position, the attached drawing parameter extraction unit 102 extracts the adjusted character pel and corresponding pel location information again, and the first data set generation module 103 updates the character pels that needed adjustment and their pel location information in the first data set.
Optionally, the first relationship establishing unit 100 further includes a character pel omission check module 107, configured to check the region of the drawing outside the character pels for omissions after the first data set is generated. Specifically, based on the first data set, the character pel omission check module 107 sends an instruction to the display unit 400; the instruction includes the character pels and the corresponding pel location information. The display unit 400 dims or hides the character pels in the first data set, so that the candidate drawing region outside the character pels stands out. A reviewer checks the candidate drawing region for omissions. When the drawing contains a character pel that was not extracted, the reviewer manually selects the character pel containing the character to be translated, the attached drawing parameter extraction unit 102 extracts the manually selected character pel and its pel location information, and the first data set generation module 103 adds the character pel found by the omission check and its pel location information to the first data set.
Optionally, the first relationship establishing unit 100 further includes a character-to-be-translated verification module 108 and a third data set generation module 109. The character-to-be-translated verification module 108 is configured to check the recognized characters to be translated after the second data set is generated. Specifically, the character-to-be-translated verification module 108 generates a list based on the character pels and characters to be translated in the second data set. The list includes a pel column and an identification information column: the pel column shows the character pels, and the identification information column shows the character to be translated corresponding to each character pel. The display unit 400 displays the list in a new window, forming the visualized list shown in Fig. 5. The identification information column of the visualized list is editable, so that when a recognized character to be translated is found to be inconsistent with the character in the character pel, the character to be translated can be edited manually. The third data set generation module 109 generates the third data set based on the visualized list with the edited identification information column and the second data set. Further, the third data set can serve as sample data for the neural network algorithm to train the picture recognition neural network model. The trained model can then be used by the character pel determining module 101 to determine character pels containing characters to be translated, and by the character recognition module 104 to recognize the characters in the character pels of the first data set, improving the accuracy of machine recognition.
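The list generation and the derivation of the third data set can be sketched as follows. The reading order (top to bottom, then left to right) is an assumption, since the source only says the entries follow a set position order; the dictionary layout and the names `build_visual_list` and `apply_edits` are also hypothetical.

```python
def build_visual_list(second_data_set):
    """Rows of (serial number, pel id, recognized character) in assumed reading order."""
    ordered = sorted(
        second_data_set.items(),
        key=lambda item: (item[1]["y"], item[1]["x"]),
    )
    return [
        (serial, pel_id, entry["character"])
        for serial, (pel_id, entry) in enumerate(ordered, start=1)
    ]


def apply_edits(second_data_set, edits):
    """Third data set: a copy of the second data set with reviewer corrections applied."""
    third = {pel_id: dict(entry) for pel_id, entry in second_data_set.items()}
    for pel_id, corrected in edits.items():
        third[pel_id]["character"] = corrected
    return third


second = {
    "a": {"x": 100, "y": 10, "character": "va1ve"},
    "b": {"x": 5, "y": 10, "character": "pump"},
    "c": {"x": 0, "y": 50, "character": "motor"},
}
rows = build_visual_list(second)
third = apply_edits(second, {"a": "valve"})
print(rows)
print(third["a"])
```

Keeping the second data set unchanged and writing corrections into a copy preserves the raw recognition results, which is useful if the corrected pairs are later used as training samples.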
Optionally, the visualized list further includes a confirmation column, in which a reviewer can confirm, for example by ticking a box, that the character to be translated shown in the identification information column is correct. The visualized list may also include a serial number column that numbers each character pel and its corresponding character to be translated.
Optionally, the visualized list further includes a recommendation information column that can be called up. The recommendation information column shows recommended characters that match the character to be translated. The recommended characters are obtained from the text to be translated: based on the recognized character to be translated, the text to be translated is searched for characters highly similar to it. Similarity matching may use any method common in the industry, or the Bilingual Evaluation Understudy (BLEU) algorithm. The recommended characters are sorted by score from high to low, and the top-scoring ones are shown in the recommendation information column. As shown in Fig. 6, the recommendation information column is normally hidden. When a reviewer finds that the character in a character pel in the pel column is inconsistent with the character to be translated in the identification information column, the reviewer can call up the hidden column by clicking near the identification information column and select the recommended character that matches the character in the character pel. The third data set generation module 109 then generates the third data set based on the visualized list with the edited identification information column and the second data set.
Optionally, the character-to-be-translated verification module 108 may also send an instruction to the display unit 400 based on the second data set. According to the instruction, the display unit 400 generates an editable text box in the blank area near each character pel, without covering the character pel, and the text box shows the character to be translated corresponding to that character pel. A reviewer can compare the character in the character pel with the character in the text box and, if the character to be translated was recognized incorrectly, correct it in the text box. The third data set generation module 109 generates the third data set based on the edited text boxes and the second data set, where the third data set includes the correspondence among characters to be translated, character pels, and pel location information.
Optionally, with reference to Fig. 7, the display unit 400 includes a character pel filling module 401 and a translation character display module 402. The character pel filling module 401 is configured to fill each character pel in the drawing with the background color, based on the character pels and corresponding pel location information in the third data set. The translation character display module 402 displays the translation character in the character pel in a preset size and font, according to the correspondence between translation characters and attached drawing parameters.
In another embodiment of the invention, the character pel filling module 401 locates the pixels of the character to be translated within the character pel based on the third data set and fills the black pixels with the background color or white, producing a character pel in the background color or white. The translation character display module 402 then places the translation character in the third data set at a suitable position in the pel in a preset size and font.
In addition, to keep drawings concise and visually clean, the text in drawings is often abbreviated or simplified, for example by omitting prepositions and attributives. As a result, a character exactly identical to the character to be translated may not be found in the text to be translated, although key nouns such as technical terms are not omitted. Optionally, for this case, in an embodiment of the invention, the first relationship establishing unit 100 further includes a technical term extraction module 110. When the character retrieval module 201 finds no character identical to the character to be translated in the text to be translated, the technical term extraction module 110 extracts the technical terms in the character to be translated using a word segmentation algorithm, and the character retrieval module 201 searches the text to be translated based on those technical terms. If an identical technical term is found in the text to be translated, the relating module 202 associates the character corresponding to the technical term in the text to be translated with the character to be translated, so that in subsequent translation the technical term is translated consistently in the text and in the drawing.
An embodiment of the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the attached drawing translation method described in any of the above embodiments of the invention.
Of course, the computer-executable instructions of the computer-readable storage medium provided by the embodiment of the invention are not limited to the method operations described above; they may also perform related operations in the attached drawing translation method provided by any of the above embodiments of the invention.
From the above description of the embodiments, those skilled in the art will clearly understand that the invention may be implemented by software together with the necessary general-purpose hardware, and of course may also be implemented by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the invention, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a floppy disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), flash memory (FLASH), hard disk, or optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the invention.
It is worth noting that, in the above embodiment of the attached drawing translating device, the units and modules are divided only according to functional logic. The division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the invention.
Note that the above are only preferred embodiments of the invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the invention. Therefore, although the invention has been described in further detail through the above embodiments, it is not limited to them; without departing from the inventive concept, it may include other equivalent embodiments, and its scope is determined by the scope of the appended claims.

Claims (15)

1. An attached drawing translation method, characterized by comprising:
establishing a correspondence between attached drawing parameters and characters to be translated, wherein the attached drawing parameters include the character pel of each character to be translated in the corresponding attached drawing and the pel location information of the character pel in the corresponding attached drawing, and the characters to be translated are the characters recognized from the attached drawing that need to be translated;
translating the characters in a text to be translated that are identical to the characters to be translated, to obtain corresponding translation characters;
establishing a correspondence between the translation characters and the attached drawing parameters, based on the correspondence between the attached drawing parameters and the characters to be translated;
replacing the character pel with a translation pel containing the translation character, based on the correspondence between the translation characters and the attached drawing parameters.
2. The attached drawing translation method according to claim 1, characterized in that establishing the correspondence between attached drawing parameters and characters to be translated comprises:
determining the character pels containing characters to be translated;
extracting the character pels and the pel location information and generating a first data set, wherein the first data set includes the correspondence between the character pels and the pel location information;
performing character recognition on the character pels in the first data set to recognize the characters to be translated;
generating a second data set based on the first data set and the recognized characters to be translated, wherein the second data set includes the correspondence among characters to be translated, character pels, and pel location information.
3. The attached drawing translation method according to claim 2, characterized in that, after the first data set is generated, the method further comprises:
displaying the character pels distinctively in the attached drawing, based on the first data set;
updating, based on a reviewer's adjustment of the character pel positions, the character pels that need adjustment and the corresponding pel location information in the first data set.
4. The attached drawing translation method according to claim 3, characterized by further comprising:
highlighting the candidate attached drawing region outside the character pels;
adding the character pels selected by a reviewer from the candidate attached drawing region, and the corresponding pel location information, to the first data set.
5. The attached drawing translation method according to claim 4, characterized by further comprising:
generating a visualized list based on the second data set, wherein the visualized list includes a pel column and an editable identification information column, the pel column shows the character pels, and the identification information column shows the character to be translated corresponding to each character pel;
generating a third data set based on the visualized list with the edited identification information column and the second data set, wherein the third data set includes the correspondence among characters to be translated, character pels, and pel location information.
6. The attached drawing translation method according to claim 5, characterized in that generating the visualized list based on the second data set comprises:
generating the pel column and the identification information column based on the character pels and characters to be translated in the second data set;
determining the position of each character pel in the attached drawing based on the pel location information in the second data set;
listing the character pels and the characters to be translated in order in the pel column and the identification information column, according to a set position order of the character pels.
7. The attached drawing translation method according to claim 5, characterized in that the visualized list further includes a confirmation column for confirming whether the character to be translated in the identification information column is correct.
8. The attached drawing translation method according to claim 5, characterized in that the visualized list further includes a recommendation information column that can be called up, the recommendation information column shows recommended characters that match the character to be translated, and the recommended characters are obtained from the text to be translated.
9. The attached drawing translation method according to claim 4, characterized by further comprising:
generating, based on the second data set, an editable text box near each character pel without covering the character pel, wherein the text box shows the character to be translated corresponding to the character pel;
generating a third data set based on the edited text boxes and the second data set, wherein the third data set includes the correspondence among characters to be translated, character pels, and pel location information.
10. The attached drawing translation method according to claim 2, characterized in that determining the character pels containing characters to be translated comprises:
locating all text content appearing in the attached drawing based on the ORB algorithm;
treating consecutive characters that satisfy a preset spacing as a character to be translated;
cropping the rectangular region containing the character to be translated as a character pel.
11. The attached drawing translation method according to claim 2, characterized in that the attached drawing is a drawing of the specification of a patent document, and the attached drawing translation method further comprises:
extracting the reference numerals (appended drawing references) from the specification to be translated;
using the characters of the reference numerals as features, excluding character pels that contain the characters of the reference numerals when determining the character pels containing characters to be translated.
12. The attached drawing translation method according to claim 1, characterized in that replacing the character pel with a translation pel containing the translation character comprises:
filling the character pel with the background color;
displaying the translation character in the character pel in a preset size and font.
13. The attached drawing translation method according to claim 1, characterized by further comprising:
when no character identical to the character to be translated is found in the text to be translated, extracting the technical terms in the character to be translated using a word segmentation algorithm;
searching the text to be translated based on the technical terms;
if an identical technical term is found in the text to be translated, associating the character corresponding to the technical term in the text to be translated with the character to be translated.
14. An attached drawing translating device, characterized by comprising:
a first relationship establishing unit, configured to establish a correspondence between attached drawing parameters and characters to be translated, wherein the attached drawing parameters include the character pel of each character to be translated in the corresponding attached drawing and the pel location information of the character pel in the corresponding attached drawing, and the characters to be translated are the characters recognized from the attached drawing that need to be translated;
a translation unit, configured to translate the characters in a text to be translated that are identical to the characters to be translated, to obtain corresponding translation characters;
a second relationship establishing unit, configured to establish a correspondence between the translation characters and the attached drawing parameters, based on the correspondence between the attached drawing parameters and the characters to be translated;
a display unit, configured to replace the character pel with a translation pel containing the translation character, based on the correspondence between the translation characters and the attached drawing parameters.
15. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the attached drawing translation method according to any one of claims 1-13.
CN201811564424.5A 2018-12-20 2018-12-20 A kind of attached drawing interpretation method, device and storage medium Pending CN109657619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811564424.5A CN109657619A (en) 2018-12-20 2018-12-20 A kind of attached drawing interpretation method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811564424.5A CN109657619A (en) 2018-12-20 2018-12-20 A kind of attached drawing interpretation method, device and storage medium

Publications (1)

Publication Number Publication Date
CN109657619A (en) 2019-04-19

Family

ID=66115419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811564424.5A Pending CN109657619A (en) 2018-12-20 2018-12-20 A kind of attached drawing interpretation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN109657619A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110389807A (en) * 2019-07-23 2019-10-29 北京字节跳动网络技术有限公司 A kind of interface interpretation method, device, electronic equipment and storage medium
CN112085090A (en) * 2020-09-07 2020-12-15 百度在线网络技术(北京)有限公司 Translation method and device and electronic equipment
WO2021056782A1 (en) * 2019-09-25 2021-04-01 深圳传音控股股份有限公司 Image identification and translation method and apparatus, and terminal and medium
CN114237468A (en) * 2021-12-08 2022-03-25 文思海辉智科科技有限公司 Translation method and device for text and picture, electronic equipment and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007035056A (en) * 2006-08-29 2007-02-08 Ebook Initiative Japan Co Ltd Translation information generating apparatus and method, and computer program
CN103294665A (en) * 2012-02-22 2013-09-11 汉王科技股份有限公司 Text translation method for electronic reader and electronic reader
CN104517107A (en) * 2014-12-22 2015-04-15 央视国际网络无锡有限公司 Method for translating image words in real time on basis of wearable equipment
CN107609553A (en) * 2017-09-12 2018-01-19 网易有道信息技术(北京)有限公司 image processing method, medium, device and computing device
CN108182184A (en) * 2017-12-27 2018-06-19 北京百度网讯科技有限公司 Picture character interpretation method, application and computer equipment
CN108182183A (en) * 2017-12-27 2018-06-19 北京百度网讯科技有限公司 Picture character interpretation method, application and computer equipment
WO2018174603A1 (en) * 2017-03-22 2018-09-27 (주)광개토연구소 Method and device for displaying explanation of reference numeral in patent drawing image using artificial intelligence technology based machine learning


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110389807A (en) * 2019-07-23 2019-10-29 北京字节跳动网络技术有限公司 A kind of interface interpretation method, device, electronic equipment and storage medium
CN110389807B (en) * 2019-07-23 2022-10-25 北京字节跳动网络技术有限公司 Interface translation method and device, electronic equipment and storage medium
WO2021056782A1 (en) * 2019-09-25 2021-04-01 深圳传音控股股份有限公司 Image identification and translation method and apparatus, and terminal and medium
CN112085090A (en) * 2020-09-07 2020-12-15 百度在线网络技术(北京)有限公司 Translation method and device and electronic equipment
CN114237468A (en) * 2021-12-08 2022-03-25 文思海辉智科科技有限公司 Translation method and device for text and picture, electronic equipment and readable storage medium
CN114237468B (en) * 2021-12-08 2024-01-16 文思海辉智科科技有限公司 Text and picture translation method and device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN109657619A (en) A kind of attached drawing interpretation method, device and storage medium
US10699111B2 (en) Page segmentation of vector graphics documents
US8014604B2 (en) OCR of books by word recognition
JP3822277B2 (en) Character template set learning machine operation method
CN107480144B (en) Method and device for generating image natural language description with cross-language learning capability
US6252988B1 (en) Method and apparatus for character recognition using stop words
CN111597908A (en) Test paper correcting method and test paper correcting device
US8331680B2 (en) Method of gray-level optical segmentation and isolation using incremental connected components
CN110114776A (en) Use the system and method for the character recognition of full convolutional neural networks
CN113408535B (en) OCR error correction method based on Chinese character level features and language model
CN110287484B (en) Chinese text description face image generation method based on face features
Pacha et al. Towards self-learning optical music recognition
CN113177435A (en) Test paper analysis method and device, storage medium and electronic equipment
AU2006223761B2 (en) Method and system for adaptive recognition of distorted text in computer images
CN113065396A (en) Automatic filing processing system and method for scanned archive image based on deep learning
CN104951749A (en) Image content recognition device and image content recognition method
CN107194337A (en) A kind of intelligence of non-selection topic reads and makes comments method
CN112949649B (en) Text image identification method and device and computing equipment
CN109147002B (en) Image processing method and device
CN109508712A (en) A kind of Chinese written language recognition methods based on image
CN112836528B (en) Machine post-translation editing method and system
Nederhof et al. OCR of handwritten transcriptions of Ancient Egyptian hieroglyphic text
CN114332898A (en) Automatic correcting method and device for connection test questions and storage medium
CN115543915A (en) Automatic database building method and system for personnel file directory
CN111310457B (en) Word mismatching recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190419
