CN101980133B - Method and system for detecting text selection region deviation of double-layer electronic file - Google Patents

Info

Publication number
CN101980133B
Authority
CN
China
Prior art keywords
double
character
layer
file
deck
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2010105311511A
Other languages
Chinese (zh)
Other versions
CN101980133A (en)
Inventor
周长岭
赵海涛
兰荣春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Founder International Co Ltd
Original Assignee
Founder International Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Founder International Co Ltd
Priority to CN2010105311511A
Publication of CN101980133A
Application granted
Publication of CN101980133B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Character Input (AREA)

Abstract

The invention discloses a method and a system for detecting text selection region deviation of a double-layer electronic file, and aims to solve the prior-art problem that the text selection region of a double-layer electronic file has a poor visual effect. The method comprises the following steps: acquiring the block range of a single character of the text layer of the double-layer electronic file, and acquiring the bounding rectangle range of a single glyph image of the image layer of the double-layer electronic file; calculating the differences, in the height direction and the width direction, between the block range and the bounding rectangle range corresponding to the same character; and generating a prompt message when at least one of the differences is greater than a preset value. According to the technical scheme of the invention, the user can be prompted about the deviation state of the text selection region and can refer to it to adjust the font size, so that the text selection region and the glyph region in the double-layer electronic file are aligned accurately and the user experience is improved.

Description

Method and system for detecting text selection region deviation of a double-layer electronic file
Technical field
The present invention relates to a method and a system for detecting text selection region deviation of a double-layer electronic file.
Background art
A double-layer electronic file in Portable Document Format (PDF), for example a book or a document, is a PDF file that has an image layer and a text layer. The image layer is shown on top and reproduces the original page layout, for example that of a paper book; each character in it is actually a glyph image. The text layer lies beneath the image layer and is not displayed. It contains the electronic text of the file, which is usually obtained by optical character recognition (OCR). The text of the text layer is aligned character by character with the glyph images of the image layer, so that beneath each character of the image layer lies the text of that character in the text layer.
When the user uses the selection tool of the OCR software for double-layer electronic files, this alignment allows the text needed in the text layer to be selected according to what the image layer displays. Fig. 1 is a schematic diagram of text selection with prior-art OCR software for producing double-layer electronic files. As shown in Fig. 1, in the text block in box 10 the OCR software turns the region selected by the user black, but the black region coincides poorly with the displayed text. For example, the black regions in box 11 and box 12 do not completely cover the selected characters. The visual effect is therefore poor and, particularly when the line spacing is small, the user experience suffers.
The visual effect of the text selection region of existing double-layer electronic files is poor, and no effective solution to this problem has yet been proposed.
Summary of the invention
The main object of the present invention is to provide a method and a system for detecting text selection region deviation of a double-layer electronic file, so as to solve the prior-art problem that the text selection region of a double-layer electronic file has a poor visual effect.
To solve the above problem, according to one aspect of the present invention, a method for detecting text selection region deviation of a double-layer electronic file is provided.
The detection method of the present invention comprises: acquiring the coordinates, in the text layer, of a single character of the text layer of the double-layer electronic file; converting the coordinates of the single character in the text layer into coordinates in the image layer of the double-layer electronic file, and determining the block range of the single character from its converted coordinates in the image layer; searching for the boundary of a single glyph image, and determining from the boundary the bounding rectangle range of the single glyph image of the image layer of the double-layer electronic file; and calculating the differences, in the height direction and the width direction, between the block range and the bounding rectangle range corresponding to the same character, and generating a prompt message when at least one of the differences is greater than a preset value.
Further, generating the prompt message comprises: adding a rectangular frame to the glyph image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character located at that glyph image.
Further, the double-layer electronic file is a file in Portable Document Format (PDF).
To solve the above problem, according to another aspect of the present invention, a system for detecting text selection region deviation of a double-layer electronic file is provided.
The detection system of the present invention comprises: a first acquisition module, configured to acquire the block range of a single character of the text layer of the double-layer electronic file, and further configured to acquire the coordinates of the single character in the text layer, convert those coordinates into coordinates in the image layer of the double-layer electronic file, and determine the block range of the single character from its converted coordinates in the image layer; a second acquisition module, configured to acquire the bounding rectangle range of a single glyph image of the image layer of the double-layer electronic file, and further configured to search for the boundary of the single glyph image and determine the bounding rectangle range from that boundary; a computing module, configured to calculate the differences, in the height direction and the width direction, between the block range and the bounding rectangle range corresponding to the same character; and an output module, configured to generate a prompt message when at least one of the two differences obtained by the computing module is greater than a preset value.
Further, the output module is also configured to add a rectangular frame to the glyph image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character located at that glyph image.
According to the technical solution of the present invention, the differences in the height direction and the width direction between the block range and the bounding rectangle range corresponding to the same character are determined from the block range of a single character of the text layer and the bounding rectangle range of a single glyph image of the image layer of the double-layer electronic file. The user can then be prompted according to these differences and can refer to the prompt to adjust the font size, so that text selection regions made in the double-layer electronic file align accurately with the glyph regions, which improves the user experience.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of text selection with prior-art OCR software for producing double-layer electronic files;
Fig. 2 is a schematic diagram of the main steps of the method for detecting text selection region deviation of a double-layer electronic file according to an embodiment of the invention;
Fig. 3 is a schematic diagram of the output prompt message for text selection region deviation according to an embodiment of the invention; and
Fig. 4 is a schematic diagram of the main modules of the system for detecting text selection region deviation of a double-layer electronic file according to an embodiment of the invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in combination with the embodiments.
Fig. 2 is a schematic diagram of the main steps of the method for detecting text selection region deviation of a double-layer electronic file according to an embodiment of the invention. As shown in Fig. 2, the method comprises the following steps S21 to S24.
Step S21: acquire the block range of a single character of the text layer of the double-layer electronic file, and acquire the bounding rectangle range of a single glyph image of the image layer of the double-layer electronic file.
In this step, the block range of a single character of the text layer may be acquired as follows: acquire the coordinates of the single character in the text layer; convert the coordinates of the single character in the text layer into coordinates in the image layer of the double-layer electronic file; and determine the block range of the single character from its converted coordinates in the image layer.
In this step, the bounding rectangle range of a single glyph image of the image layer may be acquired as follows: search for the boundary of the single glyph image; and determine from that boundary the bounding rectangle range of the single glyph image of the image layer of the double-layer electronic file.
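One simple way to search for the boundary of a glyph image is to scan a binarized bitmap for the outermost ink pixels. The patent does not describe the search procedure, so the sketch below is an assumption: the image layer is taken to be a list of rows of 0/1 values (1 meaning ink), and the search is limited to a window around the expected character position. The function name `glyph_bounding_rect` and the window parameter are hypothetical.

```python
def glyph_bounding_rect(bitmap, window):
    """Return the tight bounding rectangle of ink pixels inside `window`.

    `bitmap` is a list of rows of 0/1 values (1 = ink). `window` and the
    result are (left, top, right, bottom) with right/bottom exclusive.
    Returns None when the window contains no ink.
    """
    left, top, right, bottom = window
    # Rows and columns of the window that contain at least one ink pixel.
    rows = [y for y in range(top, bottom) if any(bitmap[y][left:right])]
    cols = [x for x in range(left, right)
            if any(bitmap[y][x] for y in range(top, bottom))]
    if not rows or not cols:
        return None
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)
```

A real scan would normally binarize the page first and choose the window from the converted block range of the character.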
Step S22: calculate the differences, in the height direction and the width direction, between the block range and the bounding rectangle range corresponding to the same character. Because beneath each character block of the image layer lies the text of that character in the text layer, the block range and the bounding rectangle range corresponding to the same character can be determined.
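The calculation in step S22 can be illustrated with a short sketch. The patent does not define the difference precisely; the sketch assumes it means the absolute difference between the heights and between the widths of the two rectangles, both given as (left, top, right, bottom) in image-layer pixels. The function name `range_differences` is hypothetical.

```python
def range_differences(block, bbox):
    """Return (height_diff, width_diff) between two rectangles.

    Both rectangles are (left, top, right, bottom). The difference is
    assumed here to be the absolute difference in size along each axis.
    """
    b_left, b_top, b_right, b_bottom = block
    g_left, g_top, g_right, g_bottom = bbox
    height_diff = abs((b_bottom - b_top) - (g_bottom - g_top))
    width_diff = abs((b_right - b_left) - (g_right - g_left))
    return height_diff, width_diff
```

If positional offset matters as well as size, the same function could compare the corresponding edges instead; that would be a different reading of the text.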
Step S23: determine whether the two differences obtained in step S22 are greater than a preset value. If at least one of the differences is greater than the preset value, proceed to step S24; otherwise, return to step S21 and acquire the block range of the next character. The preset value may be two values, each compared with one of the two differences, or a single value. The preset value can be set according to the character size when displayed and the acuity of the user's observation, for example 1.5 mm or 2 mm, at which the user can clearly see that the selected region deviates from the glyph region.
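The comparison in step S23 can be sketched as below. The preset value is given in millimetres in the text, while the differences are assumed here to be measured in image pixels, so the sketch converts through an assumed resolution `dpi`; the description ties the preset to the displayed size, so this conversion is a simplification. The names `mm_to_pixels` and `exceeds_preset` are hypothetical.

```python
def mm_to_pixels(mm, dpi):
    """Convert millimetres to pixels at `dpi` dots per inch (25.4 mm per inch)."""
    return mm / 25.4 * dpi


def exceeds_preset(height_diff_px, width_diff_px, dpi, preset_mm=1.5):
    """Return True when at least one difference is greater than its preset value.

    `preset_mm` is either one value used for both directions or a pair
    (height_preset_mm, width_preset_mm).
    """
    if isinstance(preset_mm, (int, float)):
        height_mm = width_mm = preset_mm
    else:
        height_mm, width_mm = preset_mm
    return (height_diff_px > mm_to_pixels(height_mm, dpi)
            or width_diff_px > mm_to_pixels(width_mm, dpi))
```

A caller would loop over the characters of the page, move on to the next character whenever this returns False, and generate a prompt message when it returns True.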
Step S24: generate a prompt message. The prompt message may be generated by adding a rectangular frame to the glyph image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character located at that glyph image. After the prompt message has been generated, it can be output according to the user's instruction. Taking the above addition of a rectangular frame as an example, the output of the prompt message is shown in Fig. 3, which is a schematic diagram of the output prompt message for text selection region deviation according to an embodiment of the invention.
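Adding the rectangular frame can be sketched on a simple in-memory image. This is only an illustration: the image layer is assumed to be a list of rows of integer pixel values, the block range is assumed to lie inside the image, and the frame is drawn as a one-pixel outline with a marker value. The function name `add_rectangle_frame` and the `value` parameter are hypothetical.

```python
def add_rectangle_frame(image, block, value=2):
    """Return a copy of `image` with a one-pixel frame drawn around `block`.

    `image` is a list of rows of integers. `block` is (left, top, right,
    bottom) with right/bottom exclusive and must lie inside the image.
    The input image is left unchanged.
    """
    left, top, right, bottom = block
    framed = [row[:] for row in image]
    for x in range(left, right):        # top and bottom edges
        framed[top][x] = value
        framed[bottom - 1][x] = value
    for y in range(top, bottom):        # left and right edges
        framed[y][left] = value
        framed[y][right - 1] = value
    return framed
```

Drawing on a copy keeps the original image layer intact, so the frames can be shown or hidden on the user's instruction.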
In Fig. 3, the text block in box 30 has been checked, and the selection regions of some of its text have been found to deviate; specific examples are shown in boxes 31, 32, 33, 34 and 35. Box 30 shows the state of the image layer, and boxes 31 to 35 outline the ranges of the text-layer characters. Fig. 3 shows clearly which text has a selection region deviation after selection, so the user can refer to a prompt like that in Fig. 3 to adjust the size of that text in the text layer.
Fig. 4 is a schematic diagram of the main modules of the system for detecting text selection region deviation of a double-layer electronic file according to an embodiment of the invention. As shown in Fig. 4, the detection system 40 mainly comprises a first acquisition module, a second acquisition module, a computing module and an output module.
The first acquisition module is configured to acquire the block range of a single character of the text layer of the double-layer electronic file. The second acquisition module is configured to acquire the bounding rectangle range of a single glyph image of the image layer of the double-layer electronic file. The computing module is configured to calculate the differences, in the height direction and the width direction, between the block range and the bounding rectangle range corresponding to the same character. The output module is configured to generate a prompt message when at least one of the two differences obtained by the computing module is greater than a preset value.
In addition, the first acquisition module may also be configured to: acquire the coordinates of the single character in the text layer; convert the coordinates of the single character in the text layer into coordinates in the image layer of the double-layer electronic file; and determine the block range of the single character from its converted coordinates in the image layer.
The second acquisition module may also be configured to: search for the boundary of the single glyph image; and determine from that boundary the bounding rectangle range of the single glyph image of the image layer of the double-layer electronic file.
The output module may also be configured to add a rectangular frame to the glyph image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character located at that glyph image.
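The division into four modules can be pictured as four interchangeable callables wired together. The sketch below is one hypothetical arrangement, not the patent's implementation: the class name `DeviationDetectionSystem`, its field names, the pixel-valued `preset` and the stand-in lambdas are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass
class DeviationDetectionSystem:
    first_acquisition: Callable[[object], Box]   # block range from the text layer
    second_acquisition: Callable[[object], Box]  # bounding rectangle from the image layer
    computing: Callable[[Box, Box], Tuple[int, int]]
    output: Callable[[Box], str]                 # builds the prompt message
    preset: int = 5                              # assumed threshold, in pixels

    def check(self, character) -> Optional[str]:
        """Return a prompt message for `character`, or None if within the preset."""
        block = self.first_acquisition(character)
        bbox = self.second_acquisition(character)
        height_diff, width_diff = self.computing(block, bbox)
        if height_diff > self.preset or width_diff > self.preset:
            return self.output(block)
        return None


def size_difference(block: Box, bbox: Box) -> Tuple[int, int]:
    """Absolute height and width differences between two boxes."""
    height_diff = abs((block[3] - block[1]) - (bbox[3] - bbox[1]))
    width_diff = abs((block[2] - block[0]) - (bbox[2] - bbox[0]))
    return height_diff, width_diff


# Stand-in modules: each character is a dict that already carries both boxes.
system = DeviationDetectionSystem(
    first_acquisition=lambda ch: ch["block"],
    second_acquisition=lambda ch: ch["glyph"],
    computing=size_difference,
    output=lambda block: f"frame at {block}",
)
```

Keeping the modules as separate callables mirrors the statement that they may run on one computing device or be distributed across several.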
According to the technical solution of the embodiment of the invention, the differences in the height direction and the width direction between the block range and the bounding rectangle range corresponding to the same character are determined from the block range of a single character of the text layer and the bounding rectangle range of a single glyph image of the image layer of the double-layer electronic file. The user can then be prompted according to these differences and can refer to the prompt to adjust the font size, so that text selection regions made with the text selection tool of the OCR software for double-layer electronic files align accurately with the glyph regions, which improves the user experience.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by a plurality of computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is thus not restricted to any specific combination of hardware and software.
The above are merely preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (5)

1. A method for detecting text selection region deviation of a double-layer electronic file, characterized by comprising:
acquiring the coordinates, in the text layer, of a single character of the text layer of the double-layer electronic file; converting the coordinates of the single character in the text layer into coordinates in the image layer of the double-layer electronic file, and determining the block range of the single character from its converted coordinates in the image layer; and searching for the boundary of a single glyph image of the image layer of the double-layer electronic file, and determining from the boundary the bounding rectangle range of the single glyph image of the image layer of the double-layer electronic file;
calculating the differences, in the height direction and the width direction, between the block range and the bounding rectangle range corresponding to the same character, and generating a prompt message when at least one of the differences is greater than a preset value.
2. The detection method according to claim 1, characterized in that generating the prompt message comprises: adding a rectangular frame to the glyph image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character located at that glyph image.
3. The detection method according to claim 1 or 2, characterized in that the double-layer electronic file is a file in Portable Document Format.
4. A system for detecting text selection region deviation of a double-layer electronic file, characterized by comprising:
a first acquisition module, configured to acquire the coordinates, in the text layer, of a single character of the text layer of the double-layer electronic file, convert the coordinates of the single character in the text layer into coordinates in the image layer of the double-layer electronic file, and determine the block range of the single character from its converted coordinates in the image layer;
a second acquisition module, configured to search for the boundary of a single glyph image of the image layer of the double-layer electronic file, and determine from the boundary the bounding rectangle range of the single glyph image of the image layer of the double-layer electronic file;
a computing module, configured to calculate the differences, in the height direction and the width direction, between the block range and the bounding rectangle range corresponding to the same character;
an output module, configured to generate a prompt message when at least one of the two differences obtained by the computing module is greater than a preset value.
5. The detection system according to claim 4, characterized in that the output module is further configured to add a rectangular frame to the glyph image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character located at that glyph image.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105311511A CN101980133B (en) 2010-10-29 2010-10-29 Method and system for detecting text selection region deviation of double-layer electronic file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010105311511A CN101980133B (en) 2010-10-29 2010-10-29 Method and system for detecting text selection region deviation of double-layer electronic file

Publications (2)

Publication Number Publication Date
CN101980133A CN101980133A (en) 2011-02-23
CN101980133B true CN101980133B (en) 2012-07-04

Family

ID=43600639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105311511A Expired - Fee Related CN101980133B (en) 2010-10-29 2010-10-29 Method and system for detecting text selection region deviation of double-layer electronic file

Country Status (1)

Country Link
CN (1) CN101980133B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968407B (en) * 2011-08-31 2015-09-09 汉王科技股份有限公司 The building method of double-layer PDF file and device
CN103176957B (en) * 2011-12-21 2016-08-03 北大方正集团有限公司 The treating method and apparatus of file
CN104166849B (en) * 2013-05-17 2017-04-19 北大方正集团有限公司 Electronic document identification method and apparatus
CN109298819B (en) * 2018-09-21 2021-03-16 Oppo广东移动通信有限公司 Method, device, terminal and storage medium for selecting object
CN112667115B (en) * 2020-12-22 2023-07-25 科大讯飞股份有限公司 Text display method, electronic equipment and storage device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4440513A (en) * 1981-03-12 1984-04-03 Fuji Xerox Co., Ltd. Character shaping device
CN1383516A (en) * 2000-07-05 2002-12-04 八万系统有限公司 Proofreading system of Chinese characters by means of one-to-one comparision
CN101782896A (en) * 2009-01-21 2010-07-21 汉王科技股份有限公司 PDF character extraction method combined with OCR technology

Also Published As

Publication number Publication date
CN101980133A (en) 2011-02-23

Similar Documents

Publication Publication Date Title
CN101980133B (en) Method and system for detecting text selection region deviation of double-layer electronic file
CN102201009A (en) Form generating method and device
US20070129887A1 (en) Map information system and map information processing method and program
CN101656024A (en) Electronic learning device and realizing method thereof
CN104036060A (en) Online auditing method and system for engineering drawing
CN102681978A (en) Method and system for displaying text in PDF (portable document format) document
CN104765721B (en) Layout processing method and processing device
CN105608119A (en) Rapid thematic map drawing technology
CN106648479A (en) Printing method and apparatus
CN105278961A (en) Method and system for generating database table structure document
CN101017486A (en) Method of finding company position by business card scanning
CN101424998A (en) Document page display method and system
CN101324833B (en) Method and apparatus for saving page resource
CN103838763A (en) Object file generation system and method
CN104915666A (en) Information card information positioning acquisition method based on paper-made image scanning
CN104765266A (en) Simulation clock display method and device and LED display control card
KR101516213B1 (en) Responsive Web Generating Method By Converting Document To Responsive Web
CN103488440A (en) Bill printing device and bill printing method
CN106708801A (en) Proofreading method used for text
CN102442047B (en) Label processing method and device for board combination
CN103605640B (en) Form adaption method and device
CN103106270B (en) cloud data fusion method and system
CN102542074B (en) Demonstration and search tool of topological relationship of elements
CN107704445A (en) A kind of list makes a report on method and device
JP6869299B2 (en) Quadrature table judgment device and control program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120704

Termination date: 20141029

EXPY Termination of patent right or utility model