CN101980133B - Method and system for detecting text selection region deviation of double-layer electronic file - Google Patents
- Publication number: CN101980133B
- Application number: CN2010105311511A
- Authority: CN (China)
- Prior art keywords: double, character, layer, file, deck
- Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a method and a system for detecting text selection region deviation of a double-layer electronic file, and aims to solve the prior-art problem of the poor visual effect of the text selection region of a double-layer electronic file. The method comprises the following steps: acquiring the block range of a single character of the text layer of the double-layer electronic file, and acquiring the external rectangular range of a single character pattern of the image layer of the double-layer electronic file; calculating the differences between the block range and the external rectangular range corresponding to the same character in the height direction and the width direction; and generating a prompt message when at least one of the differences is greater than a preset value. According to the technical scheme of the invention, the user can be prompted with the deviation state of the text selection region and can refer to it to adjust the font size, so that the text selection region and the character region in the double-layer electronic file are aligned accurately and the user experience is improved.
Description
Technical field
The present invention relates to a method and system for detecting text selection region deviation of a double-layer electronic file.
Background art
A double-layer electronic file in Portable Document Format (PDF), for example a book or a document, is a PDF file that has an image layer and a text layer. The image layer displays the original page layout, for example that of a paper book, and each character on it is actually a character image. The text layer lies beneath the image layer and is not displayed; it contains the electronic text of the file, which is usually obtained by optical character recognition (OCR). The text of the text layer and the character images of the image layer are aligned character by character, so that beneath each character of the image layer lies the text of that character in the text layer.
When a user uses the selection tool of OCR software for double-layer electronic files, this alignment allows the text the user needs in the text layer to be selected according to what the image layer displays. Fig. 1 is a schematic diagram of text selection performed with OCR software for double-layer electronic files in the prior art. As shown in Fig. 1, in the text block in box 10, the OCR software turns the region selected by the user black, but the black region coincides poorly with the displayed text. For example, the black regions in box 11 and box 12 do not fully cover the selected characters. The visual effect is therefore poor, and the user experience suffers even more when the line spacing is small.
The visual effect of the text selection region of existing double-layer electronic files is poor, and no effective solution to this problem has yet been proposed.
Summary of the invention
The main purpose of the present invention is to provide a method and system for detecting text selection region deviation of a double-layer electronic file, so as to solve the prior-art problem of the poor visual effect of the text selection region of double-layer electronic files.
To solve the above problem, according to one aspect of the present invention, a method for detecting text selection region deviation of a double-layer electronic file is provided.
The detection method of the present invention comprises: obtaining the coordinates, in the text layer, of a single character of the text layer of a double-layer electronic file; converting the coordinates of the single character in the text layer into coordinates in the image layer of the double-layer electronic file, and determining the block range of the single character according to its converted coordinates in the image layer; searching for the border of the single character image, and determining the external rectangular range of the single character image of the image layer according to the border; and calculating the differences, in the height direction and the width direction, between the block range and the external rectangular range corresponding to the same character, and generating a prompt message when at least one of the differences is greater than a preset value.
Further, generating the prompt message comprises: adding a rectangular frame on the character image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character corresponding to that character image.
Further, the double-layer electronic file is a file in Portable Document Format (PDF).
To solve the above problem, according to another aspect of the present invention, a system for detecting text selection region deviation of a double-layer electronic file is provided.
The detection system of the present invention comprises: a first acquisition module, configured to obtain the block range of a single character of the text layer of a double-layer electronic file, and further configured to obtain the coordinates of the single character in the text layer, convert those coordinates into coordinates in the image layer of the double-layer electronic file, and determine the block range of the single character according to its converted coordinates in the image layer; a second acquisition module, configured to obtain the external rectangular range of a single character image of the image layer, and further configured to search for the border of the single character image and determine the external rectangular range according to the border; a computing module, configured to calculate the differences, in the height direction and the width direction, between the block range and the external rectangular range corresponding to the same character; and an output module, configured to generate a prompt message when at least one of the two differences obtained by the computing module is greater than a preset value.
Further, the output module is also configured to add a rectangular frame on the character image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character corresponding to that character image.
According to the technical scheme of the present invention, the differences in the height direction and the width direction between the block range and the external rectangular range corresponding to the same character are determined from the block range of a single character of the text layer and the external rectangular range of a single character image of the image layer. The user can then be prompted according to these differences and can refer to the prompt to adjust the font size, so that the text selection region in the double-layer electronic file is aligned accurately with the character region and the user experience is improved.
Description of drawings
The accompanying drawings described herein are provided for further understanding of the present invention and constitute a part of the present application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic diagram of text selection performed with OCR software for double-layer electronic files in the prior art;
Fig. 2 is a schematic diagram of the main steps of the method for detecting text selection region deviation of a double-layer electronic file according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the output of a text selection region deviation prompt message according to an embodiment of the present invention; and
Fig. 4 is a schematic diagram of the main modules of the system for detecting text selection region deviation of a double-layer electronic file according to an embodiment of the present invention.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings and in combination with the embodiments.
Fig. 2 is a schematic diagram of the main steps of the method for detecting text selection region deviation of a double-layer electronic file according to an embodiment of the present invention. As shown in Fig. 2, the method comprises the following steps S21 to S24.
Step S21: obtain the block range of a single character of the text layer of the double-layer electronic file, and obtain the external rectangular range of a single character image of the image layer of the double-layer electronic file.
In this step, the block range of a single character of the text layer can be obtained as follows: obtain the coordinates of the single character in the text layer; convert those coordinates into coordinates in the image layer of the double-layer electronic file; and determine the block range of the single character according to its converted coordinates in the image layer.
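The coordinate conversion in this step can be sketched as follows. The patent does not specify the two coordinate systems, so this sketch assumes that text-layer coordinates are in PDF points (1/72 inch) with the origin at the bottom-left of the page, and that the image layer is a raster of the whole page at a known resolution with the origin at the top-left. The names `Rect`, `text_box_to_image_rect`, `page_height_pt`, and `dpi` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in image-layer pixels (origin top-left, y down)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def text_box_to_image_rect(x0, y0, x1, y1, page_height_pt, dpi):
    """Convert a text-layer character box to its block range in the image layer.

    Assumption: (x0, y0) and (x1, y1) are opposite corners in PDF points with
    the origin at the bottom-left, and the page is neither rotated nor cropped.
    """
    scale = dpi / 72.0  # pixels per point
    left = min(x0, x1) * scale
    right = max(x0, x1) * scale
    # Flip the y axis: the top edge in the image comes from the larger PDF y.
    top = (page_height_pt - max(y0, y1)) * scale
    bottom = (page_height_pt - min(y0, y1)) * scale
    return Rect(left, top, right, bottom)
```

For a rotated or cropped page, the single scale factor would have to be replaced by the full page transformation.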
In this step, the external rectangular range of a single character image of the image layer can be obtained as follows: search for the border of the single character image, and determine the external rectangular range of the single character image according to the border.
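One simple way to search for the border of a character image is to scan a window of the page bitmap for ink pixels and keep the extreme coordinates. The sketch below is only an illustration under assumptions the patent does not state: the image layer is already binarized (1 for ink, 0 for background), and a search window around the character is known. The function name `glyph_bounding_box` is hypothetical.

```python
def glyph_bounding_box(bitmap, region):
    """Return the tight external rectangle of the ink pixels inside region.

    bitmap: list of rows, each a list of 0/1 values (1 = ink).
    region: (left, top, right, bottom) search window in pixels, half-open.
    Returns (left, top, right, bottom), half-open, or None if no ink is found.
    """
    left, top, right, bottom = region
    xs, ys = [], []
    for y in range(max(top, 0), min(bottom, len(bitmap))):
        row = bitmap[y]
        for x in range(max(left, 0), min(right, len(row))):
            if row[x]:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # the window contains only background
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```

A real page would usually need thresholding and some handling of touching or broken characters before a scan like this gives a reliable rectangle.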
Step S22: calculate the differences, in the height direction and the width direction, between the block range and the external rectangular range corresponding to the same character. Because beneath each character image of the image layer lies the text of that character in the text layer, the block range and the external rectangular range corresponding to the same character can be determined.
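The calculation in step S22 can be sketched as below. The patent does not define the difference precisely, so this sketch assumes it means the absolute difference between the two rectangles' heights and between their widths, which matches the stated remedy of adjusting the font size. The function name `size_differences` is hypothetical.

```python
def size_differences(block, glyph):
    """Compare a text-layer block range with a character image rectangle.

    Both arguments are (left, top, right, bottom) in the same image-layer
    units. Returns (height_difference, width_difference) as absolute values.
    """
    block_width = block[2] - block[0]
    block_height = block[3] - block[1]
    glyph_width = glyph[2] - glyph[0]
    glyph_height = glyph[3] - glyph[1]
    return abs(block_height - glyph_height), abs(block_width - glyph_width)
```

For example, a block range of `(10, 10, 34, 40)` and an external rectangle of `(12, 14, 30, 36)` give a height difference of 8 and a width difference of 6.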
Step S23: determine whether the two differences obtained in step S22 are greater than the preset value. If at least one of them is greater than the preset value, proceed to step S24; otherwise, return to step S21 and obtain the block range of the next character. The preset value here can be two values, each compared with one of the two differences, or it can be a single value. The preset value can be set according to the displayed character size and how readily the user notices a deviation, for example 1.5 mm or 2 mm, at which point the user can clearly see that the selected region deviates from the character region.
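The comparison in step S23 can be sketched as follows, assuming the differences are measured in image-layer pixels and the resolution of the image layer is known, so that a preset value given in millimetres can be converted to pixels. The names `mm_to_pixels` and `exceeds_preset` and their parameters are hypothetical. The defaults follow the 1.5 mm example in the text.

```python
MM_PER_INCH = 25.4


def mm_to_pixels(mm, dpi):
    """Convert a length in millimetres to pixels at the given resolution."""
    return mm * dpi / MM_PER_INCH


def exceeds_preset(height_diff_px, width_diff_px, dpi,
                   preset_height_mm=1.5, preset_width_mm=None):
    """Return True if at least one difference is greater than its preset value.

    If preset_width_mm is omitted, a single preset value is used for both
    directions; otherwise each direction has its own preset value.
    """
    if preset_width_mm is None:
        preset_width_mm = preset_height_mm
    return (height_diff_px > mm_to_pixels(preset_height_mm, dpi)
            or width_diff_px > mm_to_pixels(preset_width_mm, dpi))
```

At 254 dpi, 1.5 mm corresponds to about 15 pixels, so a height difference of 16 pixels would trigger a prompt while differences of 14 pixels in both directions would not.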
Step S24: generate a prompt message. The prompt message can be generated by adding a rectangular frame on the character image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character corresponding to that character image. After the prompt message is generated, it can be output according to the user's instruction. Taking the rectangular-frame approach as an example, the output of the prompt message is shown in Fig. 3, which is a schematic diagram of the output of a text selection region deviation prompt message according to an embodiment of the present invention.
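Adding the rectangular frame can be illustrated with a plain 2D pixel grid. This is a minimal sketch, not the patented implementation: it assumes the image layer is a list of pixel rows and that the block range is given as integer, half-open pixel coordinates. The function name `add_rectangle_frame` and the marker value are hypothetical.

```python
def add_rectangle_frame(bitmap, rect, value=2):
    """Return a copy of bitmap with the outline of rect marked with value.

    bitmap: list of rows of pixel values.
    rect: (left, top, right, bottom) in integer pixels, half-open, taken from
          the block range of the character in the text layer.
    """
    left, top, right, bottom = rect
    framed = [row[:] for row in bitmap]  # leave the original image untouched
    height = len(framed)
    width = len(framed[0]) if framed else 0
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            on_edge = y in (top, bottom - 1) or x in (left, right - 1)
            if on_edge:
                framed[y][x] = value
    return framed
```

Drawing only the outline keeps the character image visible inside the frame, so the mismatch between the frame and the character can be seen directly.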
In Fig. 3, the text block in box 30 has been checked, and the selection regions of some of its text have been found to deviate, as shown for example in boxes 31, 32, 33, 34, and 35. Box 30 shows the state of the image layer, and boxes 31 to 35 frame the ranges of the corresponding text-layer characters. Fig. 3 makes it clear which text will show a selection region deviation when selected, so the user can refer to a prompt like the one in Fig. 3 to adjust the size of that text in the text layer.
Fig. 4 is a schematic diagram of the main modules of the system for detecting text selection region deviation of a double-layer electronic file according to an embodiment of the present invention. As shown in Fig. 4, the detection system 40 mainly comprises a first acquisition module, a second acquisition module, a computing module, and an output module.
The first acquisition module is configured to obtain the block range of a single character of the text layer of the double-layer electronic file. The second acquisition module is configured to obtain the external rectangular range of a single character image of the image layer of the double-layer electronic file. The computing module is configured to calculate the differences, in the height direction and the width direction, between the block range and the external rectangular range corresponding to the same character. The output module is configured to generate a prompt message when at least one of the two differences obtained by the computing module is greater than a preset value.
In addition, the first acquisition module can also be configured to: obtain the coordinates of the single character in the text layer; convert those coordinates into coordinates in the image layer of the double-layer electronic file; and determine the block range of the single character according to its converted coordinates in the image layer.
The second acquisition module can also be configured to: search for the border of the single character image, and determine the external rectangular range of the single character image of the image layer according to the border.
The output module can also be configured to: add a rectangular frame on the character image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character corresponding to that character image.
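The division into four modules can be sketched as a small class. This is an illustrative arrangement only: the two acquisition modules are passed in as callables, the differences are taken as absolute size differences, a single preset value is used, and the prompt message is represented as a dictionary. All of these choices, and the name `DeviationDetectionSystem`, are assumptions rather than details from the patent.

```python
class DeviationDetectionSystem:
    """Sketch of the four modules: two acquisition modules, computing, output."""

    def __init__(self, first_acquisition, second_acquisition, preset):
        # first_acquisition: character -> block range in image-layer coordinates
        # second_acquisition: character -> external rectangle of its image
        self.first_acquisition = first_acquisition
        self.second_acquisition = second_acquisition
        self.preset = preset  # single preset value, same units as the ranges

    @staticmethod
    def calculate(block, glyph):
        """Computing module: (height difference, width difference)."""
        height_diff = abs((block[3] - block[1]) - (glyph[3] - glyph[1]))
        width_diff = abs((block[2] - block[0]) - (glyph[2] - glyph[0]))
        return height_diff, width_diff

    def output(self, char, block, diffs):
        """Output module: a prompt if at least one difference exceeds the preset."""
        if max(diffs) > self.preset:
            return {"char": char, "frame": block,
                    "height_diff": diffs[0], "width_diff": diffs[1]}
        return None

    def detect(self, chars):
        """Run all modules over the characters and collect the prompts."""
        prompts = []
        for char in chars:
            block = self.first_acquisition(char)
            glyph = self.second_acquisition(char)
            prompt = self.output(char, block, self.calculate(block, glyph))
            if prompt is not None:
                prompts.append(prompt)
        return prompts
```

Passing the acquisition modules in as callables keeps the sketch independent of any particular PDF or imaging library.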
According to the technical scheme of the embodiments of the present invention, the differences in the height direction and the width direction between the block range and the external rectangular range corresponding to the same character are determined from the block range of a single character of the text layer and the external rectangular range of a single character image of the image layer. The user can then be prompted according to these differences and can refer to the prompt to adjust the font size, so that the text selection region produced by the text selection tool of OCR software for double-layer electronic files is aligned accurately with the character region and the user experience is improved.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. Alternatively, they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.
Claims (5)
1. A method for detecting text selection region deviation of a double-layer electronic file, characterized by comprising:
obtaining the coordinates, in the text layer, of a single character of the text layer of the double-layer electronic file; converting the coordinates of the single character in the text layer into coordinates in the image layer of the double-layer electronic file, and determining the block range of the single character according to its converted coordinates in the image layer; and searching for the border of a single character image of the image layer of the double-layer electronic file, and determining the external rectangular range of the single character image according to the border;
calculating the differences, in the height direction and the width direction, between the block range and the external rectangular range corresponding to the same character, and generating a prompt message when at least one of the differences is greater than a preset value.
2. The detection method according to claim 1, characterized in that generating the prompt message comprises: adding a rectangular frame on the character image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character corresponding to that character image.
3. The detection method according to claim 1 or 2, characterized in that the double-layer electronic file is a file in Portable Document Format.
4. A system for detecting text selection region deviation of a double-layer electronic file, characterized by comprising:
a first acquisition module, configured to obtain the coordinates, in the text layer, of a single character of the text layer of the double-layer electronic file, convert the coordinates of the single character in the text layer into coordinates in the image layer of the double-layer electronic file, and determine the block range of the single character according to its converted coordinates in the image layer;
a second acquisition module, configured to search for the border of a single character image of the image layer of the double-layer electronic file, and determine the external rectangular range of the single character image according to the border;
a computing module, configured to calculate the differences, in the height direction and the width direction, between the block range and the external rectangular range corresponding to the same character;
an output module, configured to generate a prompt message when at least one of the two differences obtained by the computing module is greater than a preset value.
5. The detection system according to claim 4, characterized in that the output module is further configured to: add a rectangular frame on the character image of the image layer, the size of the rectangular frame being the same as the block range, in the text layer, of the character corresponding to that character image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010105311511A CN101980133B (en) | 2010-10-29 | 2010-10-29 | Method and system for detecting text selection region deviation of double-layer electronic file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101980133A | 2011-02-23 |
CN101980133B | 2012-07-04 |
Family
ID=43600639
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010105311511A Expired - Fee Related CN101980133B (en) | 2010-10-29 | 2010-10-29 | Method and system for detecting text selection region deviation of double-layer electronic file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101980133B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102968407B (en) * | 2011-08-31 | 2015-09-09 | 汉王科技股份有限公司 | The building method of double-layer PDF file and device |
CN103176957B (en) * | 2011-12-21 | 2016-08-03 | 北大方正集团有限公司 | The treating method and apparatus of file |
CN104166849B (en) * | 2013-05-17 | 2017-04-19 | 北大方正集团有限公司 | Electronic document identification method and apparatus |
CN109298819B (en) * | 2018-09-21 | 2021-03-16 | Oppo广东移动通信有限公司 | Method, device, terminal and storage medium for selecting object |
CN112667115B (en) * | 2020-12-22 | 2023-07-25 | 科大讯飞股份有限公司 | Text display method, electronic equipment and storage device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4440513A (en) * | 1981-03-12 | 1984-04-03 | Fuji Xerox Co., Ltd. | Character shaping device |
CN1383516A (en) * | 2000-07-05 | 2002-12-04 | 八万系统有限公司 | Proofreading system of Chinese characters by means of one-to-one comparision |
CN101782896A (en) * | 2009-01-21 | 2010-07-21 | 汉王科技股份有限公司 | PDF character extraction method combined with OCR technology |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20120704; Termination date: 20141029 |
EXPY | Termination of patent right or utility model | |