CN107977593A - Image processing apparatus and image processing method - Google Patents

Image processing apparatus and image processing method

Info

Publication number
CN107977593A
Authority
CN
China
Prior art keywords
connected domain
connecting line
heading character
character connected
title
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610921297.4A
Other languages
Chinese (zh)
Inventor
焦继乐
范伟
孙俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201610921297.4A
Publication of CN107977593A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Abstract

The present invention relates to an image processing apparatus and an image processing method. The image processing apparatus according to the present invention includes: a connected domain acquiring unit for obtaining multiple connected domains of a newspaper image; a character connected domain determination unit for merging overlapping connected domains and adjacent connected domains among the multiple connected domains to obtain multiple character connected domains; a heading character determination unit for determining multiple heading character connected domains from the multiple character connected domains; a connecting line determination unit for determining one or more title connecting lines according to the multiple heading character connected domains; and a title area acquiring unit for combining the heading character connected domains located on the same title connecting line to obtain one or more title areas of the newspaper image. With the image processing apparatus and image processing method according to the present invention, title extraction from a newspaper image can be completed automatically, which saves a large amount of manual labor and improves the efficiency of labeling digitized newspaper titles.

Description

Image processing apparatus and image processing method
Technical field
Embodiments of the present invention relate to the field of image processing, and more particularly to an image processing apparatus and an image processing method capable of obtaining the title areas of a newspaper image.
Background
This section provides background information related to the present invention, which is not necessarily prior art.
With the development of information technology, many libraries have digitized the newspapers and periodicals in their collections, both to protect the originals and to make storage and lookup more convenient, and they store the content in the form of microfilm for consultation. Retrieving these digitized newspapers and periodicals often involves layout information, such as the publication date, issue, and edition name, as well as article keywords and headlines. However, because layout information takes different forms in different newspapers and headlines are laid out freely, extracting this content automatically is rather difficult. At present, therefore, layout information and headline information are mostly labeled by hand. Manual labeling consumes a great deal of manpower and material resources, and it is slow and inefficient.
In view of the above technical problem, the present invention aims to propose a scheme that can complete the title extraction of a newspaper image automatically, thereby saving a large amount of manual labor and improving the efficiency of labeling digitized newspaper titles.
Summary of the invention
This section provides a general summary of the present invention, not a comprehensive disclosure of its full scope or of all its features.
An object of the present invention is to provide an image processing apparatus and an image processing method that can complete the title extraction of a newspaper image automatically, thereby saving a large amount of manual labor and improving the efficiency of labeling digitized newspaper titles.
According to one aspect of the present invention, there is provided an image processing apparatus including: a connected domain acquiring unit for obtaining multiple connected domains of a newspaper image; a character connected domain determination unit for merging overlapping connected domains and adjacent connected domains among the multiple connected domains to obtain multiple character connected domains; a heading character determination unit for determining multiple heading character connected domains from the multiple character connected domains; a connecting line determination unit for determining one or more title connecting lines according to the multiple heading character connected domains; and a title area acquiring unit for combining the heading character connected domains located on the same title connecting line to obtain one or more title areas of the newspaper image.
According to another aspect of the present invention, there is provided an image processing method including: obtaining multiple connected domains of a newspaper image; merging overlapping connected domains and adjacent connected domains among the multiple connected domains to obtain multiple character connected domains; determining multiple heading character connected domains from the multiple character connected domains; determining one or more title connecting lines according to the multiple heading character connected domains; and combining the heading character connected domains located on the same title connecting line to obtain one or more title areas of the newspaper image.
According to another aspect of the present invention, there is provided a program product including machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to perform the image processing method according to the present invention.
According to another aspect of the present invention, there is provided a machine-readable storage medium carrying the program product according to the present invention.
With the image processing apparatus and image processing method according to the present invention, the character connected domains of a newspaper image can be obtained by merging overlapping connected domains and adjacent connected domains, heading character connected domains can be determined from the character connected domains, and one or more title connecting lines can be determined according to the heading character connected domains, so that the heading character connected domains located on the same title connecting line can be combined to obtain the title areas of the newspaper image. In this way, the title areas of a newspaper image can be extracted automatically, quickly, and reliably, which saves a large amount of manpower and material resources and improves the efficiency of labeling digitized newspaper titles.
The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present invention.
Brief description of the drawings
The drawings described here are for illustration of selected embodiments only, not of all possible implementations, and are not intended to limit the scope of the present invention. In the drawings:
Fig. 1 is a schematic diagram of a newspaper image to be processed according to an embodiment of the present invention;
Fig. 2 is a structural block diagram of an image processing apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the newspaper image after multiple connected domains have been obtained according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a character connected domain determination unit according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of merging overlapping connected domains according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of merging adjacent connected domains according to an embodiment of the present invention;
Fig. 7 is a structural block diagram of a heading character determination unit according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the newspaper image after the heading character connected domains have been determined according to an embodiment of the present invention;
Fig. 9 is a structural block diagram of a heading character determination unit according to another embodiment of the present invention;
Fig. 10 is a curve of the number of character connected domains counted with size as the variable according to an embodiment of the present invention;
Fig. 11 is a structural block diagram of a connecting line determination unit according to an embodiment of the present invention;
Fig. 12 is a structural block diagram of a connecting line determination unit according to another embodiment of the present invention;
Fig. 13 is a structural block diagram of a connecting line determination unit according to yet another embodiment of the present invention;
Fig. 14 is a schematic diagram of the process of determining one title connecting line according to an embodiment of the present invention;
Fig. 15 is a schematic diagram of the newspaper image after the title areas have been obtained according to an embodiment of the present invention;
Fig. 16 is a structural block diagram of an image processing apparatus according to another embodiment of the present invention;
Fig. 17 is a schematic diagram of the newspaper image after the title areas have been obtained according to another embodiment of the present invention;
Fig. 18 is a flowchart of an image processing method according to an embodiment of the present invention; and
Fig. 19 is a block diagram of an exemplary structure of a general-purpose personal computer in which the image processing method according to the present invention can be implemented.
Although the present invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are described in detail herein. It should be understood, however, that the description of specific embodiments herein is not intended to limit the invention to the particular forms disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Note that corresponding reference numerals indicate corresponding parts throughout the several drawings.
Detailed description of the embodiments
Examples of the present invention will now be described more fully with reference to the accompanying drawings. The following description is merely exemplary in nature and is not intended to limit the invention, its application, or its uses.
Example embodiments are provided so that this disclosure will be thorough and will fully convey its scope to those skilled in the art. Numerous specific details, such as examples of specific units, apparatuses, and methods, are set forth to provide a thorough understanding of embodiments of the present invention. It will be apparent to those skilled in the art that the specific details need not be employed, that example embodiments may be embodied in many different forms, and that neither should be construed to limit the scope of the invention. In some example embodiments, well-known processes, well-known structures, and well-known technologies are not described in detail.
Fig. 1 is a schematic diagram of a newspaper image to be processed according to an embodiment of the present invention. As shown in Fig. 1, the newspaper image contains information such as headlines, the date, body text, and the name of the newspaper. In Fig. 1, the title areas of the newspaper image are marked with black boxes. An object of the present invention is to provide an image processing apparatus and an image processing method that can extract the title areas of a newspaper image such as the one shown in Fig. 1.
An image processing apparatus 200 according to an embodiment of the present invention is described below with reference to Fig. 2.
The image processing apparatus 200 according to the present invention includes a connected domain acquiring unit 210, a character connected domain determination unit 220, a heading character determination unit 230, a connecting line determination unit 240, and a title area acquiring unit 250.
According to an embodiment of the present invention, the connected domain acquiring unit 210 can obtain multiple connected domains of a newspaper image. Connected domain (also called connected region) detection is a common method in image processing and pattern recognition, and it is widely used in object segmentation, edge detection, and region detection. Many methods for detecting connected domains exist in these fields, and the present invention places no limitation on which one is used. That is, the connected domain acquiring unit 210 can use any known method to obtain the multiple connected domains of a newspaper image. Further, the connected domain acquiring unit 210 can transmit the obtained connected domains to the character connected domain determination unit 220.
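Since the text leaves the detection method open, the following is only a minimal sketch of one well-known option: breadth-first labeling of foreground pixels in a binarized image, returning the bounding rectangle of each connected domain. The function name `connected_domains`, the 0/1 grid input, the `connectivity` parameter, and the `(x0, y0, x1, y1)` inclusive-coordinate convention are all assumptions made for illustration and do not come from the patent.

```python
from collections import deque

def connected_domains(binary, connectivity=8):
    """Return bounding boxes (x0, y0, x1, y1), inclusive, of the foreground
    regions in a 2-D grid of 0/1 values (hypothetical helper)."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    if connectivity == 8:
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy, dx) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Start a new connected domain and flood it breadth-first.
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

In practice an image library's connected-component routine would likely be used instead; the pure-Python version is shown only because it makes the bounding-rectangle bookkeeping explicit.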
According to an embodiment of the present invention, the character connected domain determination unit 220 can merge the overlapping connected domains and the adjacent connected domains among the multiple connected domains to obtain multiple character connected domains. The character connected domain determination unit 220 can obtain the multiple connected domains of the newspaper image from the connected domain acquiring unit 210 and transmit the resulting character connected domains to the heading character determination unit 230. According to an embodiment of the present invention, the characters may include Chinese characters, and may also include Korean characters, Japanese characters, and the like. A feature of such characters is that one character consists of one or more connected domains, so character connected domains can be obtained by merging overlapping connected domains and adjacent connected domains.
According to an embodiment of the present invention, the heading character determination unit 230 can determine multiple heading character connected domains from the multiple character connected domains. The heading character determination unit 230 can obtain the multiple character connected domains of the newspaper image from the character connected domain determination unit 220, determine the heading character connected domains among them, and transmit these to the connecting line determination unit 240.
According to an embodiment of the present invention, the connecting line determination unit 240 can determine one or more title connecting lines according to the multiple heading character connected domains. Here, a title connecting line is a line connecting the heading character connected domains that belong to the same title area. The connecting line determination unit 240 can obtain the multiple heading character connected domains from the heading character determination unit 230, determine one or more title connecting lines, and transmit the determined title connecting lines to the title area acquiring unit 250.
According to an embodiment of the present invention, the title area acquiring unit 250 can combine the heading character connected domains located on the same title connecting line to obtain one or more title areas of the newspaper image. Here, the title area acquiring unit 250 can obtain the one or more title connecting lines from the connecting line determination unit 240 and thereby obtain the one or more title areas of the newspaper image.
The image processing apparatus according to the present invention can determine multiple heading character connected domains from a newspaper image and can determine the lines connecting the heading character connected domains that belong to the same title area. Once a title connecting line has been determined, the title area corresponding to that line can be obtained easily. In this way, the titles of a newspaper image can be extracted automatically, quickly, and reliably.
Fig. 3 is a schematic diagram of the newspaper image after multiple connected domains have been obtained according to an embodiment of the present invention. As mentioned above, the connected domain acquiring unit 210 can obtain multiple connected domains of the newspaper image. Each black box in Fig. 3 marks one connected domain, and Fig. 3 shows many of them. As shown in Fig. 3, a connected domain may cover a whole character in the newspaper image, as with the character "anti-", or it may cover only part of a character, as with the left-hand "power" component of the character "adding". That is, a character in the newspaper image may consist of one or more connected domains. Therefore, to obtain the character connected domains in the newspaper image, the connected domains obtained by the connected domain acquiring unit 210 need to be merged appropriately.
Fig. 4 is a structural block diagram of the character connected domain determination unit 220 according to an embodiment of the present invention. As shown in Fig. 4, the character connected domain determination unit 220 can include an overlapping connected domain determination unit 221, an adjacent connected domain determination unit 222, and a merging unit 223.
According to an embodiment of the present invention, when the bounding rectangles of two connected domains have an overlapping region, the overlapping connected domain determination unit 221 can determine that the two connected domains are overlapping connected domains. Here, the overlapping connected domain determination unit 221 can obtain all connected domains of the newspaper image from the connected domain acquiring unit 210 and judge whether any of them are overlapping connected domains. Further, the overlapping connected domain determination unit 221 can transmit the overlapping connected domains to the merging unit 223.
Fig. 5 is a schematic diagram of merging overlapping connected domains according to an embodiment of the present invention. As shown on the left of the arrow in Fig. 5, the character "ginseng" includes two connected domains, each marked with a black box. The two connected domains shown in Fig. 5 are rectangular, but in practice a connected domain may have any other polygonal shape. According to an embodiment of the present invention, when the bounding rectangles of two connected domains have an overlapping region, that is, when one or more corners of the bounding rectangle of one connected domain lie inside the bounding rectangle of the other, the overlapping connected domain determination unit 221 can determine that the two connected domains are overlapping connected domains. As shown on the left of the arrow in Fig. 5, the two connected domains of the character "ginseng" have an overlapping region, so the overlapping connected domain determination unit 221 judges them to be overlapping connected domains.
According to an embodiment of the present invention, the overlapping connected domain determination unit 221 can also handle the case of more than two overlapping connected domains. For example, if the bounding rectangles of every pair of connected domains in a group have an overlapping region, the overlapping connected domain determination unit 221 can judge that the whole group consists of overlapping connected domains.
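The overlap test described above can be sketched as follows. This is a hedged illustration, not the patent's implementation: the names `rects_overlap` and `all_pairwise_overlap` are hypothetical, rectangles are assumed to be `(x0, y0, x1, y1)` tuples with inclusive pixel coordinates, and the standard interval-intersection test is used. That test covers the corner-inside-rectangle case described in the text, and it also accepts cross-shaped overlaps in which no corner lies inside the other rectangle, which is an assumption beyond the literal wording.

```python
from itertools import combinations

def rects_overlap(a, b):
    """True if two axis-aligned bounding rectangles share at least one pixel.

    Rectangles are (x0, y0, x1, y1) with inclusive coordinates (assumed
    convention, not taken from the patent).
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def all_pairwise_overlap(rects):
    """True if every pair of rectangles in the group overlaps, mirroring
    the multi-domain case described in the text."""
    return all(rects_overlap(a, b) for a, b in combinations(rects, 2))
```

For example, `rects_overlap((0, 0, 4, 4), (3, 3, 8, 8))` holds because the corner `(3, 3)` of the second rectangle lies inside the first.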
According to an embodiment of the present invention, when the distance between the two closest sides of the bounding rectangles of two connected domains is less than a first threshold, and the difference between 1 and the aspect ratio of the bounding rectangle of the connected domain obtained by merging the two is less than a second threshold, the adjacent connected domain determination unit 222 determines that the two connected domains are adjacent connected domains. According to an embodiment of the present invention, the adjacent connected domain determination unit 222 can judge not only whether two connected domains are adjacent connected domains, but also whether a group of more than two connected domains are adjacent connected domains. For example, when the distance between the two closest sides of the bounding rectangles of every pair of connected domains in the group is less than the first threshold, and the difference between 1 and the aspect ratio of the bounding rectangle of the connected domain obtained by merging the group is less than the second threshold, the adjacent connected domain determination unit 222 can determine that the group consists of adjacent connected domains.
Here, the adjacent connected domain determination unit 222 can obtain all connected domains of the newspaper image from the connected domain acquiring unit 210 and judge whether any of them are adjacent connected domains. Further, the adjacent connected domain determination unit 222 can transmit the adjacent connected domains to the merging unit 223.
Fig. 6 is a schematic diagram of merging adjacent connected domains according to an embodiment of the present invention. As shown on the left of the arrow in Fig. 6, the character "adding" includes two connected domains, each marked with a black box. The two connected domains shown in Fig. 6 are rectangular, but in practice a connected domain may have any other polygonal shape. Taking two connected domains as an example, a distance between the two closest sides of their bounding rectangles that is less than the first threshold indicates that the two connected domains are close to each other, and a difference between 1 and the aspect ratio of the bounding rectangle of the merged connected domain that is less than the second threshold indicates that the merged connected domain is close to a square. When both conditions are met, the adjacent connected domain determination unit 222 can determine that the two connected domains are adjacent connected domains. As shown on the left of the arrow in Fig. 6, the two connected domains of the character "adding" are close together, and the connected domain obtained by merging them is very close to a square, so the adjacent connected domain determination unit 222 judges them to be adjacent connected domains.
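The two adjacency conditions can be written compactly as gap < T1 and |W/H − 1| < T2, where W and H are the width and height of the merged bounding rectangle. The sketch below is one possible reading under stated assumptions: the helper names (`union_box`, `side_gap`, `is_adjacent`) are hypothetical, rectangles are `(x0, y0, x1, y1)` with inclusive pixel coordinates, and the "distance between the two closest sides" is taken as the larger of the horizontal and vertical gaps, counted in empty pixels. The text does not specify these details.

```python
def union_box(a, b):
    """Bounding rectangle of the connected domain obtained by merging a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def side_gap(a, b):
    """Gap between the closest sides of two rectangles, in empty pixels.

    Assumption: the larger of the horizontal and vertical separations is
    used; overlapping extents along an axis contribute a gap of 0.
    """
    gap_x = max(b[0] - a[2] - 1, a[0] - b[2] - 1, 0)
    gap_y = max(b[1] - a[3] - 1, a[1] - b[3] - 1, 0)
    return max(gap_x, gap_y)

def is_adjacent(a, b, first_threshold, second_threshold):
    """Apply both conditions from the text: the rectangles are close, and
    the merged rectangle is close to a square."""
    x0, y0, x1, y1 = union_box(a, b)
    width, height = x1 - x0 + 1, y1 - y0 + 1
    close_enough = side_gap(a, b) < first_threshold
    near_square = abs(width / height - 1.0) < second_threshold
    return close_enough and near_square
```

With a left component `(0, 0, 9, 19)` and a right component `(12, 0, 21, 19)`, the gap is 2 pixels and the merged rectangle is 22 by 20, so the pair qualifies for thresholds such as 5 and 0.2; those threshold values are placeholders chosen for the example only.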
According to an embodiment of the present invention, the merging unit 223 can merge the overlapping connected domains, merge the adjacent connected domains, and use the connected domains obtained after merging as character connected domains. Here, the merging unit 223 can obtain the overlapping connected domains from the overlapping connected domain determination unit 221 and the adjacent connected domains from the adjacent connected domain determination unit 222, merge each kind, and use the merged connected domains as character connected domains. According to an embodiment of the present invention, the merging unit 223 can also obtain the multiple connected domains of the newspaper image from the connected domain acquiring unit 210, so that connected domains that are neither overlapping connected domains nor adjacent connected domains are used directly as character connected domains. Further, the merging unit 223 can transmit the determined character connected domains to the heading character determination unit 230.
As shown on the right of the arrow in Fig. 5, the bounding rectangle of the connected domain obtained by merging the two overlapping connected domains of the character "ginseng" encloses the whole character. As shown on the right of the arrow in Fig. 6, the bounding rectangle of the connected domain obtained by merging the two adjacent connected domains of the character "adding" encloses the whole character.
According to an embodiment of the present invention, the character connected domain determination unit 220 can determine character connected domains from the multiple connected domains of the newspaper image by merging overlapping connected domains and merging adjacent connected domains. The character connected domains obtained in this way are accurate and lay the foundation for the next step of determining the heading character connected domains.
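One way to sketch the merging step is a single union-find pass over all pairs of bounding rectangles, with the merge decision supplied as a predicate. Everything here is illustrative: `merge_connected_domains`, `overlaps`, and the `(x0, y0, x1, y1)` inclusive convention are assumed names and conventions. Note also one simplification: union-find groups rectangles transitively, whereas the text requires every pair in a group to satisfy the condition, and the pass is not repeated on the merged rectangles.

```python
def overlaps(a, b):
    """Example predicate: bounding rectangles share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def merge_connected_domains(boxes, should_merge):
    """Group boxes for which should_merge(a, b) holds and return the
    bounding rectangle of each group, sorted (hypothetical helper)."""
    parent = list(range(len(boxes)))

    def find(i):
        # Path-halving lookup of the group representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if should_merge(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    merged = {}
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        root = find(i)
        if root in merged:
            mx0, my0, mx1, my1 = merged[root]
            merged[root] = (min(mx0, x0), min(my0, y0),
                            max(mx1, x1), max(my1, y1))
        else:
            merged[root] = (x0, y0, x1, y1)
    # Boxes that matched nothing pass through unchanged, as in the text.
    return sorted(merged.values())
```

A predicate combining the overlap test with an adjacency test could be passed in the same way, for instance as a lambda that returns true when either condition holds.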
Fig. 7 is a structural block diagram of the heading character determination unit 230 according to an embodiment of the present invention.
As shown in Fig. 7, the heading character determination unit 230 can include a comparing unit 231. According to an embodiment of the present invention, the comparing unit 231 can determine the character connected domains whose size is greater than a third threshold, among the multiple character connected domains, to be heading character connected domains.
According to an embodiment of the present invention, the heading character determination unit 230 can obtain the multiple character connected domains of the newspaper image from the character connected domain determination unit 220, compare the size of every character connected domain with the third threshold, and determine the character connected domains whose size is greater than the third threshold to be heading character connected domains. Further, the heading character determination unit 230 can transmit the determined heading character connected domains to the connecting line determination unit 240.
As shown in Fig. 1, heading characters in a newspaper image are normally larger than body text characters. Therefore, according to an embodiment of the present invention, the third threshold can be set based on practical experience, so that the character connected domains whose size is greater than the third threshold are determined to be heading character connected domains.
According to an embodiment of the present invention, the size of a character connected domain can be measured with various parameters, for example the area of the character connected domain, the average of the length and width of its bounding rectangle, or the larger of the length and width of its bounding rectangle. A different reference threshold can be set for each measure. For example, when size is measured by area, the comparing unit 231 can determine the character connected domains whose area is greater than the third threshold defined for area to be heading character connected domains. When size is measured by the average of the length and width of the bounding rectangle, the comparing unit 231 can determine the character connected domains for which this average is greater than the third threshold defined for that average to be heading character connected domains. When size is measured by the larger of the length and width of the bounding rectangle, the comparing unit 231 can determine the character connected domains for which this larger value is greater than the third threshold defined for that value to be heading character connected domains.
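The three size measures and the comparison against the third threshold can be sketched as below. The names `box_size` and `heading_character_domains` and the measure labels are hypothetical, rectangles are assumed to be `(x0, y0, x1, y1)` with inclusive coordinates, and the "area" measure here uses the bounding-rectangle area as a stand-in, since the text does not say whether the pixel count of the connected domain is meant instead.

```python
def box_size(box, measure="area"):
    """Size of a character connected domain under one of three measures.

    "area" uses the bounding-rectangle area (an assumption), "mean_side"
    the average of width and height, "max_side" the larger of the two.
    """
    x0, y0, x1, y1 = box
    width, height = x1 - x0 + 1, y1 - y0 + 1
    if measure == "area":
        return width * height
    if measure == "mean_side":
        return (width + height) / 2
    if measure == "max_side":
        return max(width, height)
    raise ValueError(f"unknown measure: {measure}")

def heading_character_domains(boxes, third_threshold, measure="area"):
    """Keep the boxes whose size exceeds the third threshold. The
    threshold must be expressed in the same measure as the size."""
    return [b for b in boxes if box_size(b, measure) > third_threshold]
```

Because each measure has its own scale, a threshold of 30 might be sensible for `"max_side"` while being far too small for `"area"`; the values are illustrative only.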
Fig. 8 is a schematic diagram of the newspaper image after the heading character connected domains have been determined according to an embodiment of the present invention. Fig. 8 marks the heading character connected domains with black boxes. That is, after the heading character determination unit 230 has determined the heading character connected domains, the body text character connected domains in the newspaper image are removed and only the heading character connected domains remain.
As mentioned above, the third threshold can be set based on practical experience. According to an embodiment of the present invention, the third threshold can also be calculated from the sizes of all character connected domains of the newspaper image.
Fig. 9 is a structural block diagram of the heading character determination unit 230 according to another embodiment of the present invention.
As shown in Fig. 9, the heading character determination unit 230 can include a size determination unit 232, a statistic unit 233, a threshold determination unit 234, and the comparing unit 231.
According to an embodiment of the present invention, the size determination unit 232 can determine the size of every character connected domain among the multiple character connected domains. Here, the size determination unit 232 can obtain all character connected domains of the newspaper image from the character connected domain determination unit 220 and calculate the size of each. Further, the size determination unit 232 can transmit the sizes of all character connected domains to the statistic unit 233.
According to an embodiment of the present invention, the statistic unit 233 can count the number of character connected domains using the size of the character connected domain as the variable. Here, the statistic unit 233 can obtain the sizes of all character connected domains from the size determination unit 232 and transmit the counted number of character connected domains of each size to the threshold determination unit 234.
According to an embodiment of the present invention, the threshold determination unit 234 can determine the third threshold according to the size that the largest number of character connected domains have. Here, the threshold determination unit 234 can obtain the number of character connected domains of each size from the statistic unit 233, determine the size shared by the largest number of character connected domains, and multiply that size by an empirical coefficient to determine the third threshold. Further, the threshold determination unit 234 can transmit the determined third threshold to the comparing unit 231 for use in determining the heading character connected domains.
As described above, the size of a character connected domain can be measured with various parameters, for example the area of the character connected domain, the average of the length and width of its bounding rectangle, or the larger of the length and width of its bounding rectangle. Here, the measure of size used by the size determination unit 232, the statistic unit 233, and the threshold determination unit 234 is the same as the measure used by the comparing unit 231. Taking area as the measure, for example, the size determination unit 232 can determine the area of every character connected domain, the statistic unit 233 can count the number of character connected domains of each area, and the threshold determination unit 234 can determine the third threshold according to the area that the largest number of character connected domains have.
Fig. 10 is a curve of the number of character connected domains counted with size as the variable according to an embodiment of the present invention. The statistic unit 233 counts the number of character connected domains with the size of the character connected domain as the variable and obtains the curve shown in Fig. 10. Here, the size shared by the largest number N of character connected domains is L, so the threshold determination unit 234 can determine the third threshold according to the size L.
According to an embodiment of the invention, the threshold determination unit 234 can determine the third threshold T as:
T = k_1 × L
where L is the size shared by the largest number of character connected domains, and k_1 is an empirical coefficient with k_1 > 1.
According to an embodiment of the invention, the heading character determination unit 230 can determine the third threshold from all character connected domains of the newspaper image and then use the third threshold to determine the heading character connected domains among all character connected domains, so that the heading character connected domains are determined more accurately.
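The derivation of the third threshold from the size statistics can be sketched as below. This is a minimal sketch under stated assumptions: the function name `third_threshold`, the default `k1=1.5`, and the `bin_width` parameter used to group nearly equal sizes are all introduced here for illustration, and the embodiment does not specify how sizes are binned or how ties between equally frequent sizes are resolved.

```python
from collections import Counter


def third_threshold(sizes, k1=1.5, bin_width=1):
    """Return T = k1 * L, where L is the most frequent (binned) size.

    k1 is the empirical coefficient and must exceed 1. bin_width is a
    hypothetical parameter: sizes are grouped into bins of this width and
    L is taken as the lower edge of the fullest bin.
    """
    if k1 <= 1:
        raise ValueError("k1 must be greater than 1")
    if not sizes:
        raise ValueError("at least one size is required")
    counts = Counter(int(s // bin_width) for s in sizes)
    # On ties, Counter.most_common keeps the first size that was encountered.
    mode_bin, _ = counts.most_common(1)[0]
    L = mode_bin * bin_width
    return k1 * L


# Body text dominates the page, so the most frequent size is a body-text size.
sizes = [10, 10, 11, 10, 12, 30, 32]
T = third_threshold(sizes, k1=2.0)
headings = [s for s in sizes if s > T]
```

Because body-text characters far outnumber title characters on a newspaper page, the most frequent size approximates the body-text size, and multiplying it by a coefficient greater than 1 separates the larger title characters.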
According to an embodiment of the invention, after the heading character determination unit 230 determines multiple heading character connected domains, the connecting line determination unit 240 can determine one or more title connecting lines from the multiple heading character connected domains. Fig. 11 is a structural diagram of the connecting line determination unit 240 according to an embodiment of the invention.
As shown in Fig. 11, the connecting line determination unit 240 can include a finding unit 241 and a determination unit 242.
According to an embodiment of the invention, the finding unit 241 can traverse the heading character connected domains that do not belong to any title connecting line already found, taking each in turn as a starting heading character connected domain from which to find a title connecting line.
According to an embodiment of the invention, the determination unit 242 can determine all title connecting lines that have been found as the one or more title connecting lines.
According to an embodiment of the invention, the finding unit 241 can first determine one starting heading character connected domain and begin looking for a title connecting line from it. In an embodiment of the invention, at most one title connecting line can be found from one starting heading character connected domain. After a title connecting line has been found from one starting heading character connected domain, the finding unit 241 can choose the next starting heading character connected domain from among the heading character connected domains that do not belong to any title connecting line already found and have not yet served as a starting heading character connected domain, and begin looking for a title connecting line from that next starting heading character connected domain. In this way, the finding unit 241 can determine one or more starting heading character connected domains and, each time one is determined, look for a title connecting line from it, until every heading character connected domain either belongs to a title connecting line or has served as a starting heading character connected domain.
According to an embodiment of the invention, the finding unit 241 can determine the starting heading character connected domain by traversal. That is, following the order of the heading character connected domains (for example, the order in which they appear on the newspaper image or the order in which they are stored), it chooses in turn a heading character connected domain that does not belong to any title connecting line already found as the starting heading character connected domain.
According to an embodiment of the invention, the finding unit 241 can also determine the starting heading character connected domain at random. That is, the finding unit 241 randomly chooses, from among the heading character connected domains that do not belong to any title connecting line already found and have not yet served as a starting heading character connected domain, one heading character connected domain as the starting heading character connected domain.
Next, the determination unit 242 can take all title connecting lines that have been found as the one or more title connecting lines of the newspaper image.
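The traversal over starting heading character connected domains described above can be sketched as follows. The function `find_all_title_lines` and the callback `find_line_from` are hypothetical names; the callback stands in for the per-start search and is assumed to return the components on the title connecting line found from a given start, or an empty list when none is found.

```python
def find_all_title_lines(components, find_line_from):
    """Traverse components in storage order and collect title connecting lines.

    components: heading character connected domains, in traversal order.
    find_line_from: hypothetical callback, start -> list of components on the
        single title connecting line found from that start (possibly empty).
    """
    on_a_line = set()  # components already belonging to a found title line
    lines = []
    for start in components:
        # A single pass visits each component once, so "has not yet served
        # as a start" holds automatically; only membership must be checked.
        if start in on_a_line:
            continue
        line = find_line_from(start)
        if line:
            lines.append(line)
            on_a_line.update(line)
    return lines


# Toy usage with a lookup table in place of the real per-start search.
found_from = {0: [0, 1, 4], 2: [2, 5]}
title_lines = find_all_title_lines(range(6), lambda s: found_from.get(s, []))
```

For the random variant, the loop would instead draw from the components that are neither on a found line nor previously used as a start, which requires tracking used starts explicitly.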
Next, the finding unit 241 according to an embodiment of the invention is described in detail with reference to Fig. 12 and Fig. 13.
Fig. 12 is a structural diagram of the connecting line determination unit according to another embodiment of the invention. As shown in Fig. 12, the finding unit 241 can include a stable-state connecting line set determination unit 2411 and an output unit 2412.
According to an embodiment of the invention, the stable-state connecting line set determination unit 2411 can repeat the following operations for the end heading character connected domain of each stable-state connecting line in the stable-state connecting line set of the starting heading character connected domain, until no end heading character connected domain of any stable-state connecting line in the set has a neighbor heading character connected domain: taking the connecting line that joins the end heading character connected domain to one of its neighbor heading character connected domains as a transient-state connecting line; when the neighbor heading character connected domain satisfies a predetermined condition, joining the transient-state connecting line on which the neighbor lies to the stable-state connecting line on which the end heading character connected domain lies, thereby updating that stable-state connecting line as stored in the set; and when the neighbor heading character connected domain does not satisfy the predetermined condition, storing the transient-state connecting line on which the neighbor lies in the set as a new stable-state connecting line.
According to an embodiment of the invention, the output unit 2412 can determine the stable-state connecting line in the set that contains the most heading character connected domains as a title connecting line.
According to an embodiment of the invention, when two or more stable-state connecting lines in the set contain the same, largest number of heading character connected domains, the output unit 2412 can randomly choose one of them as the title connecting line, which ensures that at most one title connecting line is found from one starting heading character connected domain.
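The selection performed by the output unit can be sketched as follows. The function name `pick_title_line` and the representation of a stable-state connecting line as a list of component identifiers are assumptions made for the sketch.

```python
import random


def pick_title_line(stable_lines, rng=random):
    """Return the line with the most components; break ties at random.

    stable_lines: list of lines, each a list of component identifiers.
    rng: object providing choice(), e.g. the random module or random.Random.
    """
    if not stable_lines:
        return None
    longest = max(len(line) for line in stable_lines)
    candidates = [line for line in stable_lines if len(line) == longest]
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the tie-break reproducible, which can be convenient when comparing runs.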
Fig. 13 is a structural diagram of the connecting line determination unit according to yet another embodiment of the invention. As shown in Fig. 13, the finding unit 241 can also include a starting heading character connected domain determination unit 2413.
According to an embodiment of the invention, the starting heading character connected domain determination unit 2413 can traverse the heading character connected domains that do not belong to any title connecting line already found, taking each in turn as a starting heading character connected domain.
According to an embodiment of the invention, after the starting heading character connected domain determination unit 2413 determines a starting heading character connected domain, the stable-state connecting line set determination unit 2411 can determine the stable-state connecting line set of that starting heading character connected domain using the method of the embodiment described above. According to an embodiment of the invention, each starting heading character connected domain has one stable-state connecting line set, which stores the connecting lines, each joining one or more heading character connected domains, that are found starting from that starting heading character connected domain. Next, the output unit 2412 can determine one title connecting line from the stable-state connecting line set. According to an embodiment of the invention, the starting heading character connected domain determination unit 2413 can determine multiple starting heading character connected domains by traversal, so the output unit 2412 can correspondingly output multiple title connecting lines.
According to an embodiment of the invention, the stable-state connecting line set determination unit 2411 can also include an initialization unit (not shown). The initialization unit can initialize the stable-state connecting line set of the starting heading character connected domain to contain the following stable-state connecting lines: the connecting lines that join the starting heading character connected domain to each of its neighbor heading character connected domains.
According to an embodiment of the invention, the initialization unit can initialize the stable-state connecting line set of the starting heading character connected domain. Next, the stable-state connecting line set determination unit 2411 can update this set in the manner of the embodiment described above, thereby determining the final stable-state connecting line set.
The function of the finding unit 241 is described in detail below with reference to Fig. 14, namely the process of determining the stable-state connecting line set for a specific starting heading character connected domain and determining one title connecting line from that set.
Fig. 14 is a schematic diagram of the process of determining one title connecting line according to an embodiment of the invention.
As shown in Fig. 14, the numbers 0 to 9 denote ten heading character connected domains. Here, for convenience of description, No. 0 heading character connected domain is taken as the starting heading character connected domain.
First, the initialization unit can initialize the stable-state connecting line set of the starting heading character connected domain to contain the following stable-state connecting lines: the connecting lines that join the starting heading character connected domain to each of its neighbor heading character connected domains.
In the present invention, the neighbor heading character connected domains of a heading character connected domain can be defined as follows. When two heading character connected domains satisfy a first group of predetermined conditions, each is called a neighbor heading character connected domain of the other. According to an embodiment of the invention, the first group of predetermined conditions can include a constraint on the distance between the two heading character connected domains. As a specific example, the first group of predetermined conditions is: the distance between the centers of the two heading character connected domains is less than twice the smaller of the following two values: the larger of the height and width of one heading character connected domain, and the larger of the height and width of the other heading character connected domain, i.e.:
d_ij < 2 × min(max(i_w, i_h), max(j_w, j_h))
where d_ij denotes the distance between the centers of the i-th and j-th heading character connected domains, i_w and i_h denote the width and height of the i-th heading character connected domain, and j_w and j_h denote the width and height of the j-th heading character connected domain.
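The neighbor test above can be written directly as a predicate. In this sketch the type `Box` and the function `are_neighbors` are illustrative names, the distance is assumed to be Euclidean, and the center is assumed to be the center of the bounding rectangle; the text itself only speaks of the distance between centers.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    """A heading character connected domain, by center and bounding size."""
    cx: float
    cy: float
    w: float
    h: float


def are_neighbors(a, b):
    """First group of conditions: d_ij < 2 * min(max(i_w, i_h), max(j_w, j_h))."""
    d = math.hypot(a.cx - b.cx, a.cy - b.cy)
    return d < 2 * min(max(a.w, a.h), max(b.w, b.h))
```

For two components of sizes 10×12 and 11×10 the limit is 2 × min(12, 11) = 22, so centers 20 apart count as neighbors while centers 30 apart do not. The relation is symmetric, since both the distance and the limit are unchanged when the two components are swapped.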
As shown in Fig. 14, No. 1 heading character connected domain is a neighbor heading character connected domain of No. 0 heading character connected domain. Therefore, the initialization unit can initialize the stable-state connecting line set of No. 0 heading character connected domain to contain the following stable-state connecting line: the stable-state connecting line 0-1 joining No. 0 and No. 1 heading character connected domains. Here, only the case in which No. 0 heading character connected domain has a single neighbor heading character connected domain is illustrated. In practice, a starting heading character connected domain may have multiple neighbor heading character connected domains, in which case its initialized stable-state connecting line set contains multiple stable-state connecting lines.
Next, the stable-state connecting line set determination unit 2411 can perform the following operation for the end heading character connected domain of each stable-state connecting line in the stable-state connecting line set of the starting heading character connected domain: taking the connecting line that joins the end heading character connected domain to one of its neighbor heading character connected domains as a transient-state connecting line.
In the present invention, the end heading character connected domain of a stable-state connecting line is the heading character connected domain that was joined to that stable-state connecting line last. Taking the embodiment shown in Fig. 14 as an example, the end heading character connected domain of stable-state connecting line 0-1 is No. 1 heading character connected domain. Since the stable-state connecting line set may contain multiple stable-state connecting lines, there may also be multiple end heading character connected domains, and the same operation is performed for each of them. Further, the neighbor heading character connected domains of an end heading character connected domain can be determined in the same way as those of the starting heading character connected domain, so the details are not repeated here.
As shown in Fig. 14, No. 1 heading character connected domain, as an end heading character connected domain, has three neighbor heading character connected domains: No. 2, No. 3 and No. 4 heading character connected domains. The stable-state connecting line set determination unit 2411 takes as transient-state connecting lines the connecting line 1-2 joining No. 1 and No. 2 heading character connected domains, the connecting line 1-3 joining No. 1 and No. 3 heading character connected domains, and the connecting line 1-4 joining No. 1 and No. 4 heading character connected domains.
Next, the stable-state connecting line set determination unit 2411 can judge whether each neighbor heading character connected domain of the end heading character connected domain satisfies a second group of predetermined conditions. When the second group of predetermined conditions is satisfied, the transient-state connecting line on which the neighbor lies is joined to the stable-state connecting line on which the end heading character connected domain lies, thereby updating that stable-state connecting line as stored in the stable-state connecting line set. When the second group of predetermined conditions is not satisfied, the transient-state connecting line on which the neighbor lies is stored in the stable-state connecting line set as a new stable-state connecting line.
According to an embodiment of the invention, the second group of predetermined conditions can include constraints on the following properties of the neighbor heading character connected domain of the end heading character connected domain: its length, its width, the slope of the transient-state connecting line on which it lies, and its distance to the stable-state connecting line on which the end heading character connected domain lies.
As a specific example, the second group of predetermined conditions can include:
a. the difference between the length of the neighbor heading character connected domain of the end heading character connected domain and the length of the end heading character connected domain (or the median length of all heading character connected domains on the stable-state connecting line on which the end heading character connected domain lies) is less than a fourth threshold;
b. the difference between the width of the neighbor heading character connected domain and the width of the end heading character connected domain (or the median width of all heading character connected domains on the stable-state connecting line on which the end heading character connected domain lies) is less than a fifth threshold;
c. the difference between the slope of the transient-state connecting line on which the neighbor heading character connected domain lies and the slope of the stable-state connecting line on which the end heading character connected domain lies is less than a sixth threshold; and
d. the distance from the center of the neighbor heading character connected domain to the stable-state connecting line on which the end heading character connected domain lies is less than a seventh threshold.
According to an embodiment of the invention, the fourth, fifth, sixth and seventh thresholds can be set according to actual needs or experience, or according to a certain criterion. For example, the seventh threshold can be set to k_2 × max(c_w, c_h), where k_2 is an empirical coefficient with k_2 < 1, c_w is the width of the neighbor heading character connected domain of the end heading character connected domain, and c_h is its height.
In the second group of predetermined conditions, satisfying conditions a and b indicates that the neighbor heading character connected domain and the end heading character connected domain are close in size, and satisfying conditions c and d indicates that the transient-state connecting line on which the neighbor lies and the stable-state connecting line on which the end heading character connected domain lies are almost on one straight line. Therefore, when a neighbor heading character connected domain satisfies the second group of predetermined conditions, it can be joined to the stable-state connecting line on which the end heading character connected domain lies.
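A hedged sketch of conditions a to d follows. Several choices here are assumptions rather than part of the embodiment: "length" is read as the bounding-rectangle height; the median variant of conditions a and b is used; the direction of the stable-state connecting line is taken from its first and last components; and, in place of a raw slope difference, the sketch compares undirected angles, a deliberate substitution that avoids infinite slopes for vertical titles. The names `Box`, `satisfies_second_group`, `t4`, `t5`, `t6` and the default `k2=0.5` are illustrative.

```python
import math
from statistics import median
from typing import NamedTuple


class Box(NamedTuple):
    cx: float
    cy: float
    w: float
    h: float


def _angle(p, q):
    """Undirected direction of the segment from p to q, in [0, pi)."""
    return math.atan2(q.cy - p.cy, q.cx - p.cx) % math.pi


def _distance_to_line(pt, p, q):
    """Perpendicular distance from pt's center to the line through p and q."""
    dx, dy = q.cx - p.cx, q.cy - p.cy
    return abs(dx * (pt.cy - p.cy) - dy * (pt.cx - p.cx)) / math.hypot(dx, dy)


def satisfies_second_group(line, nb, t4, t5, t6, k2=0.5):
    """Check conditions a-d for neighbor nb of the end component of line.

    line: components on the stable-state connecting line (at least two,
          with distinct centers); t4, t5: size tolerances; t6: angle
          tolerance in radians (stands in for the slope threshold).
    """
    end = line[-1]
    cond_a = abs(nb.h - median(b.h for b in line)) < t4
    cond_b = abs(nb.w - median(b.w for b in line)) < t5
    diff = abs(_angle(line[0], end) - _angle(end, nb))
    cond_c = min(diff, math.pi - diff) < t6
    t7 = k2 * max(nb.w, nb.h)  # seventh threshold, k2 < 1
    cond_d = _distance_to_line(nb, line[0], end) < t7
    return cond_a and cond_b and cond_c and cond_d
```

For a horizontal line through centers (0, 0) and (12, 0), a same-sized neighbor at (24, 1) passes all four checks, while a neighbor directly below the end component at (12, 12) fails the direction check.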
Taking the example shown in Fig. 14, the stable-state connecting line set determination unit 2411 can judge whether the neighbor heading character connected domains of No. 1 heading character connected domain (the end heading character connected domain), namely No. 2, No. 3 and No. 4 heading character connected domains, satisfy the second group of predetermined conditions. As shown in Fig. 14, although No. 2 and No. 3 heading character connected domains satisfy conditions a and b, they may not satisfy conditions c and d and therefore do not satisfy the second group of predetermined conditions, whereas No. 4 heading character connected domain satisfies the second group of predetermined conditions. According to an embodiment of the invention, the stable-state connecting line set determination unit 2411 joins the transient-state connecting line 1-4, on which No. 4 heading character connected domain lies, to the stable-state connecting line 0-1, on which No. 1 heading character connected domain lies, thereby updating the stable-state connecting line 0-1 stored in the stable-state connecting line set. That is, the stable-state connecting line 0-1 stored in the set is now updated to 0-1-4. According to an embodiment of the invention, the stable-state connecting line set determination unit 2411 stores the transient-state connecting line 1-2, on which No. 2 heading character connected domain lies, and the transient-state connecting line 1-3, on which No. 3 heading character connected domain lies, in the stable-state connecting line set as new stable-state connecting lines. After the above steps, three stable-state connecting lines are stored in the stable-state connecting line set: 0-1-4, 1-2 and 1-3.
Next, the stable-state connecting line set determination unit 2411 can judge whether the end heading character connected domain of each stable-state connecting line in the set still has a neighbor heading character connected domain. If the end heading character connected domain of any stable-state connecting line in the set still has a neighbor heading character connected domain, the stable-state connecting line set determination unit 2411 can repeat the operations described above for the end heading character connected domain of each stable-state connecting line in the stable-state connecting line set of the starting heading character connected domain, until no end heading character connected domain of any stable-state connecting line in the set has a neighbor heading character connected domain.
Taking the example shown in Fig. 14, for No. 2 heading character connected domain, the end heading character connected domain of stable-state connecting line 1-2 in the set: the connecting line 2-5 joining No. 2 heading character connected domain to its neighbor, No. 5 heading character connected domain, is taken as a transient-state connecting line; No. 5 heading character connected domain is judged not to satisfy the second group of predetermined conditions; and the transient-state connecting line 2-5 is stored in the set as a new stable-state connecting line. For No. 3 heading character connected domain, the end heading character connected domain of stable-state connecting line 1-3 in the set: the connecting line 3-7 joining No. 3 heading character connected domain to its neighbor, No. 7 heading character connected domain, is taken as a transient-state connecting line; No. 7 heading character connected domain is judged not to satisfy the second group of predetermined conditions; and the transient-state connecting line 3-7 is stored in the set as a new stable-state connecting line. For No. 4 heading character connected domain, the end heading character connected domain of stable-state connecting line 0-1-4 in the set: the connecting line 4-5 joining No. 4 heading character connected domain to its neighbor No. 5 heading character connected domain, and the connecting line 4-6 joining No. 4 heading character connected domain to its neighbor No. 6 heading character connected domain, are taken as transient-state connecting lines; No. 5 heading character connected domain is judged not to satisfy the second group of predetermined conditions, while No. 6 heading character connected domain is judged to satisfy them; the transient-state connecting line 4-5 is stored in the set as a new stable-state connecting line, and the stable-state connecting line 0-1-4 is updated to 0-1-4-6. Therefore, after the above steps, the following stable-state connecting lines are stored in the set: 1-2, 1-3, 2-5, 3-7, 4-5 and 0-1-4-6.
In this manner, the stable-state connecting line set determination unit 2411 can repeat the operations described above for the end heading character connected domain of each stable-state connecting line in the set, until no end heading character connected domain of any stable-state connecting line in the set has a neighbor heading character connected domain. Finally, the stable-state connecting line set determined for No. 0 heading character connected domain as the starting heading character connected domain is: 1-2, 1-3, 2-5, 3-7, 4-5, 0-1-4-6-8-9, 5-6 and 6-7.
Next, the output unit 2412 can choose from the stable-state connecting line set the stable-state connecting line that contains the most heading character connected domains, 0-1-4-6-8-9, as the title connecting line found from No. 0 heading character connected domain.
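The growth of the stable-state connecting line set can be sketched as below, using the Fig. 14 walk-through as data. The sketch makes assumptions the text does not state: each (end, neighbor) pair is used at most once, which is added here to guarantee termination and avoid duplicate lines; when several neighbors of one end satisfy the condition, only the first extends the line; and the neighbor lists for Nos. 5, 6 and 8 are inferred from the final set given in the text. `grow_stable_lines`, `neighbors`, `accepted` and `fits` are illustrative names, with `fits` standing in for the second group of predetermined conditions.

```python
def grow_stable_lines(start, neighbors, fits):
    """Build the stable-state connecting line set for one starting component.

    neighbors: dict mapping a component to its neighbor components.
    fits: callable (line, nb) -> bool, standing in for the second group
          of predetermined conditions.
    """
    # Initialization: one stable line per neighbor of the start.
    stable = [[start, n] for n in neighbors.get(start, [])]
    used = {(start, n) for n in neighbors.get(start, [])}
    changed = True
    while changed:
        changed = False
        for line in list(stable):  # lines added now are handled next pass
            end = line[-1]
            extended = False
            for nb in neighbors.get(end, []):
                if (end, nb) in used or nb in line:
                    continue
                used.add((end, nb))  # assumption: each pair is used once
                changed = True
                if not extended and fits(line, nb):
                    line.append(nb)  # join transient line to stable line
                    extended = True
                else:
                    stable.append([end, nb])  # keep as a new stable line
    return stable


# Neighbor lists and accepted joins read off the Fig. 14 walk-through.
neighbors = {0: [1], 1: [2, 3, 4], 2: [5], 3: [7],
             4: [5, 6], 5: [6], 6: [7, 8], 8: [9]}
accepted = {(1, 4), (4, 6), (6, 8), (8, 9)}
lines = grow_stable_lines(0, neighbors, lambda line, nb: (line[-1], nb) in accepted)
title_line = max(lines, key=len)
```

With this data the sketch yields the same eight lines as the walk-through, although not necessarily in the same storage order, and the longest of them is 0-1-4-6-8-9.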
According to an embodiment of the invention, the finding unit 241 can traverse the heading character connected domains that do not belong to any title connecting line already found, taking each in turn as a starting heading character connected domain from which to find a title connecting line. For example, after the title connecting line 0-1-4-6-8-9 has been found, the finding unit 241 can take No. 2 heading character connected domain as the next starting heading character connected domain and look for a title connecting line from it, so that the determination unit 242 can determine multiple title connecting lines of the newspaper image. Next, the title area acquiring unit 250 obtains multiple title areas of the newspaper image by combining the heading character connected domains on each title connecting line. In this way, the title areas of the newspaper image can be obtained automatically, quickly and reliably.
Fig. 15 is a schematic diagram of the newspaper image after the title areas have been obtained according to an embodiment of the invention. As shown in Fig. 15, two title areas of the newspaper image are shown with black boxes.
According to an embodiment of the invention, the title area acquiring unit 250 obtains multiple title areas of the newspaper image by combining the heading character connected domains on each title connecting line. Under normal circumstances, the title areas obtained in this way are accurate. However, owing to punctuation marks and other small connected domains, a title may occasionally be extracted incompletely. As shown in Fig. 15, "Acheng six" should actually also be part of the title area. However, when the connected domains of the newspaper image were extracted, the character "six" was split into three small connected domains, and merging adjacent and overlapping connected domains did not produce a character connected domain that properly encloses the character "six". As a result, when the connecting line determination unit 240 determined the connecting lines, the three character connected domains of "Acheng six" were not joined to the title connecting line below them.
To solve the above technical problem, the present invention proposes an image processing apparatus according to another embodiment. Fig. 16 is a structural diagram of the image processing apparatus 200 according to another embodiment of the invention.
As shown in Fig. 16, the image processing apparatus 200 can include a connected domain acquiring unit 210, a character connected domain determination unit 220, a heading character determination unit 230, a connecting line determination unit 240, a connecting line updating unit 260 and a title area acquiring unit 250. The connected domain acquiring unit 210, character connected domain determination unit 220, heading character determination unit 230, connecting line determination unit 240 and title area acquiring unit 250 shown here can be the same as the corresponding units described previously, so their details are not repeated here. The connecting line updating unit 260 is described below.
According to an embodiment of the invention, when a remaining heading character connected domain satisfies a third group of predetermined conditions, the connecting line updating unit 260 can join that remaining heading character connected domain to the head-end heading character connected domain or the end heading character connected domain of a title connecting line, thereby updating that title connecting line. Here, a remaining heading character connected domain is a heading character connected domain that does not belong to any of the one or more title connecting lines.
According to an embodiment of the invention, the connecting line updating unit 260 can obtain all title connecting lines of the newspaper image from the connecting line determination unit 240 and all heading character connected domains of the newspaper image from the heading character determination unit 230, and thereby determine the heading character connected domains that do not belong to any title connecting line as the remaining heading character connected domains. According to an embodiment of the invention, the connecting line updating unit 260 can determine all remaining heading character connected domains of the newspaper image.
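Determining the remaining heading character connected domains amounts to a set difference, which can be sketched as follows. The function name `remaining_components` and the use of hashable component identifiers are assumptions for the sketch.

```python
def remaining_components(all_heading, title_lines):
    """Return heading components that lie on none of the title lines.

    all_heading: iterable of hashable component identifiers.
    title_lines: list of lines, each a list of component identifiers.
    The input order of all_heading is preserved in the result.
    """
    on_lines = {c for line in title_lines for c in line}
    return [c for c in all_heading if c not in on_lines]
```

For example, with ten components numbered 0 to 9 and two title connecting lines 0-1-4-6-8-9 and 2-5, the remaining components are Nos. 3 and 7.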
Next, the connecting line updating unit 260 can judge whether each of the remaining heading character connected domains of the newspaper image satisfies the third group of predetermined conditions. Specifically, the connecting line updating unit 260 can judge whether a given remaining heading character connected domain and a given title connecting line together satisfy the third group of predetermined conditions.
According to an embodiment of the invention, the third group of predetermined conditions can include constraints on the following properties of the remaining heading character connected domain: its size, its stroke width, the slope of the connecting line joining it to a heading character connected domain on the title connecting line, its distance to the title connecting line, and the minimum distance between it and the heading character connected domains on the title connecting line.
As a specific example, the third set of predetermined conditions can include:
a. the difference between the length of the remaining heading character connected domain and the length of the head-end or end heading character connected domain of the title connecting line (or the median length of all heading character connected domains on the title connecting line) is less than an eighth threshold;
b. the difference between the width of the remaining heading character connected domain and the width of the head-end or end heading character connected domain of the title connecting line (or the median width of all heading character connected domains on the title connecting line) is less than a ninth threshold;
c. the difference between the slope of the line connecting the remaining heading character connected domain with any one heading character connected domain on the title connecting line and the slope of the title connecting line is less than a tenth threshold;
d. the distance from the center of the remaining heading character connected domain to the title connecting line is less than an eleventh threshold;
e. the minimum distance between the remaining heading character connected domain and the heading character connected domains on the title connecting line is less than a twelfth threshold; and
f. the difference between the stroke width of the remaining heading character connected domain and the stroke width of the head-end or end heading character connected domain of the title connecting line (or the median stroke width of all heading character connected domains on the title connecting line) is less than a thirteenth threshold.
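Conditions a to f above can be sketched as a single predicate. The following is a minimal illustration under stated assumptions, not the patented implementation: the `Domain` record, the threshold keys `t8` to `t13`, the use of center-to-center distance for condition e, the use of line angles in place of raw slopes (to avoid infinite slopes for vertical titles), and the choice of the median variant for conditions a, b and f are all hypothetical. The title connecting line is assumed to hold at least two domains with distinct centers.

```python
import math
from dataclasses import dataclass
from statistics import median

@dataclass(frozen=True)
class Domain:
    cx: float      # center x
    cy: float      # center y
    length: float  # extent along the text direction (assumed meaning of "length")
    width: float   # extent across the text direction (assumed meaning of "width")
    stroke: float  # estimated stroke width

def angle(p, q):
    """Direction of the line p -> q folded into [0, pi), so opposite directions match."""
    return math.atan2(q.cy - p.cy, q.cx - p.cx) % math.pi

def angle_diff(a, b):
    """Smallest difference between two undirected line angles."""
    d = abs(a - b)
    return min(d, math.pi - d)

def point_line_distance(p, a, b):
    """Distance from the center of p to the infinite line through the centers of a and b."""
    dx, dy = b.cx - a.cx, b.cy - a.cy
    return abs(dy * (p.cx - a.cx) - dx * (p.cy - a.cy)) / math.hypot(dx, dy)

def center_gap(p, q):
    return math.hypot(p.cx - q.cx, p.cy - q.cy)

def meets_third_conditions(rem, line, t):
    """rem: remaining domain; line: ordered domains on one title connecting line;
    t: dict of hypothetical thresholds t8..t13."""
    head, tail = line[0], line[-1]
    line_angle = angle(head, tail)
    checks = [
        abs(rem.length - median(d.length for d in line)) < t["t8"],            # a
        abs(rem.width - median(d.width for d in line)) < t["t9"],              # b
        any(angle_diff(angle(d, rem), line_angle) < t["t10"] for d in line),   # c
        point_line_distance(rem, head, tail) < t["t11"],                       # d
        min(center_gap(rem, d) for d in line) < t["t12"],                      # e
        abs(rem.stroke - median(d.stroke for d in line)) < t["t13"],           # f
    ]
    return all(checks)
```

Condition c is read here as "at least one domain on the line yields a compatible slope"; a stricter reading that requires every domain to do so would replace `any` with `all`.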
According to an embodiment of the invention, each of the above thresholds can be set according to actual needs or experience, or according to a certain criterion. For condition e in the third set of predetermined conditions, a punctuation mark may lie between the remaining heading character connected domain and the corresponding title connecting line, so condition e should be looser than the criterion used when judging neighbor heading character connected domains. That is, the twelfth threshold should be greater than 2 * min(max(p_w, p_h), max(q_w, q_h)), where p_w is the width of the remaining heading character connected domain, p_h is the height of the remaining heading character connected domain, q_w is the width of the heading character connected domain on the title connecting line that is nearest to the remaining heading character connected domain, and q_h is the height of that nearest heading character connected domain.
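As a small worked example of the lower bound stated above, the following sketch computes 2 * min(max(p_w, p_h), max(q_w, q_h)). The function name and the sample dimensions are illustrative assumptions; the source only states that the twelfth threshold must exceed this value, not how far above it the threshold should be set.

```python
def twelfth_threshold_floor(pw, ph, qw, qh):
    """Lower bound that the twelfth threshold must exceed.

    pw, ph: width and height of the remaining heading character connected domain.
    qw, qh: width and height of the nearest heading character connected domain
            on the title connecting line.
    """
    return 2 * min(max(pw, ph), max(qw, qh))

# Hypothetical sizes in pixels: a 10x12 remaining domain next to a 20x18 domain.
floor = twelfth_threshold_floor(10, 12, 20, 18)
```

With these sample sizes the larger side of each domain is 12 and 20, the smaller of those is 12, and the bound is 24 pixels.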
According to an embodiment of the invention, when a remaining heading character connected domain meets the third set of predetermined conditions, the connecting line updating unit 260 can connect that remaining heading character connected domain to the heading character connected domain on the title connecting line that is nearest to it, thereby updating the title connecting line.
In general, the heading character connected domain on a title connecting line that is nearest to a remaining heading character connected domain is the head-end or end heading character connected domain of that title connecting line. Therefore, when a remaining heading character connected domain meets the third set of predetermined conditions, the connecting line updating unit 260 can connect it to the head-end heading character connected domain or the end heading character connected domain of the title connecting line, thereby updating the title connecting line.
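The update step described above can be sketched as follows. This is a minimal illustration that assumes a title connecting line is an ordered list of domain centers given as `(x, y)` tuples and that the remaining domain has already passed the third set of predetermined conditions; the function name and data layout are hypothetical.

```python
import math

def attach_to_nearest_end(line, rem):
    """Return a new title connecting line with rem joined to whichever end is closer.

    line: ordered list of (x, y) centers, head end first.
    rem:  (x, y) center of a remaining domain that already met the conditions.
    """
    head, tail = line[0], line[-1]
    if math.dist(rem, head) < math.dist(rem, tail):
        return [rem] + line   # rem becomes the new head end
    return line + [rem]       # rem becomes the new end

# Usage: a domain to the right of a horizontal title is appended at the end.
updated = attach_to_nearest_end([(0, 0), (10, 0)], (20, 0))
```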
According to an embodiment of the invention, after the connecting line updating unit 260 has updated the title connecting lines, the title area acquiring unit 250 can combine the heading character connected domains located on the same updated title connecting line to obtain one or more title areas of the newspaper image.
According to an embodiment of the invention, the title connecting lines can be post-processed in this way to eliminate the influence of punctuation marks and small connected domains, so that the title connecting lines obtained are more accurate.
Figure 17 is a schematic diagram of a newspaper image after the title areas have been obtained according to another embodiment of the invention. As shown in Figure 17, after post-processing, the obtained title area of the newspaper image includes "Acheng six", so the title area of the newspaper image is obtained more accurately.
The image processing apparatus according to the invention has been described above. The image processing method according to the invention is described below.
Figure 18 is a flow chart of the image processing method according to an embodiment of the invention.
As shown in Figure 18, in step S1810, multiple connected domains of the newspaper image are obtained.
Next, in step S1820, the overlapping connected domains and adjacent connected domains among the multiple connected domains are merged to obtain multiple character connected domains.
Next, in step S1830, multiple heading character connected domains are determined from the multiple character connected domains.
Next, in step S1840, one or more title connecting lines are determined according to the multiple heading character connected domains.
Next, in step S1850, the heading character connected domains located on the same title connecting line are combined to obtain one or more title areas of the newspaper image.
Preferably, merging the overlapping connected domains and adjacent connected domains among the multiple connected domains to obtain multiple character connected domains includes: when the bounding rectangles of two connected domains have an overlapping region, determining that the two connected domains are overlapping connected domains; when the distance between the two closest sides of the bounding rectangles of two connected domains is less than a first threshold, and the difference between 1 and the aspect ratio of the bounding rectangle of the connected domain obtained by merging the two connected domains is less than a second threshold, determining that the two connected domains are adjacent connected domains; and merging the overlapping connected domains, merging the adjacent connected domains, and taking the multiple connected domains obtained after merging as the character connected domains.
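The overlap and adjacency tests above can be sketched on bounding rectangles alone. The following is a minimal illustration under assumptions that are not in the source: rectangles are `(x0, y0, x1, y1)` tuples with positive height, the "distance between the two closest sides" is taken as the larger of the horizontal and vertical gaps, and merging is repeated greedily until no pair qualifies. The names `t1` and `t2` stand for the first and second thresholds.

```python
def overlaps(a, b):
    """True when the two bounding rectangles share an overlapping region."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def union(a, b):
    """Bounding rectangle of the connected domain obtained by merging a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def side_gap(a, b):
    """Gap between the closest sides (assumed: larger of the x and y gaps, floored at 0)."""
    gx = max(a[0], b[0]) - min(a[2], b[2])
    gy = max(a[1], b[1]) - min(a[3], b[3])
    return max(gx, gy, 0)

def adjacent(a, b, t1, t2):
    """Close together, and the merged rectangle is nearly square (aspect ratio near 1)."""
    m = union(a, b)
    aspect = (m[2] - m[0]) / (m[3] - m[1])
    return side_gap(a, b) < t1 and abs(aspect - 1) < t2

def merge_domains(boxes, t1, t2):
    """Greedily merge overlapping or adjacent rectangles until none remain."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlaps(boxes[i], boxes[j]) or adjacent(boxes[i], boxes[j], t1, t2):
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

The aspect-ratio test is what keeps two separate, side-by-side characters from being merged: their union is roughly twice as wide as it is tall, whereas the two halves of a single left-right structured character merge into a nearly square rectangle.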
Preferably, determining multiple heading character connected domains from the multiple character connected domains includes: determining the character connected domains whose size is greater than a third threshold to be heading character connected domains.
Preferably, determining multiple heading character connected domains from the multiple character connected domains includes: determining the size of every character connected domain among the multiple character connected domains; counting the number of character connected domains with the size of the character connected domain as the variable; and determining the third threshold according to the size that has the largest number of character connected domains.
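The statistics-based choice of the third threshold can be sketched with a size histogram. This is a minimal illustration: the source only says the threshold is determined "according to" the most frequent size, so the bin width and the multiplying `factor` below are hypothetical parameters, chosen on the assumption that the most frequent size corresponds to body text.

```python
from collections import Counter

def third_threshold(sizes, bin_width=2, factor=1.5):
    """Derive the third threshold from the most common character size.

    sizes:     one size value per character connected domain.
    bin_width: histogram bin width (assumed).
    factor:    how much larger than body text a heading must be (assumed).
    """
    bins = Counter(int(s // bin_width) for s in sizes)
    mode_bin, _ = bins.most_common(1)[0]
    body_size = (mode_bin + 0.5) * bin_width  # center of the most populated bin
    return factor * body_size

def heading_sizes(sizes, threshold):
    """Keep only the sizes that exceed the third threshold."""
    return [s for s in sizes if s > threshold]

# Usage with hypothetical sizes: mostly body text around 10-11, two large characters.
sizes = [10, 10, 11, 10, 11, 30, 32, 10]
threshold = third_threshold(sizes)
```

In this sample the most populated bin covers sizes 10 to 12, so the estimated body size is 11.0 and the threshold is 16.5, which only the sizes 30 and 32 exceed.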
Preferably, determining one or more title connecting lines according to the multiple heading character connected domains includes: traversing the heading character connected domains that do not belong to any title connecting line already found, taking each as a starting heading character connected domain to find a title connecting line; and determining all title connecting lines found to be the one or more title connecting lines.
Preferably, finding a title connecting line includes: repeating the following operations for the end heading character connected domain of each stable-state connecting line in the stable-state connecting line set of the starting heading character connected domain, until the end heading character connected domain of every stable-state connecting line in the set has no neighbor heading character connected domain: taking the connecting line that connects the end heading character connected domain with a neighbor heading character connected domain of the end heading character connected domain as a transient-state connecting line; when the neighbor heading character connected domain meets a predetermined condition, connecting the transient-state connecting line on which the neighbor heading character connected domain lies to the stable-state connecting line on which the end heading character connected domain lies, so as to update that stable-state connecting line as stored in the set; and when the neighbor heading character connected domain does not meet the predetermined condition, storing the transient-state connecting line on which the neighbor heading character connected domain lies in the set as a new stable-state connecting line; and then determining the stable-state connecting line in the set that contains the most heading character connected domains to be a title connecting line.
Preferably, finding a title connecting line further includes: initializing the stable-state connecting line set of the starting heading character connected domain to include the following stable-state connecting lines: the connecting lines that connect the starting heading character connected domain with its neighbor heading character connected domains.
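The search over stable-state and transient-state connecting lines described above can be sketched as a worklist loop. This is a simplified illustration, not the patented procedure: the `neighbors` and `accepts` callbacks are hypothetical stand-ins for the neighbor relation and the predetermined condition, the `visited` set is an assumption added so that the loop terminates, and when several neighbors of one end qualify, only the first is appended to the current line.

```python
def find_title_line(start, neighbors, accepts):
    """Return the stable-state line with the most domains, grown from `start`.

    neighbors(d)     -> iterable of neighbor domains of d (hypothetical callback).
    accepts(line, n) -> True if neighbor n meets the predetermined condition
                        with respect to the stable-state line `line`.
    """
    visited = {start}
    stable = []
    # Initialization: one stable-state line per neighbor of the starting domain.
    for n in neighbors(start):
        stable.append([start, n])
        visited.add(n)
    pending = list(range(len(stable)))  # lines whose end domain is not yet processed
    while pending:
        i = pending.pop()
        end = stable[i][-1]
        extended = False
        for n in neighbors(end):
            if n in visited:
                continue
            visited.add(n)
            if not extended and accepts(stable[i], n):
                stable[i].append(n)          # transient line joins this stable line
                extended = True
            else:
                stable.append([end, n])      # transient line becomes a new stable line
                pending.append(len(stable) - 1)
        if extended:
            pending.append(i)                # the line has a new end to process
    return max(stable, key=len) if stable else [start]
```

Because every iteration either marks a new domain as visited or retires a line from the worklist, the loop finishes after a bounded number of steps on a finite set of domains.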
Preferably, the predetermined condition includes constraints on the following properties of the neighbor heading character connected domain of the end heading character connected domain: its length, its width, the slope of the transient-state connecting line on which it lies, and its distance to the stable-state connecting line on which the end heading character connected domain lies.
Preferably, the image processing method further includes: when a remaining heading character connected domain meets a predetermined condition, connecting the remaining heading character connected domain that meets the predetermined condition to the head-end heading character connected domain or the end heading character connected domain of a title connecting line, so as to update that title connecting line, the remaining heading character connected domain being a heading character connected domain that does not belong to any of the one or more title connecting lines.
Preferably, the predetermined condition includes constraints on the following properties of the remaining heading character connected domain: its size, its stroke width, the slope of the line connecting it with a heading character connected domain on the title connecting line, its distance to the title connecting line, and the minimum distance between it and the heading character connected domains on the title connecting line.
The image processing method described above can be implemented by the image processing apparatus 200 according to an embodiment of the invention. The various embodiments of the image processing apparatus 200 described above therefore apply here as well and are not repeated.
As can be seen from the above, with the image processing apparatus and image processing method according to the invention, the character connected domains of a newspaper image can be obtained by merging overlapping connected domains and adjacent connected domains, heading character connected domains can be determined from the character connected domains, and one or more title connecting lines can be determined according to the heading character connected domains, so that the heading character connected domains located on the same title connecting line can be combined to obtain the title areas of the newspaper image. In this way, the title areas of a newspaper image can be extracted automatically, quickly and reliably, which saves substantial manpower and material resources and improves the efficiency of labeling digital newspaper titles.
Obviously, each operation of the image processing method according to the invention can be implemented as a computer-executable program stored in various machine-readable storage media.
Moreover, the object of the invention can also be achieved as follows: a storage medium storing the above executable program code is supplied, directly or indirectly, to a system or device, and a computer or central processing unit (CPU) in the system or device reads and executes the program code. In this case, as long as the system or device has the function of executing programs, embodiments of the invention are not limited to a program, and the program may take any form, for example an object program, a program executed by an interpreter, or a script program supplied to an operating system.
These machine-readable storage media include, but are not limited to: various memories and storage units, semiconductor devices, disk units such as optical, magnetic and magneto-optical disks, and other media suitable for storing information.
In addition, the technical solution of the invention can also be realized when a computer connects to a corresponding website on the Internet, downloads and installs the computer program code according to the invention, and then executes the program.
Figure 19 is a block diagram of an exemplary structure of a general-purpose personal computer in which the image processing method according to the invention can be implemented.
As shown in Figure 19, a CPU 1901 performs various processing according to a program stored in a read-only memory (ROM) 1902 or a program loaded from a storage section 1908 into a random access memory (RAM) 1903. The RAM 1903 also stores, as needed, the data required when the CPU 1901 performs the various processing. The CPU 1901, the ROM 1902 and the RAM 1903 are connected to one another via a bus 1904. An input/output interface 1905 is also connected to the bus 1904.
The following components are connected to the input/output interface 1905: an input section 1906 (including a keyboard, a mouse, etc.), an output section 1907 (including a display such as a cathode-ray tube (CRT) or liquid crystal display (LCD), a loudspeaker, etc.), the storage section 1908 (including a hard disk, etc.), and a communication section 1909 (including a network interface card such as a LAN card, a modem, etc.). The communication section 1909 performs communication processing via a network such as the Internet. A drive 1910 can also be connected to the input/output interface 1905 as needed. A removable medium 1911 such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory is mounted on the drive 1910 as needed, so that a computer program read from it is installed into the storage section 1908 as needed.
When the above series of processes is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1911.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1911 shown in Figure 19, which stores the program and is distributed separately from the device to provide the program to the user. Examples of the removable medium 1911 include a magnetic disk (including a floppy disk (registered trademark)), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 1902, a hard disk included in the storage section 1908, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
In the system and method of the invention, each unit or each step can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the invention. Also, the steps of the above series of processes can naturally be performed in chronological order in the order described, but they need not necessarily be performed in chronological order. Some steps can be performed in parallel or independently of one another.
Although embodiments of the invention have been described in detail above with reference to the drawings, it should be understood that the embodiments described above only illustrate the invention and do not limit it. Those skilled in the art can make various changes and modifications to the above embodiments without departing from the spirit and scope of the invention. Therefore, the scope of the invention is limited only by the appended claims and their equivalents.
With regard to the embodiments including the above examples, the following notes are also disclosed:
Note 1. An image processing apparatus, comprising:
a connected domain acquiring unit for obtaining multiple connected domains of a newspaper image;
a character connected domain determination unit for merging the overlapping connected domains and adjacent connected domains among the multiple connected domains to obtain multiple character connected domains;
a heading character determination unit for determining multiple heading character connected domains from the multiple character connected domains;
a connecting line determination unit for determining one or more title connecting lines according to the multiple heading character connected domains; and
a title area acquiring unit for combining the heading character connected domains located on the same title connecting line to obtain one or more title areas of the newspaper image.
Note 2. The image processing apparatus according to Note 1, wherein the character connected domain determination unit comprises:
an overlapping connected domain determination unit for determining that two connected domains are overlapping connected domains when the bounding rectangles of the two connected domains have an overlapping region;
an adjacent connected domain determination unit for determining that two connected domains are adjacent connected domains when the distance between the two closest sides of the bounding rectangles of the two connected domains is less than a first threshold, and the difference between 1 and the aspect ratio of the bounding rectangle of the connected domain obtained by merging the two connected domains is less than a second threshold; and
a merging unit for merging the overlapping connected domains, merging the adjacent connected domains, and taking the multiple connected domains obtained after merging as the character connected domains.
Note 3. The image processing apparatus according to Note 1, wherein the heading character determination unit comprises:
a comparing unit for determining the character connected domains whose size is greater than a third threshold, among the multiple character connected domains, to be heading character connected domains.
Note 4. The image processing apparatus according to Note 3, wherein the heading character determination unit further comprises:
a size determination unit for determining the size of every character connected domain among the multiple character connected domains;
a statistics unit for counting the number of character connected domains with the size of the character connected domain as the variable; and
a threshold determination unit for determining the third threshold according to the size that has the largest number of character connected domains.
Note 5. The image processing apparatus according to Note 1, wherein the connecting line determination unit comprises:
a finding unit for traversing the heading character connected domains that do not belong to any title connecting line already found, taking each as a starting heading character connected domain to find a title connecting line; and
a determination unit for determining all title connecting lines found to be the one or more title connecting lines.
Note 6. The image processing apparatus according to Note 5, wherein the finding unit comprises:
a stable-state connecting line set determination unit for repeating the following operations for the end heading character connected domain of each stable-state connecting line in the stable-state connecting line set of the starting heading character connected domain, until the end heading character connected domain of every stable-state connecting line in the set has no neighbor heading character connected domain:
taking the connecting line that connects the end heading character connected domain with a neighbor heading character connected domain of the end heading character connected domain as a transient-state connecting line;
when the neighbor heading character connected domain of the end heading character connected domain meets a predetermined condition, connecting the transient-state connecting line on which the neighbor heading character connected domain lies to the stable-state connecting line on which the end heading character connected domain lies, so as to update that stable-state connecting line as stored in the stable-state connecting line set; and
when the neighbor heading character connected domain of the end heading character connected domain does not meet the predetermined condition, storing the transient-state connecting line on which the neighbor heading character connected domain lies in the stable-state connecting line set as a new stable-state connecting line; and
an output unit for determining the stable-state connecting line in the stable-state connecting line set that contains the most heading character connected domains to be a title connecting line.
Note 7. The image processing apparatus according to Note 6, wherein the stable-state connecting line set determination unit further comprises:
an initialization unit for initializing the stable-state connecting line set of the starting heading character connected domain to include the following stable-state connecting lines: the connecting lines that connect the starting heading character connected domain with its neighbor heading character connected domains.
Note 8. The image processing apparatus according to Note 6, wherein the predetermined condition includes constraints on the following properties of the neighbor heading character connected domain of the end heading character connected domain: its length, its width, the slope of the transient-state connecting line on which it lies, and its distance to the stable-state connecting line on which the end heading character connected domain lies.
Note 9. The image processing apparatus according to Note 1, wherein the image processing apparatus further comprises:
a connecting line updating unit for, when a remaining heading character connected domain meets a predetermined condition, connecting the remaining heading character connected domain that meets the predetermined condition to the head-end heading character connected domain or the end heading character connected domain of a title connecting line, so as to update that title connecting line, the remaining heading character connected domain being a heading character connected domain that does not belong to any of the one or more title connecting lines.
Note 10. The image processing apparatus according to Note 9, wherein the predetermined condition includes constraints on the following properties of the remaining heading character connected domain: its size, its stroke width, the slope of the line connecting it with a heading character connected domain on the title connecting line, its distance to the title connecting line, and the minimum distance between it and the heading character connected domains on the title connecting line.
Note 11. An image processing method, comprising:
obtaining multiple connected domains of a newspaper image;
merging the overlapping connected domains and adjacent connected domains among the multiple connected domains to obtain multiple character connected domains;
determining multiple heading character connected domains from the multiple character connected domains;
determining one or more title connecting lines according to the multiple heading character connected domains; and
combining the heading character connected domains located on the same title connecting line to obtain one or more title areas of the newspaper image.
Note 12. The image processing method according to Note 11, wherein merging the overlapping connected domains and adjacent connected domains among the multiple connected domains to obtain multiple character connected domains comprises:
when the bounding rectangles of two connected domains have an overlapping region, determining that the two connected domains are overlapping connected domains;
when the distance between the two closest sides of the bounding rectangles of two connected domains is less than a first threshold, and the difference between 1 and the aspect ratio of the bounding rectangle of the connected domain obtained by merging the two connected domains is less than a second threshold, determining that the two connected domains are adjacent connected domains; and
merging the overlapping connected domains, merging the adjacent connected domains, and taking the multiple connected domains obtained after merging as the character connected domains.
Note 13. The image processing method according to Note 11, wherein determining multiple heading character connected domains from the multiple character connected domains comprises:
determining the character connected domains whose size is greater than a third threshold, among the multiple character connected domains, to be heading character connected domains.
Note 14. The image processing method according to Note 13, wherein determining multiple heading character connected domains from the multiple character connected domains comprises:
determining the size of every character connected domain among the multiple character connected domains;
counting the number of character connected domains with the size of the character connected domain as the variable; and
determining the third threshold according to the size that has the largest number of character connected domains.
Note 15. The image processing method according to Note 11, wherein determining one or more title connecting lines according to the multiple heading character connected domains comprises:
traversing the heading character connected domains that do not belong to any title connecting line already found, taking each as a starting heading character connected domain to find a title connecting line; and
determining all title connecting lines found to be the one or more title connecting lines.
Note 16. The image processing method according to Note 15, wherein finding the title connecting line comprises:
repeating the following operations for the end heading character connected domain of each stable-state connecting line in the stable-state connecting line set of the starting heading character connected domain, until the end heading character connected domain of every stable-state connecting line in the set has no neighbor heading character connected domain:
taking the connecting line that connects the end heading character connected domain with a neighbor heading character connected domain of the end heading character connected domain as a transient-state connecting line;
when the neighbor heading character connected domain of the end heading character connected domain meets a predetermined condition, connecting the transient-state connecting line on which the neighbor heading character connected domain lies to the stable-state connecting line on which the end heading character connected domain lies, so as to update that stable-state connecting line as stored in the stable-state connecting line set; and
when the neighbor heading character connected domain of the end heading character connected domain does not meet the predetermined condition, storing the transient-state connecting line on which the neighbor heading character connected domain lies in the stable-state connecting line set as a new stable-state connecting line; and
determining the stable-state connecting line in the stable-state connecting line set that contains the most heading character connected domains to be a title connecting line.
Note 17. The image processing method according to Note 16, wherein finding the title connecting line further comprises:
initializing the stable-state connecting line set of the starting heading character connected domain to include the following stable-state connecting lines: the connecting lines that connect the starting heading character connected domain with its neighbor heading character connected domains.
Note 18. The image processing method according to Note 16, wherein the predetermined condition includes constraints on the following properties of the neighbor heading character connected domain of the end heading character connected domain: its length, its width, the slope of the transient-state connecting line on which it lies, and its distance to the stable-state connecting line on which the end heading character connected domain lies.
Note 19. The image processing method according to Note 11, further comprising:
when a remaining heading character connected domain meets a predetermined condition, connecting the remaining heading character connected domain that meets the predetermined condition to the head-end heading character connected domain or the end heading character connected domain of a title connecting line, so as to update that title connecting line, the remaining heading character connected domain being a heading character connected domain that does not belong to any of the one or more title connecting lines.
Note 20. A machine-readable storage medium carrying a program product that includes machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to perform the image processing method according to any one of Notes 11 to 19.

Claims (10)

1. An image processing apparatus, comprising:
a connected domain acquiring unit for obtaining multiple connected domains of a newspaper image;
a character connected domain determination unit for merging the overlapping connected domains and adjacent connected domains among the multiple connected domains to obtain multiple character connected domains;
a heading character determination unit for determining multiple heading character connected domains from the multiple character connected domains;
a connecting line determination unit for determining one or more title connecting lines according to the multiple heading character connected domains; and
a title area acquiring unit for combining the heading character connected domains located on the same title connecting line to obtain one or more title areas of the newspaper image.
2. The image processing apparatus according to claim 1, wherein the character connected domain determination unit comprises:
an overlapping connected domain determination unit for determining that two connected domains are overlapping connected domains when the bounding rectangles of the two connected domains have an overlapping region;
an adjacent connected domain determination unit for determining that two connected domains are adjacent connected domains when the distance between the two closest sides of their bounding rectangles is less than a first threshold, and the difference between 1 and the aspect ratio of the bounding rectangle of the connected domain obtained by merging the two connected domains is less than a second threshold; and
a merging unit for merging the overlapping connected domains, merging the adjacent connected domains, and taking the plurality of connected domains obtained after merging as the character connected domains.
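The two tests in claim 2 can be illustrated with a short sketch. It assumes boxes are `(x0, y0, x1, y1)` tuples with exclusive far edges, that the "distance between the two closest sides" is the larger of the horizontal and vertical gaps, and that the aspect ratio is width divided by height. These readings, the function names, and the threshold values in the test are assumptions rather than details given in the claims.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def overlaps(a: Box, b: Box) -> bool:
    """True when the two bounding rectangles share an overlapping region."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Bounding rectangle of the connected domain obtained by merging a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def side_gap(a: Box, b: Box) -> int:
    """Assumed distance between the closest sides: the larger axis-aligned gap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return max(dx, dy)


def is_adjacent(a: Box, b: Box, first_threshold: float, second_threshold: float) -> bool:
    """Close together, and the merged box is roughly square (aspect ratio near 1)."""
    x0, y0, x1, y1 = union(a, b)
    aspect = (x1 - x0) / (y1 - y0)
    return side_gap(a, b) < first_threshold and abs(aspect - 1) < second_threshold


def merge_components(boxes: List[Box], first_threshold: float,
                     second_threshold: float) -> List[Box]:
    """Merge overlapping or adjacent boxes repeatedly until no pair qualifies."""
    result = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(result)):
            for j in range(i + 1, len(result)):
                if overlaps(result[i], result[j]) or is_adjacent(
                        result[i], result[j], first_threshold, second_threshold):
                    result[i] = union(result[i], result[j])
                    del result[j]
                    merged = True
                    break
            if merged:
                break
    return result
```

The near-square test reflects that a single Chinese character usually fits a roughly square box, so two strokes are joined only when the result still looks like one character.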
3. The image processing apparatus according to claim 1, wherein the heading character determination unit comprises:
a comparing unit for determining, as heading character connected domains, the character connected domains among the plurality of character connected domains whose size is greater than a third threshold.
4. The image processing apparatus according to claim 3, wherein the heading character determination unit further comprises:
a size determination unit for determining the sizes of all character connected domains among the plurality of character connected domains;
a statistics unit for counting the number of character connected domains with the size of the character connected domain as the variable; and
a threshold determination unit for determining the third threshold according to the size that has the largest number of character connected domains.
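Claims 3 and 4 together amount to taking the most frequent character size (presumably the body-text size) and treating noticeably larger characters as heading characters. The sketch below assumes that "size" is the longer side of the bounding box and that the third threshold is the modal size multiplied by a hypothetical factor `scale`; the claims only state that the threshold is determined "according to" the modal size.

```python
from collections import Counter
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def box_size(box: Box) -> int:
    """Assumed size measure: the longer side of the bounding rectangle."""
    x0, y0, x1, y1 = box
    return max(x1 - x0, y1 - y0)


def third_threshold(sizes: List[int], scale: float = 1.5) -> float:
    """Count boxes per size and scale the most frequent size (scale is hypothetical)."""
    modal_size, _count = Counter(sizes).most_common(1)[0]
    return scale * modal_size


def heading_characters(boxes: List[Box], scale: float = 1.5) -> List[Box]:
    """Keep the character boxes whose size exceeds the third threshold."""
    sizes = [box_size(b) for b in boxes]
    threshold = third_threshold(sizes, scale)
    return [b for b, s in zip(boxes, sizes) if s > threshold]
```

On real scans the sizes would probably be binned before counting, since body characters rarely share exactly the same pixel size.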
5. The image processing apparatus according to claim 1, wherein the connecting line determination unit comprises:
a searching unit for traversing the heading character connected domains that do not belong to any title connecting line already found, and searching for a title connecting line using each such heading character connected domain as a starting heading character connected domain; and
a determination unit for determining all of the title connecting lines found as the one or more title connecting lines.
6. The image processing apparatus according to claim 5, wherein the searching unit comprises:
a steady-state connecting line set determination unit for repeatedly performing the following operations on the end heading character connected domain of each steady-state connecting line in the steady-state connecting line set of the starting heading character connected domain, until the end heading character connected domain of every steady-state connecting line in the steady-state connecting line set has no neighboring heading character connected domain:
taking the connecting line that connects the end heading character connected domain and a neighboring heading character connected domain of the end heading character connected domain as a transient connecting line;
when the neighboring heading character connected domain of the end heading character connected domain satisfies a predetermined condition, connecting the transient connecting line on which the neighboring heading character connected domain lies to the steady-state connecting line on which the end heading character connected domain lies, so as to update that steady-state connecting line as stored in the steady-state connecting line set; and
when the neighboring heading character connected domain of the end heading character connected domain does not satisfy the predetermined condition, storing the transient connecting line on which the neighboring heading character connected domain lies in the steady-state connecting line set as a new steady-state connecting line; and
an output unit for determining the steady-state connecting line in the steady-state connecting line set that contains the most heading character connected domains as one title connecting line.
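The search in claim 6 can be read as growing a set of candidate lines from one starting character. The following is a simplified sketch under several assumptions not stated in the claims: each heading character is reduced to its center point, a "neighbor" is any not-yet-used character within `max_gap` of the current end, and the predetermined condition is replaced by a single direction test (`max_angle_deg`). All names and parameters are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def find_title_line(centers: List[Point], start: int, max_gap: float,
                    max_angle_deg: float = 10.0) -> List[int]:
    """Return the indices of the longest steady-state line grown from `start`."""
    used = {start}

    def neighbours(i: int) -> List[int]:
        # Assumed neighbor rule: unused characters within max_gap of character i.
        return [j for j in range(len(centers))
                if j not in used and math.dist(centers[i], centers[j]) <= max_gap]

    def direction(i: int, j: int) -> float:
        return math.atan2(centers[j][1] - centers[i][1], centers[j][0] - centers[i][0])

    def satisfies(line: List[int], j: int) -> bool:
        # Stand-in condition: the transient line keeps the steady line's direction.
        diff = direction(line[0], line[-1]) - direction(line[-1], j)
        diff = abs((diff + math.pi) % (2 * math.pi) - math.pi)
        return math.degrees(diff) <= max_angle_deg

    # Initial steady-state set: one line from the start to each of its neighbors.
    steady: List[List[int]] = []
    for j in neighbours(start):
        used.add(j)
        steady.append([start, j])

    changed = True
    while changed:
        changed = False
        for line in list(steady):
            end = line[-1]
            for j in neighbours(end):
                used.add(j)
                changed = True
                if satisfies(line, j):
                    line.append(j)          # extend (update) the steady-state line
                    break                   # its end changed; revisit on the next pass
                steady.append([end, j])     # keep the transient line as a new steady line
    return max(steady, key=len) if steady else [start]
```

Marking each character as used once it joins any line is what guarantees termination here; the claims do not say how the original method avoids revisiting characters.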
7. The image processing apparatus according to claim 6, wherein the steady-state connecting line set determination unit further comprises:
an initialization unit for initializing the steady-state connecting line set of the starting heading character connected domain to include the following steady-state connecting line: the connecting line that connects the starting heading character connected domain and a neighboring heading character connected domain of the starting heading character connected domain.
8. The image processing apparatus according to claim 6, wherein the predetermined condition includes constraints on the length and the width of the neighboring heading character connected domain of the end heading character connected domain, on the slope of the transient connecting line on which it lies, and on its distance to the steady-state connecting line on which the end heading character connected domain lies.
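Claim 8 names four quantities that the predetermined condition constrains but gives no formulas. One possible reading, offered purely as an illustration for near-horizontal titles, compares the neighbor's width and height with those of the end character, bounds the slope of the transient line, and bounds the perpendicular distance from the neighbor to the steady-state line. The `Component` type and every tolerance below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    cx: float      # center x
    cy: float      # center y
    width: float
    height: float


def point_line_distance(px: float, py: float, ax: float, ay: float,
                        bx: float, by: float) -> float:
    """Perpendicular distance from P to the infinite line through A and B."""
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def satisfies_condition(line_start: Component, end: Component, neighbour: Component,
                        size_tol: float = 0.3, max_slope: float = 0.2,
                        max_offset_ratio: float = 0.5) -> bool:
    # Width and height must be close to those of the end character.
    if abs(neighbour.width - end.width) > size_tol * end.width:
        return False
    if abs(neighbour.height - end.height) > size_tol * end.height:
        return False
    # The transient line (end -> neighbour) must be close to horizontal.
    dx, dy = neighbour.cx - end.cx, neighbour.cy - end.cy
    if dx == 0 or abs(dy / dx) > max_slope:
        return False
    # The neighbour must lie near the steady-state line (line_start -> end).
    offset = point_line_distance(neighbour.cx, neighbour.cy,
                                 line_start.cx, line_start.cy, end.cx, end.cy)
    return offset <= max_offset_ratio * end.height
```

Vertical titles, which are common in Chinese newspapers, would need the slope test expressed relative to the steady-state line's own direction instead of the horizontal axis.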
9. The image processing apparatus according to claim 1, wherein the image processing apparatus further comprises:
a connecting line updating unit for, when a remaining heading character connected domain satisfies a predetermined condition, connecting the remaining heading character connected domain satisfying the predetermined condition to the head-end heading character connected domain or the end heading character connected domain of one title connecting line so as to update the one title connecting line, the remaining heading character connected domain being a heading character connected domain that does not belong to the one or more title connecting lines.
10. An image processing method, comprising:
obtaining a plurality of connected domains of a newspaper image;
merging overlapping connected domains and adjacent connected domains among the plurality of connected domains to obtain a plurality of character connected domains;
determining a plurality of heading character connected domains from the plurality of character connected domains;
determining one or more title connecting lines according to the plurality of heading character connected domains; and
combining the heading character connected domains located on the same title connecting line to obtain one or more title areas of the newspaper image.
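The final step of claim 10 combines the heading characters on each title connecting line into a title area. A minimal sketch, assuming each title connecting line is given as a list of indices into the heading character boxes and that "combining" means taking their common bounding rectangle (the claims do not define the shape of a title area):

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def title_areas(char_boxes: List[Box], title_lines: List[List[int]]) -> List[Box]:
    """Return one bounding rectangle per title connecting line."""
    areas: List[Box] = []
    for line in title_lines:
        if not line:
            continue  # an empty line contributes no title area
        x0s, y0s, x1s, y1s = zip(*(char_boxes[i] for i in line))
        areas.append((min(x0s), min(y0s), max(x1s), max(y1s)))
    return areas
```

For example, two heading lines of two characters each yield two title areas, one per line.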
CN201610921297.4A 2016-10-21 2016-10-21 Image processing apparatus and image processing method Pending CN107977593A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610921297.4A CN107977593A (en) 2016-10-21 2016-10-21 Image processing apparatus and image processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610921297.4A CN107977593A (en) 2016-10-21 2016-10-21 Image processing apparatus and image processing method

Publications (1)

Publication Number Publication Date
CN107977593A true CN107977593A (en) 2018-05-01

Family

ID=62003866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610921297.4A Pending CN107977593A (en) 2016-10-21 2016-10-21 Image processing apparatus and image processing method

Country Status (1)

Country Link
CN (1) CN107977593A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090316219A1 (en) * 2008-06-18 2009-12-24 Canon Kabushiki Kaisha Image processing apparatus, image processing method and computer-readable storage medium
CN102855264A (en) * 2011-07-01 2013-01-02 富士通株式会社 Method and device for document processing
CN103034842A (en) * 2012-12-05 2013-04-10 上海合合信息科技发展有限公司 Professional notebook computer facilitating electronization and electronic thumbnail photo display method thereof
CN103093228A (en) * 2013-01-17 2013-05-08 上海交通大学 Chinese detection method in natural scene image based on connected domain
CN103839060A (en) * 2012-11-26 2014-06-04 阿里巴巴集团控股有限公司 Single-word region combination method and device
CN104573685A (en) * 2015-01-29 2015-04-29 中南大学 Natural scene text detecting method based on extraction of linear structures
US20160086026A1 (en) * 2014-09-23 2016-03-24 Konica Minolta Laboratory U.S.A., Inc. Removal of graphics from document images using heuristic text analysis and text recovery
WO2016069005A1 (en) * 2014-10-31 2016-05-06 Hewlett-Packard Development Company, L.P. Text line detection
CN105844275A (en) * 2016-03-25 2016-08-10 北京云江科技有限公司 Method for positioning text lines in text image
CN105844207A (en) * 2015-01-15 2016-08-10 富士通株式会社 Text line extraction method and text line extraction equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
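WEIJUAN WEN et al.: "An Efficient Method for Text Location and Segmentation", 《2009 WRI WORLD CONGRESS ON SOFTWARE ENGINEERING》 *
Zhang Wenjie (张文杰): "Newspaper Layout Analysis and Recognition Based on Mobile Terminals", 《China Master's Theses Full-text Database, Information Science and Technology》 *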

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558876A (en) * 2018-11-20 2019-04-02 浙江口碑网络技术有限公司 Character recognition processing method and device
CN109948413A (en) * 2018-12-29 2019-06-28 禾多科技(北京)有限公司 Method for detecting lane lines based on the fusion of high-precision map
CN109948413B (en) * 2018-12-29 2021-06-04 禾多科技(北京)有限公司 Lane line detection method based on high-precision map fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20180501)