CN1004386B - Image understanding system - Google Patents

Publication number
CN1004386B
Authority
CN
China
Prior art keywords
file
image
text
statement
grammar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CN85106850.2A
Other languages
Chinese (zh)
Other versions
CN85106850A (en
Inventor
中野康明
藤泽浩道
东野纯一
江尼正员
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd filed Critical Hitachi Ltd
Priority to CN85106850.2A priority Critical patent/CN1004386B/en
Publication of CN85106850A publication Critical patent/CN85106850A/en
Publication of CN1004386B publication Critical patent/CN1004386B/en
Expired legal-status Critical Current

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The present invention relates to an image understanding system which uses syntax for describing an image document, and represents the structure of an unknown input image by analyzing a sentence (namely the structure of the syntax) written according to the syntax. In other words, the image is described by the syntax as substructures and the relative relations among the substructures. In syntactic analysis, after the substructures and the relative relations among the substructures are identified, search is carried out in the unknown input image for the existence of the substructures and the relative relations among the substructures, and the structure of the unknown input image is represented according to the result of the search.

Description

Image understanding system
The present invention relates generally to a document image processing system, and more particularly to a document image processing system suitable for use as the input unit of an electronic document image store.
Typical electronic document stores simply keep each page of a document as an image, and the auxiliary information needed for retrieval must be entered separately from outside with a code input device such as a keyboard. To automate document input, however, it is preferable to generate this auxiliary information automatically by reading the title, the authors' names and so on from the document itself. To improve retrieval further, chart captions and chapter titles should be entered automatically, or keywords should be extracted automatically by recognizing the body text. This in turn requires dividing the image of the target document into parts such as the title, authors, abstract, body text, figures and illustrations, which also reduces storage space and increases the flexibility of retrieval.
To address this problem, systems have been studied that understand the content of a document and process the document according to the result. One example is disclosed in "Basic Research on a Newspaper and Periodical Article Clipping System" by Yoji Noguchi and Junichi Toyoter (Proceedings of the 23rd National Convention of the Information Processing Society of Japan, 6C-1). However, because that document understanding system was developed for clipping newspaper articles, it is unclear whether it can be applied to documents of arbitrary format. Moreover, it only segments the character portions and does not disclose a method that combines segmentation with recognition.
The object of the present invention is to provide an image understanding system that can handle general document images, segment them according to their structure, and recognize the character portions wherever necessary.
To achieve this object, the invention uses a grammar that describes the structure of document images, and analyzes statements written in this grammar (the structure of the document) in order to recognize the structure of an unknown input image. The grammar describes an image as a set of substructures and the relative relations among them. During analysis, once the substructures and their relative relations have been identified from the grammar, the unknown input image is searched to see whether these substructures and relations are present. If they are, the interior of each substructure is decomposed further and analyzed in turn; if they are not, other possibilities are searched. The structure of the unknown input image is understood from the result of this search.
Brief description of the drawings:
Fig. 1 shows an example of a document;
Fig. 2 shows one embodiment of the invention;
Figs. 3, 4, 5 and 6 are flowcharts explaining the processing of the control unit shown in Fig. 2;
Fig. 7 is a reference diagram showing an example of a document;
Figs. 8, 9, 10, 11, 12 and 13 are explanatory diagrams used to explain the operating principle of the fourth embodiment of the invention;
Figs. 14 and 15 are flowcharts explaining the processing performed by the control unit 102 in the fourth embodiment of the invention;
Figs. 16 and 17 are explanatory diagrams used to explain the processing shown in Fig. 15.
Before describing the embodiments, we first explain the syntactic analysis method they use. Although the description below takes a technical paper as its example, document formats differ to some degree, so the invention can be applied to other documents by changing parts of the grammar. The invention is therefore not limited to technical papers.
Fig. 1 shows one page of a technical paper with a predetermined format. An example of a grammar expressing the structure of the document (hereinafter called the "document grammar") is given below.
(original text is capable)
1. <file> ∷= | <technical papers> | <paperback edition novel> | ~ | <patent> |
2. <technical papers> ∷= <exercise question page or leaf>
3. <technical papers> ∷= <technical papers> (+ <continue page or leaf> :)
4. <exercise question page or leaf> ∷= | <UDC> η <item content> η <author's summary> η <text> η <exercise question page or leaf separator>
5. <continue page or leaf> ∷= <title> η <text> η <page or leaf separator>
6. <UDC> ∷= "UDC" [Figure 85106850_IMG2] <cycle numeral> ([Figure 85106850_IMG3] "CL" <cycle numeral>)
7. <title> ∷= <Japanese exercise question> [Figure 85106850_IMG4] <volume number> <numeral>
8. <volume number> ∷= "VOL" [Figure 85106850_IMG6] <numeral> "NO" [Figure 85106850_IMG8] <numeral>
9. <item content> ∷= <Japanese exercise question> η <english title>
10. <Japanese exercise question> ∷= <Japanese line of text district>
11. <english title> ∷= <English line of text district>
12. <author's summary> ∷= <summary> [Figure 85106850_IMG9] <group of authors>
13. <summary> ∷= <English line of text district>
14. <group of authors> ∷= <author>
15. <group of authors> ∷= <group of authors> (η <author>)
16. <author> ∷= <Japanese line of text> [Figure 85106850_IMG10] <English line of text>
17. <number of pages> ∷= <numeral>
18. <text> ∷= <hurdle> [Figure 85106850_IMG11] <hurdle>
19. <hurdle> ∷= <joint> (η <hurdle>)
20. <joint> ∷= <chapter title> η <section header> η <joint text>
21. <joint> ∷= <section header> η <joint text>
22. <joint> ∷= <joint text>
23. <joint> ∷= "list of references" η <list of references table>
24. <chapter title> ∷= "numeral" <Japanese line of text>
25. <section header> ∷= <cycle numeral> <Japanese line of text>
26. <joint text> ∷= <section> (η <joint text>)
27. <section> ∷= <Japanese line of text district>
28. <section> ∷= <chart>
29. <Japanese line of text district> ∷= <Japanese line of text> η (<Japanese line of text district>)
30. <Japanese line of text> ∷= "Japanese character" (<Japanese line of text>)
31. <Japanese line of text> ∷= "Japanese character" α (<Japanese line of text>)
32. <Japanese line of text> ∷= "Japanese character" β (<Japanese line of text>)
33. <English line of text district> ∷= <English line of text> η (<English line of text district>)
34. <English line of text> ∷= <word> [Figure 85106850_IMG15] "DLM" [Figure 85106850_IMG16] (<English line of text district>)
35. <word> ∷= "letter" ([Figure 85106850_IMG17] <word>)
36. <word> ∷= "letter" (α <word>)
37. <word> ∷= "letter" (β <word>)
38. <word> ∷= | {name of Britain} | {Britain mechanism name} | {Britain ground name} | {general English words} |
39. <numeral> ∷= "numeral" ([Figure 85106850_IMG18] <numeral>)
40. <cycle numeral> ∷= <numeral>
41. <cycle numeral> ∷= <cycle numeral> <numeral>
42. <cycle numeral> ∷= <cycle numeral> "PR"
43. "numeral" ∷= |0|1|~|9|
44. "letter" ∷= |a|b|c|d|~|A|B|~|0|1|~|
45. "Japanese character" ∷= |あ|レ|~| [Figure 85106850_IMG21] |イ|~|day|upright|~|a|b|~|A|B|~|0|1|~|
46. "DLM" ∷= | |,|·|~|
47. "CL" ∷= |∶|
48. "PR" ∷= |.|,|〕|
49. <chart> ∷= |-figure-| η <Japanese explanation>
50. <chart> ∷= <Japanese explanation> η <the English explanation> η <table>
51. <chart> ∷= <box>
52. <box> ∷= |-field-| [Figure 85106850_IMG22] | <joint>
53. <Japanese explanation> ∷= "word-figure" [Figure 85106850_IMG23] <numeral> [Figure 85106850_IMG24] <Japanese line of text>
54. <Japanese explanation> ∷= "word-Biao" [Figure 85106850_IMG25] <numeral> <Japanese line of text>
55. <Japanese explanation> ∷= <Japanese explanation> η <Japanese line of text district>
56. <the English explanation> ∷= "FIG" [Figure 85106850_IMG27] <numeral> [Figure 85106850_IMG28] <English line of text>
57. <the English explanation> ∷= "TAB" [Figure 85106850_IMG29] <numeral> [Figure 85106850_IMG30] <English line of text>
58. <the English explanation> ∷= <the English explanation> η <English line of text district>
59. "FIG" ∷= |Fig.|
60. "TAB" ∷= |TABle|
61. "word-Tu" ∷= |illustration|
62. "word-Biao" ∷= |table|
63. "VOL" ∷= |VOL|
64. "NO" ∷= |NO.|
65. "UDC" ∷= |U.D.C.|
66. <table> ∷= <box> γ <table> (δ <table>)
67. <table> ∷= <box> δ <table> (γ <table>)
68. <table> ∷= <box>
69. "list of references" ∷= |list of references|
70. <list of references table> ∷= <Japanese list of references> (η <list of references table>)
71. <list of references table> ∷= <English list of references> (η <list of references table>)
72. <Japanese list of references> ∷= <numeral> "PR" <Japanese line of text>
73. <Japanese list of references> ∷= <Japanese list of references> (η <Japanese line of text group>)
74. <English list of references> ∷= <numeral> [Figure 85106850_IMG33] "PR" <English line of text>
75. <English list of references> ∷= <English list of references> (η <English line of text group>)
The document grammar above expresses the structure of documents in general, but only the parts relevant to technical papers have been excerpted here. The grammar is now explained using Fig. 1 as an example. The symbols used are explained first.
< >   nonterminal symbol (abstract concept)
" "   terminal symbol (character string)
{ }   terminal symbol (character string in a dictionary)
|-|   terminal symbol (substructure in the image)
∷=    rewriting rule
|     or
( )   may be omitted
+, , η, α, β, [Figure 85106850_IMG36], γ and δ are operators between substructures.
The first rule states that the grammar applies to various kinds of documents, of which the technical paper is one. The second rule states that a technical paper may consist of a title page alone (<exercise question page or leaf> in the grammar; Fig. 1, 1). The third rule states that any number of continuation pages (including zero) may be appended to the end of such a paper. The fourth rule states that, on the title page, the title block (<item content>; Fig. 1, 3) lies below the UDC code (UDC is the Universal Decimal Classification; Fig. 1, 2), the author and abstract block (<author's summary>; Fig. 1, 4) lies below the title block, the body text (Fig. 1, 7) follows, and the page number (Fig. 1, 9) comes last. As rule 12 shows, in the author and abstract block the group of authors (Fig. 1, 6) lies to the right of the abstract (Fig. 1, 5), and the abstract is an English text-line area, as rule 13 shows. A group of authors may consist of a single author, as in rule 14, or of several authors, with any number of further authors added below the group, as in rule 15. An author consists of a Japanese text line (the name) combined horizontally with an English text line (the name), as in rule 16. Because the body text (Fig. 1, 7) in this example divides the page vertically into two halves, the notion of a column (<hurdle> in the grammar; Fig. 1, 8) is introduced, so that the body text is a horizontal combination of columns, as in rule 18. Each column consists of consecutive sections (<joint>), as in rule 19. The text of a section (<joint text>) consists of paragraphs (<section>), as in rule 26. A paragraph may be a Japanese text-line area or a chart, as in rules 27 and 28. A Japanese text line consists of Japanese characters joined horizontally by [Figure 85106850_IMG37], α and β, as in rules 30 to 32. Here [Figure 85106850_IMG38] denotes simple horizontal adjacency, α denotes a horizontal stroke, and β denotes a horizontal stroke that rises; all of these cases can occur. Japanese characters here include hiragana, katakana, kanji, letters, numerals and so on, as in rule 45.
To understand a document, the system first assumes that the input document is a technical paper and then checks, in order, whether the structures described by the document grammar are present. A different image processing routine is used for each operator. For example, because operator η states that substructures are vertically adjacent, the routine that detects vertical adjacency of substructures corresponds to η; one such routine detects a continuous run of white pixels between the substructures. Likewise, the routine that detects a vertical run of white pixels and separates characters belongs to [Figure 85106850_IMG39], and the routine that detects an oblique run of white pixels and separates characters belongs to β.
As the description above shows, the document grammar proposed in the invention describes the structure of a complex document hierarchically and recursively. Objects that have usually been hard to describe can therefore be described in this grammar, such as documents whose number of text lines is not fixed and documents whose substructure layout is not known in advance. By describing the actual relations between substructures with operators, and then checking the relations the operators express by image processing, a wide variety of documents can be understood.
Preferred embodiments of the invention are now described in detail with reference to the accompanying drawings.
Fig. 2 is a block diagram showing the structure of an apparatus that uses a document processing system according to one embodiment of the invention. The components of the apparatus are connected by a bus 101, and the overall operation is controlled by a control unit 102. The information on a document 103 (the document image) is scanned and digitized by a photoelectric conversion device 104 and then stored in a memory 1051 via the bus 101. The memory 1051, together with memories 1052, 1053 and 1054 mentioned later, forms part of a memory 105. When the document information is digitized, known high-efficiency coding can be applied, which saves the memory capacity needed to store the document image.
In the following description, digitization uses one bit per pixel, but a pixel may also be represented with multiple levels, and photoelectric conversion with a color scanner can further provide pixels with color information. The control unit 102 applies known position correction and rotation correction to the document image to obtain a normalized image, which is stored in the memory 1052. Under program control of the control unit 102, this normalized image is understood as a document in the manner described below, and the result is sent to a filing device 106.
Fig. 3 is a flowchart showing the flow of document understanding in PAD (Problem Analysis Diagram) form. Step 302 initializes the whole system. Step 303 is a loop that repeats the processing below until the end of the document. Step 304 inputs one page image. Step 305 is a control loop that interprets this page according to the document grammar. In step 306 a statement is extracted, syntactic analysis and the like are carried out from step 307, and it is decided whether the statement is accepted or rejected. Step 307 initializes the stack used by the later parsing operations; the stack resides in the memory 1054. Step 308 controls the flow from step 309 to step 313. Step 309 detects the presence of an operator and is a group of branches corresponding to the operator routines 3091 to 3093, where 3091, 3092 and 3093 are the image processing routines for the operators [Figure 85106850_IMG40], η and α respectively. These routines are described in detail elsewhere. Step 310 tests whether the operator is present; if it is not, processing leaves the loop of step 308 at step 313 and moves on to the next line of the grammar (307). If the operator is present, the stack is pushed down in step 311 so that the operator is at the top, and step 312 detects the presence of the substructures. Substructure detection consists of part 3121, which identifies terminal symbols, and part 3122, which identifies nonterminal symbols.
Processing 3122 is carried out by recursively executing the statement processing from step 307 onward. Identifying a terminal symbol means, for example in the case of a numeral, recognizing the character and judging whether the recognition result is a numeral value.
When all substructures and operators have been interpreted in this way, understanding of that page of the document is complete. The result of document understanding consists of the substructures held in the stack (memory 1054), their contents (character strings), and the operators between the substructures. After conversion to a prescribed code at step 314, the result is output to the filing device 106. If the page cannot be interpreted by any statement of the grammar, the document cannot be understood. This is the case in which the program has left the loop at step 313 for every line of the grammar, and the condition is determined at step 315. If the document cannot be understood, a rejection procedure is executed at step 317. For example, the final result of document understanding is shown on a display 107 and corrected interactively by the operator using a keyboard 108.
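The control flow of Fig. 3 (try a statement, check its operator, then check each substructure, and fall back to another statement on failure) can be illustrated with a toy top-down matcher. This is a sketch under heavy assumptions, not the patented procedure: a region is modelled as a list of text rows rather than pixels, only an η-like operator is supported (a blank row separates substructures), and recursion replaces the explicit stack in memory 1054. All names (`split_eta`, `parse`, `match_sequence`, `GRAMMAR`, `TERMINALS`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Region = List[str]  # toy stand-in for an image region: one string per row

# Hypothetical grammar: nonterminal -> alternative statements, symbols joined by eta.
GRAMMAR: Dict[str, List[List[str]]] = {
    "page": [["title", "body", "page_number"], ["title", "body"]],
}

# Hypothetical terminal recognizers (stand-ins for character recognition).
TERMINALS: Dict[str, Callable[[Region], bool]] = {
    "title": lambda r: len(r) == 1 and r[0].isupper(),
    "body": lambda r: len(r) > 0 and all(row.strip() for row in r),
    "page_number": lambda r: len(r) == 1 and r[0].strip().isdigit(),
}


def split_eta(region: Region) -> Optional[Tuple[Region, Region]]:
    """Toy eta check: split at the first interior blank row, or return None."""
    for j, row in enumerate(region):
        if row.strip() == "" and 0 < j < len(region) - 1:
            return region[:j], region[j + 1:]
    return None


def match_sequence(symbols: List[str], region: Region):
    """Match symbols joined by eta against the region; None means rejection."""
    if len(symbols) == 1:
        sub = parse(symbols[0], region)
        return None if sub is None else [sub]
    parts = split_eta(region)            # is the operator present?
    if parts is None:
        return None
    head, rest = parts
    first = parse(symbols[0], head)      # is the first substructure present?
    if first is None:
        return None
    tail = match_sequence(symbols[1:], rest)
    return None if tail is None else [first] + tail


def parse(symbol: str, region: Region):
    """Return a parse tree for symbol over region, or None if every statement fails."""
    if symbol in TERMINALS:
        return (symbol, region) if TERMINALS[symbol](region) else None
    for statement in GRAMMAR[symbol]:    # try the statements one after another
        tree = match_sequence(statement, region)
        if tree is not None:
            return (symbol, tree)
    return None
```

For example, `parse("page", ["TITLE", "", "text"])` rejects the first statement because no second blank row exists, then accepts the second statement.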
Fig. 4 is a flowchart in PAD form showing the image processing for operator η at 3092 in Fig. 3, namely the detection of horizontal runs of white pixels. In Fig. 4, step 401 is the entry to the main processing and supplies the normalized image Q stored in the memory 1052. In step 402, steps 403 to 409 are repeated for each scan line j in order to obtain A(j), the sum of the black pixels that belong to long runs. Step 403 is an initialization step. Step 404 tests whether pixel Q(i,j) in the scan line is 1 or 0; if it is 1, the run length B of black pixels is counted at step 406. If Q(i,j) is 0, then when the run length of the preceding black pixels exceeds the value q tested at step 407, the run is added to the sum at step 408, and the run length B is reset at step 409. After this loop has finished, B is added to A(j) at step 410, because the loop that begins at step 404 does not sum the run ending at the rightmost pixel (i = I-1). Because of the test at step 407, only long runs of black pixels contribute to A(j), which reduces the effect of noise.
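The per-scan-line summation of steps 402 to 410 can be sketched as follows. The sketch assumes a binary image stored as a list of rows (1 = black), uses the hypothetical name `long_run_black_sums`, and passes the threshold q as `min_run`. The text does not say whether the run that reaches the right edge is also compared with q before being added at step 410; this sketch assumes that it is.

```python
from typing import List


def long_run_black_sums(image: List[List[int]], min_run: int) -> List[int]:
    """For each scan line j, return A(j): the number of black pixels in runs longer than min_run."""
    sums = []
    for row in image:
        total = 0   # A(j)
        run = 0     # B, the length of the current black run
        for pixel in row:
            if pixel == 1:
                run += 1
            else:
                if run > min_run:   # only long runs count, which suppresses noise
                    total += run
                run = 0
        if run > min_run:           # run ending at the rightmost pixel (assumed thresholded)
            total += run
        sums.append(total)
    return sums
```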
The processing from step 411 to step 420 detects a zone in which A(j) is below a threshold δ1 lying between zones in which A(j) exceeds a threshold δ2. Step 411 initializes the flags F1 and F2. Step 412 repeats steps 413 to 419 for each scan line j. Step 413 detects the first point at which A(j) exceeds the threshold δ2, and step 414 sets flag F1. Step 415 detects, in the state F1 = 1, the first point at which A(j) falls below the threshold δ1; step 416 then sets flag F2 and stores the current j as j1. Step 417 detects, in the state F2 = 1, a point at which A(j) exceeds the threshold δ2; step 418 stores the preceding value of j as j2, and the program leaves the loop of step 412. Step 420 is the exit, which returns the positions j1 and j2 of the white run together with the flag F2. F2 = 1 indicates that detection succeeded; F2 = 0 indicates that no white run was found in the loop of step 412, that is, detection failed.
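Steps 411 to 420 amount to a small state machine over the values A(j). A minimal sketch follows, assuming the sums are already available as a list; the name `find_white_band` and the parameter names `low` (δ1) and `high` (δ2) are hypothetical. The text does not specify j2 when the page ends before a dense line reappears, so the sketch leaves it as `None` in that case.

```python
from typing import List, Optional, Tuple


def find_white_band(sums: List[int], low: int, high: int) -> Tuple[bool, Optional[int], Optional[int]]:
    """Return (F2, j1, j2): a band with A(j) < low that follows a line with A(j) > high."""
    seen_dense = False   # flag F1: a line with A(j) > high has been seen
    in_band = False      # flag F2: a line with A(j) < low has been seen after that
    j1 = None
    j2 = None
    for j, a in enumerate(sums):
        if not seen_dense:
            if a > high:
                seen_dense = True
        elif not in_band:
            if a < low:
                in_band = True
                j1 = j           # first line of the white band
        elif a > high:
            j2 = j - 1           # last line before the next dense line
            break
    return in_band, j1, j2
```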
The second embodiment of the invention is described next. Although this embodiment is implemented with the same block diagram as the first embodiment, the document grammar it uses is different. Specifically, the operators that express relations between substructures (such as [Figure 85106850_IMG41], η, α, β, γ and δ) are combined with parameters representing actual quantities, for example as follows.
(1,5), η(3,10) ……
In this case, η(3,10) states that there is a gap of at least 3 mm and at most 10 mm in the vertical direction. The flowchart used to handle operator η in the first embodiment (Fig. 4) becomes Fig. 5. In Fig. 5, steps 501 to 519 are the same as steps 401 to 419 of Fig. 4. Step 520 judges whether the white run detected in steps 512 to 519 lies in the range from 3 to 10. Step 512 is identical to step 420. The statements of the document grammar used in the second embodiment are more complex than those of the first embodiment, but they have the advantage that misjudgments in document understanding are more easily avoided. This grammar suits documents whose format varies little.
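The range test added at step 520 can be sketched as a single comparison. The patent gives the limits in millimetres but does not state the scanning resolution, so `pixels_per_mm` is an assumed parameter, and the function name `gap_within_range` is hypothetical; j1 and j2 are taken to be the first and last scan lines of the detected white run.

```python
def gap_within_range(j1: int, j2: int, pixels_per_mm: float,
                     min_mm: float, max_mm: float) -> bool:
    """Check whether the white run from scan line j1 to j2 (inclusive) is min_mm..max_mm tall."""
    gap_mm = (j2 - j1 + 1) / pixels_per_mm   # convert the run height from scan lines to mm
    return min_mm <= gap_mm <= max_mm
```

With an assumed resolution of 8 scan lines per millimetre, a run from line 100 to line 139 is 5 mm tall and satisfies η(3,10).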
The third embodiment of the invention is described next. Although this embodiment can be implemented with the same block diagram as the first embodiment, its control flow differs from that of the first embodiment (Fig. 3) and is shown in Fig. 6.
Fig. 6 is a flowchart in PAD form showing the flow of document understanding in the third embodiment. First, in step 601, the statements written in the document grammar are read from a filing device (not shown) into the memory 53, and in step 602 the whole system is initialized. Step 603 is a loop that repeats the following processing until the end. Step 604 inputs the first page image of the document, and step 605 calls the image processing subroutine, specifying which area of the image is to be processed. The image processing subroutine runs in parallel with the interpretation of statements described below, using multiprogramming or multiple processors; it extracts figures, tables, characters and other terminal symbols directly from the image to be processed and records the extracted data at fixed addresses in memory. When multiple processors are used, the block diagram of Fig. 2 needs some additions, but because the block diagram can easily be modified, these additions are omitted here.
Step 606 is a control loop that interprets this page according to the document grammar. Step 607 checks the result of image processing, and step 608 searches, according to the extracted result, for the statement that describes the corresponding substructure. Because this processing runs in parallel with the image processing, it must wait for the image processing to finish.
Step 609 initializes the stack used in the later processing. Step 610 handles a line of the grammar and controls the flow from step 611 to step 615. Step 610 is the same as step 309 in Fig. 3. Step 612 tests whether the operator is present; if it is not, the program leaves the loop of step 609. If it is present, the stack is pushed down and the operator is placed at the top. Step 613 tests whether the substructure is present. Substructure detection is repeated for the remaining image rather than for the part already handled by the image processing subroutine; the details are omitted because they are basically the same as in Fig. 3. Step 616 (output of the final result of document understanding) and step 617 are the same as in Fig. 3. Although the third embodiment is more complex than the first, the result of image processing guides the syntactic analysis so that irrelevant parts of the grammar need not be analyzed, and processing is therefore faster.
Next, before describing the fourth embodiment of the invention, we explain its syntactic analysis method. Fig. 7 shows one page of a technical paper with a predetermined format. Although the description below is for a technical paper, document formats differ to some degree, so the invention can be applied to other documents by changing part of the document grammar. The invention is therefore not limited to this technical paper example.
An example of a grammar describing the structure of the document (hereinafter called the "document grammar") is given below.
(defform F
  (form F1 (10 90 10 40))
  (form F2 ……)
  (form F3 ……))
(defform F1
  (form F11 (10 90 10 50))
  (form F12 (10 90 60 90)))
(defform LINE-1 (%1)
  (point ?Y1 (mode IN Y LESS))
  (point ?Y2 (mode OUT Y LESS))
  (form %1 (0 ?W ?Y1 ?Y2)))
The grammar above is explained with reference to the example of Fig. 7. The first symbol, "defform F ...", states that form F consists of form F1 and, below F1, forms F2 and F3 arranged side by side, as shown in Fig. 8. In Fig. 7, the parts corresponding to F, F1, F2 and F3 of Fig. 8 are enclosed in dashed lines. The four numbers in parentheses that follow form F1 (10, 90, 10, 40) give the position of the region of F1 when the whole region of form F is taken as 100 × 100. The origin of this coordinate system is the upper left corner. The numbers that describe the region are the minimum X coordinate, the maximum X coordinate, the minimum Y coordinate and the maximum Y coordinate. When the parameter values are known, as in this example, they can be written directly. Forms F2 and F3 are likewise described by rectangular regions.
The next symbol, "defform F1 ...", states that form F1 consists of forms F11 and F12 arranged vertically. That is, form F11 spans 10 to 50 in the Y direction and form F12 spans 60 to 90. The positions of F11 and F12 are described in a coordinate system whose origin is the upper left corner of form F1. Seen from form F, this is therefore a relative coordinate system.
When forms are described by rectangular regions in this way, and described hierarchically as successive groups of regions, an image can be described in a general form. The description can of course also be written without hierarchy, in an absolute coordinate system referred to form F, as shown in Fig. 9. In that case the rectangular regions are expressed in the same way as in Fig. 8, as follows:
(defform F
  (form F11 (18,82,13,25))
  (form F12 (18,82,28,38))
  (form F2 …… ……)
  (form F3 …… ……))
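The relation between the hierarchical and the absolute description can be sketched as a coordinate conversion. The sketch assumes that a child region is given as percentages of its parent's width and height, measured from the parent's upper left corner; this assumption reproduces the values listed for F11. The function name `to_absolute` is hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (Xmin, Xmax, Ymin, Ymax)


def to_absolute(parent: Rect, child: Rect) -> Rect:
    """Convert a child region given relative to its parent (0..100) into the parent's frame."""
    pxmin, pxmax, pymin, pymax = parent
    cxmin, cxmax, cymin, cymax = child
    width = pxmax - pxmin
    height = pymax - pymin
    return (
        pxmin + cxmin * width / 100,
        pxmin + cxmax * width / 100,
        pymin + cymin * height / 100,
        pymin + cymax * height / 100,
    )


F1 = (10, 90, 10, 40)
F11_ABSOLUTE = to_absolute(F1, (10, 90, 10, 50))   # (18.0, 82.0, 13.0, 25.0)
F12_ABSOLUTE = to_absolute(F1, (10, 90, 60, 90))   # (18.0, 82.0, 28.0, 37.0)
```

Under this assumption F12 ends at Y = 37, whereas the listing gives 38; the text does not explain the difference, so the conversion rule should be read as an approximation of the intended one.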
The next symbol, "deffmac LINE-1(%1)" and so on, is the definition of a macro statement. The three lines below form the body of the macro definition; it states that the first line at the top of the rectangular region is form %1.
(point ?Y1 (mode IN Y LESS))
(point ?Y2 (mode OUT Y LESS))
(form %1 (0 ?W ?Y1 ?Y2))
Here, the symbol ?H denotes the vertical size (height) of the form and ?W its horizontal size (width). The symbols ?Y1 and ?Y2 are variables whose values are determined by searching, as described below.
The symbol "point" denotes a search for a point that satisfies certain conditions; the result is assigned to the variable. The search condition is given by "mode". "IN" indicates that the point sought is a change point from a white pixel region to a black pixel region, and "OUT" a change point from a black pixel region to a white pixel region. "Y" gives the axis of the search and "LESS" its direction. The symbol "area" denotes a region within the search range.
The search method for the statements of the macro definition is illustrated with reference to Fig. 10.
Symbol (A) shows that the text lines "Title ..., Author ..." are present in the form. (B) and (C) show the Y coordinate values of these text lines, that is, of the first and second lines. The first line extends from ?Y1 to ?Y2, and the second line from ?Y3 to ?Y4. As described above, (B) is a macro statement that defines the form of the first line as %1, and (C) is a macro statement that defines the form of the second line as %1. These macro statements are used as follows.
(LINE-1 F1)
(LINE-2 F2)
In other words, the form of the first line is F1 and the form of the second line is F2. The search condition for the coordinate value ?Y1, given by "point" in the second line of (B), is IN Y LESS. The condition is therefore: a change point from a white pixel region to a black pixel region, with search axis Y and direction LESS, that is, the search starts from the smaller values of the Y coordinate. When the search is to start from the larger values of the Y coordinate, GREATER must be specified. The upper limit coordinate ?Y1 satisfies these conditions. Under the same kind of search condition, the lower limit coordinate ?Y2, given by "point" in the third line of (B), can be described as a change point from a black pixel region to a white pixel region. In other words, the search condition for ?Y2 is OUT Y LESS.
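The "point" search along the Y axis can be sketched as a scan over image rows. The sketch makes several assumptions that the text does not fix: a row counts as black if it contains any black pixel, the returned coordinate is the first row after the change, and a change is only reported between two rows inside the scanned range. The name `find_point` and the optional `start` parameter are hypothetical.

```python
from typing import List, Optional


def find_point(image: List[List[int]], mode: str,
               direction: str = "LESS", start: Optional[int] = None) -> Optional[int]:
    """Return the Y coordinate of the first IN (white to black) or OUT (black to white) change."""
    is_black = [any(row) for row in image]   # True where the row contains a black pixel
    n = len(is_black)
    if direction == "LESS":                  # scan from small Y to large Y
        order = range(0 if start is None else start, n)
    else:                                    # "GREATER": scan from large Y to small Y
        order = range(n - 1 if start is None else start, -1, -1)
    want_black = (mode == "IN")
    previous = None
    for y in order:
        if previous is not None and previous != is_black[y] and is_black[y] == want_black:
            return y
        previous = is_black[y]
    return None
```

Restarting the search with `start` set to the value found for ?Y2 gives the upper limit of the next text line, which mirrors the way the second line is located from the lower limit of the first.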
(C), which describes the second line in the form, is explained next. The second line immediately follows the first. Therefore, once the lower limit ?Y2 of the first line has been found, ?Y3 is searched for within the region designated by "area". In other words, by describing the rectangular region to be searched as 0 ?W ?Y2 ?H, the search can proceed in the same way starting from the lower limit of the first line.
In document understanding, the statements written in the document grammar are consulted, and the image is searched in order for the rectangular regions they describe. When a rectangular region that contains variables is found, the values of the variables are obtained, and the variables are thereafter replaced by those values.
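The replacement of variables by the values found during the search can be sketched as a simple substitution. Representing a region as a tuple whose entries are either numbers or variable names such as "?Y1", and the name `substitute`, are assumptions made only for this illustration.

```python
from typing import Dict, Tuple, Union

Term = Union[int, float, str]   # a number, or a variable name such as "?Y1"


def substitute(region: Tuple[Term, ...], bindings: Dict[str, float]) -> Tuple[Term, ...]:
    """Replace every bound variable in the region by its value; unbound variables stay as they are."""
    return tuple(bindings.get(term, term) if isinstance(term, str) else term
                 for term in region)
```

For example, once ?W, ?Y1 and ?Y2 are known, the region (0 ?W ?Y1 ?Y2) becomes a fully numeric rectangle, while a region that still mentions ?H keeps that variable until it is bound.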
Operations between rectangular regions are explained next. In an actual document, regions appear in various shapes and are not always rectangular. Figure 13(B) shows an example in which the shape of a region is not rectangular. (C) shows an example in which one region has been divided into two. Figure 13(B) can be regarded as the union or difference of two rectangular regions, as shown by the dashed lines in the figure. If one rectangular region is assumed to be composed of two rectangular regions, the description of (C) becomes simple. To carry out such operations between rectangular regions, we define a virtual shift of regions in the following way.
(map δ form F
(space ?W ?H)
(position
((?XO ?YO)
(?Xmin ?Xmax ?Ymin ?Ymax))
(……)))
Figure 14 represents the meaning of this definition. The term "space" indicates that a new rectangular region of width ?W and height ?H is set up as form F, and that regions are shifted into it. The term "position" gives the upper-left coordinates of the rectangular region at the shift destination. The four values (?Xmin ?Xmax ?Ymin ?Ymax) denote the rectangular region to be moved, and this region is copied to the shift destination described above.
We now describe the virtual shift more concretely with reference to Figure 13. For the purpose of analysis, suppose that the actual layout is as shown in (A). This is the so-called "multi-column" or "two-column" layout. Forms F1 and F2 are spatially adjacent in the horizontal direction, but semantically they can be regarded as vertically adjacent, as shown in (B).
The operation between the rectangular regions can be expressed as follows.
(map δ form F
(space 50 60)
(position ((10 10)(10 40 10 40))
((10 40)(40 70 10 30))))
For the layout shown in (C), "space" sets up a rectangular region 50 wide and 60 high. The relation between (B) and (C) is expressed as follows.
(position ((10 10)(10 40 10 40))
((10 40)(40 70 10 30)))
The rectangular region (10 40 10 40) in (B) is shifted to the region in (C) whose origin is at (10 10).
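The virtual shift in this example can be traced with a minimal sketch. It assumes that each "position" entry pairs the upper-left corner of the destination with the source rectangle written as (Xmin Xmax Ymin Ymax), which is one reading of the description above. The function name `virtual_shift` and the bounds check against "space" are assumptions added for illustration.

```python
def virtual_shift(space, positions):
    """Map source rectangles into a new space.

    space     -- (width, height) of the newly created rectangular region
    positions -- list of ((x0, y0), (xmin, xmax, ymin, ymax)) entries, where
                 (x0, y0) is the destination upper-left corner and the four
                 values are the source rectangle
    Returns a list of (source_rect, destination_rect) pairs.
    """
    width, height = space
    placed = []
    for (x0, y0), (xmin, xmax, ymin, ymax) in positions:
        w, h = xmax - xmin, ymax - ymin
        dest = (x0, x0 + w, y0, y0 + h)
        # Assumed sanity check: the shifted rectangle must fit into the space.
        if dest[1] > width or dest[3] > height:
            raise ValueError("shifted rectangle does not fit into the space")
        placed.append(((xmin, xmax, ymin, ymax), dest))
    return placed


# The values of the example statement: two side-by-side columns in the
# original image end up stacked vertically in the 50 x 60 space.
placed = virtual_shift(
    (50, 60),
    [((10, 10), (10, 40, 10, 40)),
     ((10, 40), (40, 70, 10, 30))],
)
```

Under this reading the first rectangle keeps its position, while the second, which lies to the right of the first in the original image, lands directly below it.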
If such virtual shifts are combined, a region with a complicated shape, as shown in Figure 13, can be represented by shift operations among at least two rectangular regions. For example, Figure 13(B) can be expressed as the shift of two rectangular regions that have different sizes and remain adjacent to each other.
As can be seen from the above description, the document grammar proposed in the present invention expresses the structure of a document as a combination of rectangular regions, and expresses the relations among the rectangular regions syntactically. This increases the expressive power available for describing documents: objects that have been difficult to handle with conventional methods, such as cases in which the number of text lines in a region is uncertain or the shape of a particular rectangular region is unclear, can now be described. A wide variety of documents can therefore be analyzed.
Next, a fourth embodiment of the present invention will be described with reference to the drawings.
Except for the processing performed by control unit 102, this embodiment is implemented in the same way as the first embodiment, by the apparatus shown in the block diagram of Figure 2. We assume that statements written in the document grammar described above, which describe the target documents, are stored in advance in memory 1053. Control unit 102 uses these statements to carry out document understanding processing on the normalized image. Here, the term "document understanding processing" means dividing the data into a number of rectangular regions and classifying each region. Of the regions obtained as a result of document understanding, the part of the image that belongs to a predetermined region designated as a search target is sent to character recognition unit 6, which recognizes the character patterns inside it. In general, the image of an original has a complicated shape, but because the regions obtained as a result of document understanding are rectangular, character segmentation and recognition can easily be carried out by known methods. The character code string obtained as the recognition result, or a character code string obtained by editing it, serves as the retrieval information for the input document. The retrieval information obtained in this way and the digital image of the document are stored in file device 106. When the digital image of the document is output to file device 106, it may be output separately in units of the individual rectangular regions.
The document understanding processing is described in detail below. Figure 14 and Figure 15 are flowcharts that explain the control flow of document understanding. The flow is written in PAD (Program Analysis Diagram) form. In step 1100, the contours of the document image are extracted and stored in memory 1054. Known methods can be used for contour extraction. A method such as connected-region extraction may be used in place of contour extraction. In step 1200, the minimum and maximum values of the X and Y coordinates are extracted for each contour i:
Xmin(i), Xmax(i), Ymin(i), Ymax(i)
The outermost rectangle of contour i can be determined from these four values. Steps 1300, 1400 and 1500 are, respectively, initialization, the main body of the syntactic analysis, and the termination processing of the syntactic analysis. In step 1300, the statements written in the document grammar and stored in memory 1053 are copied into working memory 1055, and the various tables and variables of the program are initialized.
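The computation of step 1200 amounts to taking the bounding box of each contour. A minimal sketch follows, assuming a contour is available as a list of (x, y) points; the function name `outermost_rectangle` and the sample points are hypothetical.

```python
def outermost_rectangle(contour):
    """Return (Xmin, Xmax, Ymin, Ymax) for a contour given as (x, y) points."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return min(xs), max(xs), min(ys), max(ys)


# Hypothetical contour points of a single pattern.
contour = [(25, 15), (35, 15), (35, 30), (25, 30), (30, 22)]
box = outermost_rectangle(contour)
```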
The main body of the syntactic analysis, step 1400, consists of steps 1410 to 1460. Step 1410 controls the loop so that steps 1420 to 1450 are repeated until the termination judgment of step 1460 is satisfied. In step 1420, one unresolved statement written in the document grammar is extracted. The term "unresolved statement" means a statement that contains variables whose values have not yet been determined, or a statement whose corresponding document region has not yet been determined. Step 1430 makes a decision so that, when no unresolved statement remains, step 1440 is skipped and the termination judgment is carried out. If the statement extracted in step 1420 is an unresolved statement, step 1440 is executed. This step determines the kind of statement and branches accordingly, and the subsequent processing depends on the kind of statement. The explanation of Figures 14 and 15 below deals only with the "form" statement, that is, the following case:
(form FO
(?Xmin ?Xmax ?Ymin ?Ymax)
(Shrink ?X ?Y))
For other kinds of statements, processing specific to each kind is likewise performed.
In Figure 15, steps 1441 to 1448 handle the predicate "form". Step 1441 checks whether the form label FO has already been registered; if not, it is registered in the form table in step 1442. Step 1442 then checks whether the character strings written in the positions of the variable names ?Xmin, ?Xmax, ?Ymin, ?Ymax, ?X and ?Y are variables or numbers, and, if they are variables, whether they have been registered. If they have not yet been registered, they are registered in the variable table; if they have been registered, it is checked whether their values have been determined. If the values are not determined, the "form" processing ends at this point (in this case the statement remains an unresolved statement). If they are determined, the variable names in the statement are replaced by these values. As a concrete example, when
?Xmin = 0, ?Xmax = 90,
?Ymin and ?Ymax: not registered,
?X = 5, ?Y = 5,
the above statement becomes:
(form FO
(0 90 ?Ymin ?Ymax)
(Shrink 5 5))
and the variables ?Ymin and ?Ymax are registered in the variable table with their values undetermined.
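The substitution performed in step 1442 can be sketched as follows. The sketch assumes a statement is held as a nested Python list and the variable table as a dictionary in which an undetermined variable maps to None; the names `substitute` and `unresolved_variables` are hypothetical.

```python
def substitute(statement, table):
    """Replace every determined variable in a nested statement by its value.

    Variables are strings starting with "?". Unknown variables are registered
    in the table with the value None, meaning "not yet determined".
    """
    if isinstance(statement, list):
        return [substitute(item, table) for item in statement]
    if isinstance(statement, str) and statement.startswith("?"):
        value = table.setdefault(statement, None)   # register if missing
        return statement if value is None else value
    return statement


def unresolved_variables(statement):
    """List the variable names that still occur in a statement."""
    if isinstance(statement, list):
        return [name for item in statement for name in unresolved_variables(item)]
    if isinstance(statement, str) and statement.startswith("?"):
        return [statement]
    return []


# The example from the text: ?Ymin and ?Ymax are not registered yet.
table = {"?Xmin": 0, "?Xmax": 90, "?X": 5, "?Y": 5}
statement = ["form", "FO", ["?Xmin", "?Xmax", "?Ymin", "?Ymax"], ["Shrink", "?X", "?Y"]]
result = substitute(statement, table)
remaining = unresolved_variables(result)
```

As long as `remaining` is not empty, the statement stays an unresolved statement and is taken up again on a later pass.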
Step 1443 branches according to whether all the variable names in the statement have been replaced by values; if they have all been replaced, the "form" execution of step 1444 is carried out. The detailed procedure of the "form" execution is shown in steps 1445 to 1448. Step 1445 indicates that the following procedure is repeated for every contour i extracted in step 1200. In step 1446, the minimum and maximum values of the X and Y coordinates of contour i, that is,
Xmin(i), Xmax(i), Ymin(i), Ymax(i), are compared with the values that correspond to the variables in the statement, that is,
?Xmin, ?Xmax, ?Ymin, ?Ymax, ?X, ?Y, to check whether the contour satisfies the following relations:
?Xmin < Xmin(i) < Xmax(i) < ?Xmax
?Ymin < Ymin(i) < Ymax(i) < ?Ymax
?X < Xmax(i) - Xmin(i)
?Y < Ymax(i) - Ymin(i)
When the above conditions are satisfied, contour i is registered in the element table of FO in step 1447. When no contour satisfies the conditions, step 1448 sets the analysis-failure flag.
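The test applied in steps 1445 to 1448 can be sketched as follows, assuming each contour is already represented by its outermost rectangle (Xmin, Xmax, Ymin, Ymax). The function name `execute_form` and the sample rectangles are hypothetical; only the four inequalities are taken from the text.

```python
def execute_form(boxes, region, shrink):
    """Select the contours that belong to a form.

    boxes  -- list of (Xmin, Xmax, Ymin, Ymax), one per contour
    region -- (?Xmin, ?Xmax, ?Ymin, ?Ymax) of the form statement
    shrink -- (?X, ?Y), the minimum width and height a contour must exceed
    Returns (element_table, failure_flag).
    """
    rxmin, rxmax, rymin, rymax = region
    min_w, min_h = shrink
    element_table = []
    for i, (xmin, xmax, ymin, ymax) in enumerate(boxes):
        inside = rxmin < xmin < xmax < rxmax and rymin < ymin < ymax < rymax
        large_enough = min_w < xmax - xmin and min_h < ymax - ymin
        if inside and large_enough:
            element_table.append(i)          # step 1447: register contour i
    failure_flag = not element_table         # step 1448: nothing matched
    return element_table, failure_flag


# Hypothetical rectangles: two characters inside the form, a small noise
# speck inside the form, and one character outside it.
boxes = [(25, 35, 15, 30), (40, 43, 20, 22), (5, 15, 15, 30), (50, 60, 15, 30)]
with_noise, _ = execute_form(boxes, (20, 80, 10, 50), (0, 0))
without_noise, _ = execute_form(boxes, (20, 80, 10, 50), (5, 5))
```

With (Shrink 0 0) the noise speck is registered together with the characters; with (Shrink 5 5) it is too small to pass the size test and is dropped.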
As described above, steps 1441 to 1448 detect whether a structure corresponding to the "form" statement exists in the input image. The same applies to statements other than "form". In the case of "form" there is no output data, but depending on the kind of statement, there are statements in which the parameters obtained during analysis are substituted for variables and the result is then used by other statements.
Step 1450 checks the analysis-failure flag; if the analysis has failed, backtracking is performed and the analysis is retried. In that case, control returns to a statement that has already been resolved, the variables that were replaced by parameter values are restored to their original state, and other possibilities are searched.
Step 1460 checks whether the analysis-failure flag is set, or whether it is still set after backtracking and retrying, and makes the termination judgment.
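The overall control of steps 1410 to 1460 can be outlined with a simplified sketch. It assumes each statement is a callable that receives the variable table and reports "resolved", "unresolved" or "failed"; this interface and the name `analyze` are assumptions. The backtracking of step 1450 is deliberately left out, so a failure simply ends the analysis.

```python
def analyze(statements, table):
    """Process statements repeatedly until all are resolved.

    Returns True when every statement was resolved, False when a statement
    failed or no further progress is possible (no backtracking in this sketch).
    """
    pending = list(statements)
    while pending:                            # step 1410: repeat
        progressed = False
        still_pending = []
        for statement in pending:             # step 1420: take a statement
            outcome = statement(table)        # step 1440: execute by kind
            if outcome == "failed":           # failure flag would be set
                return False
            if outcome == "resolved":
                progressed = True
            else:
                still_pending.append(statement)
        if not progressed:                    # nothing can be resolved
            return False
        pending = still_pending
    return True                               # step 1460: terminate


def define_width(table):
    """Hypothetical statement that determines ?W."""
    table["?W"] = 90
    return "resolved"


def use_width(table):
    """Hypothetical statement that needs ?W before it can be resolved."""
    return "resolved" if table.get("?W") is not None else "unresolved"


table = {}
ok = analyze([use_width, define_width], table)
```

Here `use_width` is unresolved on the first pass because ?W is still undetermined, and is resolved on the second pass once `define_width` has supplied the value.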
Step 1500 passes the data obtained as the result of the analysis to the outside. These data include, among other things, the coordinates of the detected rectangular regions on the document that correspond to the form labels.
When the statement analysis ends with the analysis-failure flag set, the document could not be understood. In this case, rejection processing is carried out. For example, the final or intermediate result of document understanding is shown on display 108 and corrected interactively.
Next, the execution of "form" is explained more concretely with reference to Figure 16. Figure 16(A) shows a case in which noise and the character patterns 1, A, 2 and B are present in the image.
Figure 16(B) shows the result of executing the "form" statement with the following parameters:
(form F (20 80 10 50)
(Shrink 0 0))
Figure 16(C) shows the result of executing the "form" statement with the following parameters:
(form F (20 80 10 50)
(Shrink 5 5))
As shown in the figure, in case (B) the noise and the character patterns 1 and A have all been registered in the element table of form F. In case (C), the character patterns 1 and A have been registered, but the noise has not; it is eliminated by the shrink specification. After "form" has been executed, the rectangular region of form F can be normalized to fit the character patterns inside the region, as shown in the figure. The size of the region can thus be determined flexibly.
We now explain concretely, with reference to Figure 17, how contours are selected when "form" is executed. Figure 17(A) shows, superimposed, the outermost rectangles of the contours obtained by the processing of step 1200 in Figure 14. Reference numeral 5 denotes noise, 1 to 4 are character patterns, and 6 to 8 are so-called "inner contours". Figure 17(B) lists their Xmin, Xmax, Ymin and Ymax. Whether they are included in form F is judged by whether the following relations are satisfied:
20<Xmin(i)<Xmax(i)<80
10<Ymin(i)<Ymax(i)<50
5<Xmax(i)-Xmin(i)
5<Ymax(i)-Ymin(i)
In this case, contours i = 1 and 3 satisfy the relations. Since the character pattern of contour 3 contains the pattern of contour 6, the latter can be excluded from form F.
As described above, the present invention can automatically analyze input documents according to the stored grammar for the target documents. Because entry of auxiliary information from the keyboard is unnecessary or can be kept to a minimum, input is greatly simplified. Moreover, because an input document is decomposed into substructures and these substructures are stored in place of the document image, storage space is saved and advanced retrieval using the substructures becomes possible.

Claims (1)

1. An image understanding system comprising means for converting an input image into a digitized image by means of an electro-optical detector, characterized in that the system further comprises: storage means for storing statements written according to a grammar that describes an image by a plurality of substructures and the relative relations among said substructures, wherein said substructures and their relative relations are described with operators, said operators expressing that the substructures are separated by horizontal or vertical runs of white pixels and being linked by actual parameters; parser means for analyzing the statements written according to said grammar; image search means for searching said digitized image for the existence of the substructures and relative relations specified by said analysis; and image structure understanding means for understanding the structure of said input image according to the result of said search.
CN85106850.2A 1985-09-11 1985-09-11 Image understanding system Expired CN1004386B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN85106850.2A CN1004386B (en) 1985-09-11 1985-09-11 Image understanding system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN85106850.2A CN1004386B (en) 1985-09-11 1985-09-11 Image understanding system

Publications (2)

Publication Number Publication Date
CN85106850A CN85106850A (en) 1987-08-05
CN1004386B true CN1004386B (en) 1989-05-31

Family

ID=4795319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN85106850.2A Expired CN1004386B (en) 1985-09-11 1985-09-11 Image understanding system

Country Status (1)

Country Link
CN (1) CN1004386B (en)

Also Published As

Publication number Publication date
CN85106850A (en) 1987-08-05

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C13 Decision
GR02 Examined patent application
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CX01 Expiry of patent term