EP2093709A1 - Document image feature value generating device, document image feature value generating method, and document image feature value generating program - Google Patents

Document image feature value generating device, document image feature value generating method, and document image feature value generating program

Info

Publication number
EP2093709A1
EP2093709A1 (application number EP07832850A)
Authority
EP
European Patent Office
Prior art keywords
feature
feature value
feature point
integrating
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07832850A
Other languages
English (en)
French (fr)
Other versions
EP2093709A4 (de)
Inventor
Tatsuo Akiyama
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Publication of EP2093709A1
Publication of EP2093709A4
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G06F16/5838: Retrieval characterised by using metadata automatically derived from the content, using colour
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/56: Information retrieval; Database structures therefor; File system structures therefor of still image data having vectorial format
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/18: Extraction of features or characteristics of the image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Definitions

  • the present invention relates to a document image feature value generating device, a document image feature value generating method, a document image feature value generating program, a document image feature value storage device, a document image feature value storage method, a document image feature value storage program, a document image feature value collating device, a document image feature value collating method, and a document image feature value collating program. More particularly, it relates to generating a feature value for document image retrieval, to storing the feature value, and to retrieval and collation using the feature value, in which a bad influence on image retrieval or identification, caused by a difference in the appearance of connected regions resulting from a difference in the imaging conditions of a registered document image or a retrieved document image, is reduced.
  • a device which takes a document or an image as input and computes a feature value of the document or the image is known.
  • An example of a system employing the device is disclosed in Non Patent Literature 1.
  • the system includes a document image retrieval feature value generating system R, and the document image retrieval feature value generating system R includes a registered image feature value computing device R1, a retrieved image feature value computing device R2, and a registered image feature value storage device R3, as illustrated in FIG. 31.
  • the registered image feature value computing device R1 includes a feature point extracting unit R11 and a registered image feature value computing unit R12.
  • the feature point extracting unit R11 includes a binary image generating section R111, a word region computing section R112, and a word centroid computing section R113.
  • the registered image feature value computing unit R12 includes a principal feature point setting section R121, a neighbor feature point computing section R122, a neighbor feature point selecting section R123, and an invariant computing section R124.
  • the invariant computing section R124 outputs a computation result from the registered image feature value computing device R1 and inputs the computation result into the registered image feature value storage device R3.
  • the feature point extracting unit R11 extracts a feature point from the input registered image (which is referred to as feature point extraction processing) (step SR11).
  • when a feature point is obtained through the feature point extraction processing of step SR11, the procedure proceeds to step SR12, and the registered image feature value computing unit R12 (FIG. 32) performs processing for computing a feature value from the obtained feature point (referred to as registered image feature value extraction processing).
  • the registered image feature value computing device R1 performs feature point extraction processing and registered image feature value extraction processing to generate the feature value.
  • the binary image generating section R111 ( FIG. 32 ) performs adaptive binarization for the input registered image to generate a binary image (step SR111).
  • in step SR112, the word region computing section R112 (FIG. 32) applies a Gaussian filter to the generated binary image and then performs binarization again to obtain word regions of the registered image (step SR112).
  • processing of step SR112 is referred to as word region labeling processing.
  • the word centroid computing section R113 ( FIG. 32 ) computes a centroid of the word region of the registered image (which is referred to as word centroid computation processing) which is used as the feature point (step SR113).
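  • as an illustration, the prior-art pipeline of steps SR111 to SR113 can be sketched as follows; this is a minimal reconstruction assuming OpenCV, and the adaptive-threshold window, Gaussian kernel, and re-binarization threshold are illustrative guesses rather than values given in Non Patent Literature 1.

```python
# Hedged sketch of steps SR111-SR113: adaptive binarization, Gaussian
# smearing so the characters of a word fuse into one connected "word
# region", re-binarization, and word-centroid computation.
import cv2
import numpy as np

def extract_word_centroids(gray: np.ndarray) -> np.ndarray:
    # Step SR111: adaptive binarization (ink = 255, background = 0).
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 31, 10)
    # Step SR112: Gaussian filter, then binarize again to obtain word regions.
    blurred = cv2.GaussianBlur(binary, (15, 15), 4.0)
    _, words = cv2.threshold(blurred, 64, 255, cv2.THRESH_BINARY)
    # Step SR113: the centroid of each word region becomes a feature point.
    _, _, _, centroids = cv2.connectedComponentsWithStats(words)
    return centroids[1:]  # row 0 is the background component
```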
  • the registered image feature value computing unit R12 ( FIG. 32 ) performs feature value computation (which refers to registered image feature value extraction processing described above) using the feature point (step SR12 of FIG. 33 ).
  • the principal feature point setting section R121 sets a certain feature point among feature points computed by the feature point extracting unit R11 as a principal feature point (step SR1201).
  • the neighbor feature point computing section R122 selects N feature points (referred to as neighbor feature points) around the principal feature point (step SR1202).
  • in steps SR1203 and SR1204, feature vectors of all combinations of M feature points which can be selected from the N neighbor feature points are computed.
  • one combination of M neighbor feature points is first selected from among the combinations of M points by the neighbor feature point selecting section R123 (FIG. 32) (step SR1203), and a feature value (vector) is computed by the invariant computing section R124 (FIG. 32) (step SR1204).
  • the neighbor feature point selecting section R123 orders the selected M points counterclockwise around the principal feature point, starting from the feature point nearest to the principal feature point.
  • the invariant computing section R124 computes one feature vector from one of combinations of M points which are ordered.
  • the invariant computing section R124 enumerates, while preserving their order, all combinations of f points which can be selected from an ordered combination of M points, repetitively computes a predetermined kind of invariant from each order-preserving combination of f points using a predetermined method to thereby compute a feature vector, and then stores the feature vector.
  • the number of times that the computation is repeated is given by Formula 1, the binomial coefficient MCf = M! / (f! (M − f)!), and one MCf-dimensional feature vector is generated by repeating the computation MCf times.
  • FIG. 36 illustrates that M points are selected from among N points, and f points are selected from among M points.
  • FIG. 36 illustrates a case in which N is 7, M is 6, and f is 4 as a concrete example.
  • FIG. 37 illustrates how combinations of M points selected from N points or combinations of f points selected from M points are used in registered image feature value computation to compute a feature vector (step SR1204).
  • the MCf-dimensional feature vector defined by Formula 1 (MCf corresponds to the number of all combinations of f points which can be selected from a combination of M points) is computed for each of the NCM combinations of M points (NCM corresponds to the number of all combinations of M points which can be selected from a combination of N points).
  • for example, one 15-dimensional feature vector is generated in such a way that all ordered combinations of 4 points which can be selected from an ordered combination of 6 points are enumerated, a predetermined kind of invariant is computed from each ordered combination of 4 points, and the computation is repeated 15 times (step SR1204 of FIG. 35).
  • the invariant computing section R124 computes a feature vector for one of the combinations of M points which are ordered starting from the neighbor feature point nearest to a certain principal feature point.
  • the invariant computing section R124 generates seven 15-dimensional feature vectors by repeating the feature vector computation NCM = 7 times.
  • the invariant computing section R124 determines whether or not computation of a feature vector is completed for all combinations of M points which can be selected from among the N points, i.e., for all NCM (for example, 7) combinations (step SR1205 of FIG. 35).
  • the procedure returns to step SR1203, so that neighbor feature points which are not selected yet are selected by the neighbor feature point selecting section R123, and then a feature vector which is not computed yet is computed.
  • the invariant computing section R124 proceeds to step SR1206 and determines whether or not computation of a feature vector for all feature points is completed.
  • when computation of a feature vector for all feature points is completed, registered image feature value extraction processing (step SR12 of FIG. 32) is finished.
  • otherwise, the procedure returns to step SR1201, sets a feature point which is not computed yet as the principal feature point, and computes the feature vectors which are not computed yet, so that all feature vectors are computed through registered image feature value extraction processing.
  • processing of the registered image feature value computing unit R12 (that is, registered image feature value extraction processing of FIG. 33) is continuously performed until feature vectors are computed using every feature point extracted by the feature point extracting unit R11 as the principal feature point.
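  • the combinatorial structure of steps SR1203 to SR1205 can be summarized in the following sketch; the function and its signature are hypothetical, and `invariant` stands for whatever f-point invariant is used (an affine-invariant example is sketched further below).

```python
# For one principal point and its ordered N neighbors: each of the C(N, M)
# M-point subsets yields one C(M, f)-dimensional feature vector whose
# elements are invariants of the f-point sub-combinations, taken in a
# fixed order.  itertools.combinations preserves the input ordering.
from itertools import combinations
from math import comb

def feature_vectors(ordered_neighbors, invariant, M=6, f=4):
    vectors = []
    for m_subset in combinations(ordered_neighbors, M):   # C(N, M) subsets
        vec = [invariant(pts) for pts in combinations(m_subset, f)]
        assert len(vec) == comb(M, f)                     # e.g. C(6, 4) = 15
        vectors.append(vec)
    return vectors   # with N = 7, M = 6, f = 4: 7 vectors of dimension 15
```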
  • in Non Patent Literature 1, registration processing of a feature value is performed by hashing after the feature value vector is computed, but this is not related to the problem to be solved by the present invention and thus is not described here.
  • the invariant computing section R124 ( FIG. 32 ) computes a feature value vector in which a value of each geometric invariant is used as an element by computing geometric invariants computed from f feature points in a predetermined order with respect to all of f feature points which are selected from among M points.
  • a value of f depends on the geometric invariant used. Therefore, the kind of geometric invariant to be used is necessarily determined once the transformation to be tolerated between a registered image and a retrieved image is decided.
  • in Non Patent Literature 1, since an affine invariant computed from arbitrary 4-point coordinates is used as the geometric invariant under the assumption that an affine transformation is to be tolerated, f is 4 (four). Also, N and M need to be set in advance to values which satisfy N ≥ M ≥ f.
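  • a standard affine invariant of four points, usable with f = 4, is the ratio of the areas of two triangles spanned by the points; the exact triangle pairing below is an assumption rather than the pairing fixed in Non Patent Literature 1, and the function can serve as the `invariant` argument of the previous sketch.

```python
# The ratio of two signed triangle areas is unchanged by any affine map,
# because an affine transform scales all areas by the same determinant.
def tri_area(p, q, r):
    # signed area of the triangle p, q, r
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))

def affine_invariant(pts):
    p1, p2, p3, p4 = pts
    # undefined if p1, p2, p4 are collinear; real systems guard against this
    return tri_area(p1, p2, p3) / tri_area(p1, p2, p4)
```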
  • FIG. 38 illustrates a configuration of the retrieved image feature value computing device R2.
  • the retrieved image feature value computing device R2 ( FIG. 38 ) includes a feature point extracting unit R21 and a retrieved image feature value computing unit R22.
  • the feature point extracting unit R21 ( FIG. 38 ) includes a binary image generating section R211, a word region computing section R212 and a word centroid computing section R213.
  • the feature point extracting unit R21 is identical to the feature point extracting unit R11 (FIG. 32) of the registered image feature value computing device R1.
  • the feature value computing unit R22 ( FIG. 38 ) includes a principal feature point setting section R221, a neighbor feature point computing section R222, a neighbor feature point selecting section R223, a cyclic computing section R224, and an invariant computing section R225.
  • the principal feature point setting section R221, the neighbor feature point computing section R222, the neighbor feature point selecting section R223, and the invariant computing section R225 are identical to the principal feature point setting section R121, the neighbor feature point computing section R122, the neighbor feature point selecting section R123, and the invariant computing section R124 of the registered image feature value computing device R1 ( FIG. 32 ), respectively.
  • the retrieved image feature value computing device R2 ( FIG. 38 ) further includes the cyclic computing section R224 compared to the registered image feature value computing device R1.
  • the feature point extracting unit R21 extracts a feature point from the input registered image (step SR21 of FIG. 39 ).
  • the feature point extracting unit R21 ( FIG. 38 ) extracts the feature point by the same operation as the feature point extracting unit R11 ( FIG. 32 ) of the registered image feature value computing device R1.
  • the feature point extracting unit R21 ( FIG. 38 ) performs the same operation as the feature point extracting unit R11 ( FIG. 32 ), and thus duplicated description will not be repeated.
  • the retrieved image feature value computing unit R22 ( FIG. 38 ) of the retrieved image feature value computing device R2 computes a feature value for document image retrieval (which is referred to as retrieved image feature value extraction processing) (step SR22).
  • An operation of the retrieved image feature value computing unit R22 ( FIG. 38 ) is almost identical to an operation of the registered image feature value computing unit R12 ( FIG. 32 ) as illustrated in the flowchart of FIG. 39 .
  • the retrieved image feature value computing device R2 is different from the registered image feature value computing device R1 in the fact that it obtains a cyclic permutation for an ordered combination obtained by the neighbor feature point selecting section R223 ( FIG. 38 ) in step SR2204 of FIG. 40 .
  • the cyclic permutation refers to a permutation in which, when an ordered set of elements (e.g., {P1, P2, P3, P4}) is given, the order of the elements is circulated according to P1→P2→P3→P4 (→P1) (conceptually, the elements are each shifted n positions to the right), to thereby obtain another ordered set.
  • neighbor feature points correspond to elements ⁇ P1, P2, ... ⁇ .
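  • the cyclic permutations used by the cyclic computing section can be sketched as follows (a hypothetical helper; the description only requires that every rotation of the ordered set be produced):

```python
# Every rotation of the ordered neighbor set is generated, so that a
# retrieved image whose ordering starts from a different "nearest" point
# (e.g. because of rotation) still yields the registered ordering among
# its candidates.
def cyclic_permutations(ordered):
    return [ordered[i:] + ordered[:i] for i in range(len(ordered))]

# cyclic_permutations(['P1', 'P2', 'P3', 'P4']) ->
# [['P1','P2','P3','P4'], ['P2','P3','P4','P1'],
#  ['P3','P4','P1','P2'], ['P4','P1','P2','P3']]
```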
  • in step SR2204, a cyclic permutation is computed as described above, and, in step SR2205, feature vectors are obtained for all cyclic permutations related to all ordered combinations of M points selected in step SR2203.
  • in step SR2206, it is determined whether or not computation of feature vectors for all cyclic arrangements of M points is completed.
  • when it is not, the procedure returns to step SR2204 and computes a feature vector for a cyclic arrangement which is not computed yet.
  • in step SR2207, it is determined whether or not computation of a feature vector for all combinations of M points which can be selected from among the N points is completed.
  • when it is not, the procedure returns to step SR2203 and computes a feature vector for a combination of M points which is not selected yet.
  • in step SR2208, it is determined whether or not computation of a feature vector for all feature points is completed; when it is not, the procedure returns to step SR2201, sets a feature point which is not computed yet as the principal feature point, and computes a feature vector.
  • the same values of N, M, and f as in the registered image feature value computing unit R12 are used.
  • the registered image feature value storage device R3 ( FIG. 31 ) is a device which stores feature values (vectors) computed in the registered image feature value computing device R1.
  • in Non Patent Literatures 1 to 4, only the centroid of a connected component acquired from a binarized image is used as a feature point. Therefore, when connected regions determined from a registered image and a retrieved image, which ought to correspond to each other, contact in one image and are separated in the other image, the feature points used for computation of a feature value are arranged at substantially different locations.
  • the present invention is devised to solve the above problem, and it is an object of the present invention to provide a document image feature value generating device, a document image feature value generating method, a document image feature value generating program, a document image feature value storage device, a document image feature value storage method, a document image feature value storage program, a document image feature value collating device, a document image feature value collating method, and a document image feature value collating program in which feature values which correspond to each other can be obtained even though connected regions of interest of a registered image and a retrieved image contact in one image and are separated in the other image.
  • a document image feature value generating device which extracts feature points from an input image which is input and generates a feature value from the feature points, including: integrating feature point extracting means for determining connected regions from the input image which is input, computing centroids of the connected regions and the feature points, integrating at least some of the connected regions, and obtaining an integrating feature point from a centroid of an integrated connected region; and feature value generating means for setting a principal feature point from among the feature points obtained and generating a feature value of the integrated connected region from neighbor feature points which are arranged near to the principal feature point and the integrating feature point.
  • a document image feature value generating method of a document image feature value generating device which extracts feature points from an input image which is input and generates a feature value from the feature points, including: an integrating feature point extracting step of determining connected regions from the input image which is input, computing centroids of the connected regions and the feature points, integrating at least some of the connected regions, and obtaining an integrating feature point from a centroid of an integrated connected region; and a feature value generating step of setting a principal feature point from among the feature points obtained and generating a feature value of the integrated connected region from neighbor feature points which are arranged near to the principal feature point and the integrating feature point.
  • a document image feature value generating program of a document image feature value generating device which extracts feature points from an input image which is input and generates a feature value from the feature points, for causing a computer to perform: an integrating feature point extracting sequence of determining connected regions from the input image which is input, computing centroids of the connected regions and the feature points, integrating at least some of the connected regions, and obtaining an integrating feature point from a centroid of an integrated connected region; and a feature value generating sequence of setting a principal feature point from among the feature points obtained and generating a feature value of the integrated connected region from neighbor feature points which are arranged near to the principal feature point and the integrating feature point.
  • since a feature value is generated using the centroid of an integrated connected region as a feature point, the feature points of a registered image and a retrieved image used for computation of feature values are almost identical in arrangement.
  • thus, a document image feature value generating device, a document image feature value generating method, a document image feature value generating program, a document image feature value storage device, a document image feature value storage method, a document image feature value storage program, a document image feature value collating device, a document image feature value collating method, and a document image feature value collating program can be realized.
  • a document image retrieval feature value generating system D includes a registered image feature value computing device 1, a retrieved image feature value computing device 2 and a registered image feature value storage device 3.
  • the registered image feature value computing device 1 includes a feature point extracting unit 11 and a registered image feature value computing unit 12.
  • the feature point extracting unit 11 includes a binary image generating section 111, a connected region computing section 112, a connected region centroid computing unit 113, and an integrating feature point computing section 114.
  • the binary image generating section 111 generates a binary image from a certain image which is input.
  • binarization is performed by separating pixels having a pixel value within a predetermined range from pixels having a pixel value outside that range, as in binarization using a predetermined threshold value or adaptive binarization; that is, an existing binarization method is used.
  • when the input is a color image, it may be converted into a gray scale image using an existing method, and binarization may then be performed on the gray scale image. Alternatively, when the input image is a color image, a method may be used which obtains the color spatial distribution of the pixel values, clusters the color space, and binarizes the image into one or more cluster regions and the remaining regions.
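  • a minimal sketch of section 111 under these options, assuming OpenCV; Otsu's method stands in here for the unspecified "existing binarization method", and the color-clustering variant is omitted:

```python
# Convert a color input to gray scale with an existing method, then
# binarize; Otsu's global threshold is one concrete choice of the
# "predetermined threshold" style of binarization.
import cv2

def to_binary(image):
    if image.ndim == 3:                    # color input
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(image, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return binary
```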
  • the connected region computing section 112 performs connected region labeling when the binary image generated by the binary image generating section 111 is input.
  • labeling is performed using an existing method such as run analysis or boundary tracking, which are described in Non Patent Literature 3.
  • Connected region labeling is not limited to run analysis or boundary tracking, and may be performed in such a way that, for the input binary image, the same label is imparted to a pixel which belongs to the same connected region, and a different label is imparted to a pixel which belongs to a different connected region.
  • alternatively, an existing image processing method may be applied so that a region corresponding to a word region is acquired.
  • the connected region centroid computing section 113 computes a centroid of a pixel set which belongs to the same label computed by the connected region computing section 112 and uses the centroid as a feature point.
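  • sections 112 and 113 can be sketched with an existing labeling routine; scipy.ndimage is used below merely as a stand-in for the run-analysis or boundary-tracking methods of Non Patent Literature 3:

```python
# Impart the same label to pixels of the same connected region, then use
# the centroid of each label's pixel set as a feature point and keep the
# pixel count of each region for the later integration step.
import numpy as np
from scipy import ndimage

def label_and_centroids(binary):
    labels, n = ndimage.label(binary > 0)      # 4-connectivity by default
    index = np.arange(1, n + 1)
    centroids = ndimage.center_of_mass(binary > 0, labels, index)
    pixel_counts = ndimage.sum(binary > 0, labels, index)
    return labels, list(centroids), list(pixel_counts)
```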
  • the integrating feature point computing section 114 selects a plurality of connected regions to which different labels are imparted, and obtains, from the feature points computed from those connected regions (hereinafter referred to as integrated feature points) and their pixel numbers, a new feature point (hereinafter referred to as an integrating feature point) and a new pixel number which would be acquired by integrating the selected connected regions.
  • a set of connected regions which are close to each other is selected.
  • as a distance criterion, the distance between centroids, or the minimum distance from a connected region belonging to one label to a connected region belonging to a different label, is used, and connected regions within a distance equal to or less than a predetermined value are determined as connected regions which are to be integrated.
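  • under the inter-centroid criterion, the integrating feature point computation of section 114 reduces to the following sketch; that the integrating feature point equals the pixel-count-weighted mean of the two centroids follows from the definition of the centroid of a union of disjoint regions, but the pairwise-only integration is a simplification:

```python
# Pairs of regions whose centroids lie within a predetermined threshold
# are integrated; the integrating feature point is the pixel-count-
# weighted centroid of the pair, carrying the summed pixel number.
import numpy as np

def integrating_feature_points(centroids, counts, threshold):
    merged = []
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            ci, cj = np.asarray(centroids[i]), np.asarray(centroids[j])
            if np.linalg.norm(ci - cj) <= threshold:   # Euclidean criterion
                w = counts[i] + counts[j]
                point = (counts[i] * ci + counts[j] * cj) / w
                merged.append((point, w, (i, j)))
    return merged   # (integrating point, merged pixel count, source indices)
```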
  • the feature point extracting unit 11 includes the binary image generating section 111, the connected region computing section 112, the connected region centroid computing section 113, and the integrating feature point computing section 114 and thus has a function of extracting a feature point from an input image which is input.
  • the registered image feature value computing unit 12 includes a principal feature point setting section 121, a neighbor feature point computing section 122, a neighbor feature point selecting section 123, and an invariant computing section 124.
  • the principal feature point setting section 121 sets one feature point (hereinafter, referred to as a principal feature point) in computing a feature value.
  • the neighbor feature point computing section 122 computes and outputs a predetermined number (N) of feature points around the principal feature point.
  • the neighbor feature point selecting section 123 selects a predetermined number (M) of feature points from among neighboring feature points computed by the neighboring feature point computing section 122. At this time, an integrating feature point and a feature point from which an integrating feature point is generated are not simultaneously selected.
  • the selected feature points are ordered in a predetermined direction, that is, either clockwise or counterclockwise, starting from the feature point nearest to the principal feature point, as if an imaginary half line connecting the principal feature point with that nearest feature point rotated about the principal feature point.
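  • this ordering can be sketched as an angular sort around the principal feature point (a hypothetical helper; counterclockwise is chosen here, and the description allows either direction):

```python
# Sort the M selected points by the angle swept by a half line anchored at
# the principal point, starting from the neighbor nearest to it.  In the
# lower-left-origin coordinate system used later in this document,
# increasing angle is counterclockwise.
import math

def order_counterclockwise(principal, points):
    px, py = principal
    start = min(points, key=lambda p: math.hypot(p[0] - px, p[1] - py))
    a0 = math.atan2(start[1] - py, start[0] - px)
    def swept(p):   # angle from the starting half line, in [0, 2*pi)
        return (math.atan2(p[1] - py, p[0] - px) - a0) % (2 * math.pi)
    return sorted(points, key=swept)
```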
  • the invariant computing section 124 generates a feature value (vector) from a feature point set selected by the neighbor feature point selecting section 123 by storing geometric invariants which become elements in a predetermined order.
  • each element of a feature value is a predetermined geometric invariant.
  • as the geometric invariant, there are a similarity invariant, an affine invariant, and a perspective invariant, as described in Non Patent Literature 2.
  • the feature vector is stored in the registered image feature value storage device 3.
  • the registered image feature value computing unit 12 includes the principal feature point setting section 121, the neighbor feature point computing section 122, the neighbor feature point selecting section 123, and the invariant computing section 124, and computes a feature value (vector) using feature points output from the feature point extracting unit 11 and a connected region pixel number and stores the feature value (vector) in the registered image feature value storage device 3.
  • the retrieved image feature value computing device 2 includes a feature point extracting section 21 and a retrieved image feature value computing unit 22.
  • the feature point extracting unit 21 includes a binary image generating section 211, a connected region computing section 212, a connected region centroid computing section 213, and an integrating feature point computing section 214.
  • the feature point extracting unit 21 of the retrieved image feature value computing device 2 has the same configuration as the feature point extracting unit 11 of the registered image feature value computing device 1, and thus duplicated description will not be repeated.
  • the retrieved image feature value computing unit 22 includes a principal feature point setting section 221, a neighbor feature point computing section 222, a neighbor feature point selecting section 223, a cyclic computing section 224, and an invariant computing section 225.
  • the principal feature point setting section 221, the neighbor feature point computing section 222, and the neighbor feature point selecting section 223 of the retrieved image feature value computing unit 22 have the same configuration as the principal feature point setting section 121, the neighbor feature point computing section 122, and the neighbor feature point selecting section 123 of the registered image feature value computing unit 12 (FIG. 2), respectively, and thus duplicated description will not be repeated.
  • the cyclic computing section 224 receives an ordered feature point set output from the neighbor feature point selecting section 223, and circulates and changes an order of a feature point set.
  • the invariant computing section 225 of the retrieved image feature value computing device 2 is identical to the invariant computing section 124 of the registered image feature value computing unit 12 ( FIG. 2 ), and thus duplicated description will not be repeated.
  • the retrieved image feature value computing unit 22 computes a feature value (vector) using feature points output from the feature point extracting unit 21 and a connected region pixel number.
  • the registered image feature value storage device 3 ( FIG. 1 ) is a device which stores a feature value (vector) generated by the registered image feature value computing device 1 as described above and a connected region pixel number.
  • the document image retrieval feature value generating system D performs feature point extraction processing through the feature point extracting unit 11 of the registered image feature value computing device 1 (FIG. 2) (step S11).
  • the registered image feature value computing unit 12 ( FIG. 2 ) performs registered image feature value computation processing (step S12).
  • FIG. 5 is a flowchart illustrating feature point extraction processing of step S11.
  • the binary image generating section 111 of the registered image feature value computing device 1 ( FIG. 2 ) generates a binary image (step S111).
  • the connected region computing section 112 (FIG. 2) performs connected region labeling (step S112).
  • labeling processing using, for example, 4-connection run analysis described in Non Patent Literature 3 is performed for the generated binary image.
  • the connected region centroid computing section 113 computes a centroid location and a pixel number of each connected region (step S113).
  • the integrating feature point computing section 114 obtains an integrating feature point from feature points which are in the neighborhood of each other and a pixel number of integrated connected regions (step S114).
  • as a distance criterion used to determine the neighborhood, an existing distance criterion such as the Euclidean distance or the city block distance between feature points may be used.
  • a method which requires the distance between connected regions not to exceed a threshold value TH1 may be used.
  • a predetermined value may be used as TH1; alternatively, a normalization may be performed in which line segmentation is carried out to obtain a baseline, the width of a connected component projected onto the baseline is obtained, and TH1 is divided by the value of that width.
  • the integrating feature point computing section 114 obtains the distance between connected regions for the connected region labeling image (referred to as inter-connected region distance computation processing) (step S1141).
  • an existing criterion such as a Euclidean distance and a city block distance is used as a criterion of the distance.
  • an inter-connected region distance may be obtained as an inter-centroid distance.
  • alternatively, the minimum value of a distance criterion d over an arbitrary pixel of a connected region CC1 and an arbitrary pixel of a different connected region CC2 may be used, as defined by Formula 2:
  Dist1 = min over (x_i, y_i) ∈ CC1, (x_j, y_j) ∈ CC2 of d((x_i, y_i), (x_j, y_j))
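  • Formula 2 admits a direct brute-force reading, sketched below with the city block distance as the pixel-level criterion d (the choice of d is left open in the text):

```python
# The distance between connected regions CC1 and CC2 is the minimum of
# the criterion d over all pixel pairs drawn from the two regions.
def region_distance(cc1_pixels, cc2_pixels):
    return min(abs(xi - xj) + abs(yi - yj)     # city block distance d
               for (xi, yi) in cc1_pixels
               for (xj, yj) in cc2_pixels)
```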
  • next, the connected regions to be integrated are determined, the centroid of the integrating feature point is computed, and the sum of the pixel numbers of the connected regions which are integrated is used as the pixel number corresponding to the integrating feature point (step S1142).
  • as a method for integrating two connected regions, there are, for example, a method which integrates an arbitrary pair of connected regions and a method which integrates a pair of connected regions whose distance is equal to or less than a distance D.
  • a method for integrating three or more arbitrary connected regions may be applied.
  • three connected regions may be integrated, for example, when the distance between a connected component 1 and a connected component 2 is equal to or less than the distance D and the distance between the connected component 2 and a connected component 3 is equal to or less than the distance D.
  • in general, A connected regions may be integrated, where the value of A may be determined in advance.
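  • the chained integration just described behaves like a union-find grouping, as in the following sketch (an assumed formulation; the cap A on group size is modeled only loosely):

```python
# Regions 1 and 3 end up in one group whenever the 1-2 and 2-3 distances
# are both at most D, i.e. grouping is transitive over "close" pairs.
def integrate_chains(n_regions, close_pairs, max_group=None):
    parent = list(range(n_regions))
    size = [1] * n_regions
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]      # path halving
            a = parent[a]
        return a
    for a, b in close_pairs:                   # pairs with distance <= D
        ra, rb = find(a), find(b)
        if ra != rb and (max_group is None or size[ra] + size[rb] <= max_group):
            parent[ra] = rb
            size[rb] += size[ra]
    groups = {}
    for i in range(n_regions):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]
```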
  • the feature point extracting unit 11 of the registered image feature value computing device 1 can compute a feature point, an integrating feature point and a pixel number corresponding to each connected region through feature point extraction processing ( FIG. 4 ) of step S11.
  • when feature point extraction processing of step S11 of the flowchart of FIG. 4 by the feature point extracting unit 11 is completed, registered image feature value extraction processing is performed by the registered image feature value computing unit 12 (step S12 of FIG. 4).
  • an operation of the registered image feature value computing unit 12, which performs registered image feature value extraction processing (step S12 of FIG. 4), will be described below.
  • the principal feature point setting section 121 ( FIG. 2 ) installed in the registered image feature value computing unit 12 sets a certain feature point as a principal feature point (step S1201).
  • the neighbor feature point computing section 122 determines whether or not the principal feature point is the integrating feature point (step S1202).
  • when the principal feature point which is set is the integrating feature point, processing of step S1210 and subsequent steps is performed, while when it is not, processing of step S1203 and subsequent steps is performed.
  • the procedure proceeds to step S1203, and the neighbor feature point computing section 122 selects N feature points which are near in distance from among feature points excluding the principal feature point and the integrating feature point (step S1203).
  • the neighbor feature point selecting section 123 obtains combinations of M feature points which are selected from among N feature points and selects one combination (step S1204). At this time, M feature points are appropriately ordered as described above.
  • the invariant computing section 124 computes invariants, which can be computed from f feature point coordinates, from M feature points which are ordered and stores elements in a predetermined order (step S1205).
  • step S1204 and step S1205 illustrated in FIG. 7 Processing of step S1204 and step S1205 illustrated in FIG. 7 is performed for all of combinations of M points which can be selected from among N points selected in step S1203.
  • in step S1206, it is determined whether or not computation of a feature vector for all combinations of M points which can be selected from among the N points is completed.
  • when computation of a feature vector is not completed for even one combination of M neighbor feature points, the procedure returns to step S1204, neighbor feature points of a feature vector which is not computed yet are selected, and computation of the feature vector is performed.
  • step S1207 it is determined whether or not all of feature point sets through which a certain integrating feature point is generated are included among N feature points selected in step S1203 (step S1207).
  • when all of the feature point sets through which a certain integrating feature point is generated exist, feature value addition computation processing is performed (step S1208).
  • step S1209 when all of feature point sets through which a certain integrating feature point is generated are not included, the procedure proceeds to step S1209.
  • feature value addition computation processing of step S1208 will be described later with reference to the flowchart of FIG. 8.
  • step S1209 it is determined whether or not computation of a feature value for all feature points is completed, and when computation of a feature value for all feature points is completed, registered image feature value extraction processing (step S12 of FIG. 4 ) is finished.
  • step S1209 when it is determined in step S1209 that computation of a feature value for all feature points is not completed yet, the procedure returns to step S1201, and so a feature vector is computed using a feature point which is not computed yet as a principal feature point.
  • step S1202 when it is determined in step S1202 that the principal feature point which is set is the integrating feature point, the procedure proceeds to step S1210, and the following operation is performed.
  • the neighbor feature point computing section 122 selects N feature points which are near in distance from among feature points excluding a principal feature point, an integrated feature point through which the principal feature point is generated, and the integrating feature point (step S1210).
  • the neighbor feature point selecting section 123 obtains combinations of M feature points which are selected from among N feature points and selects one combination (step S1211). At this time, M feature points are appropriately ordered as described above.
  • the invariant computing section 124 computes invariants, which can be computed from f feature point coordinates, from M feature points which are ordered and stores elements in a predetermined order (step S1212).
  • step S1211 and step S1212 are performed for all of combinations of M points which can be selected from among N points selected in step S1210.
  • in step S1213, it is determined whether or not computation of a feature vector for all combinations of M points which can be selected from among the N points is completed.
  • step S1211 when computation of a feature vector even for one of M neighbor feature points is not completed, the procedure returns to step S1211, and so neighbor feature points of a feature vector which is not computed yet are selected, and then computation of a feature vector is performed.
  • step S1214 it is determined whether or not all of feature point sets through which a certain integrating feature point is generated are included among N feature points selected (step S1214).
  • step S1215 when all of feature point sets through which a certain integrating feature point is generated exist, feature value addition computation processing is performed (step S1215).
  • step S1209 when all of feature point sets through which a certain integrating feature point is generated are not included, the procedure proceeds to step S1209.
  • Feature value addition computation processing (step S1215) is identical to that of step S1208 and will be described later. After feature value addition computation processing (step S1215) is finished, the procedure proceeds to step S1209.
  • step S1209 it is determined whether or not processing is performed using all feature points as a principal feature point.
  • the procedure returns to step S1201, while when processing of computing a feature vector has been performed using all feature points as a principal feature point, registered image feature value extraction processing (step S12 of FIG. 4 ) is finished.
  • a feature value addition computation processing sequence of step S1208 or S1215 described above will be described with reference to the flowchart of FIG. 8.
  • the neighbor feature point selecting section 123 ( FIG. 2 ) of the registered image feature value computing unit 12 deletes integrated feature points through which a certain integrating feature point is generated from N neighbor feature points computed by the neighbor feature point computing section 122 and instead adds an integrating feature point computed from the integrated feature points (step S12081). At this time, a total number of neighbor feature points is denoted by N'.
  • if N' ≥ M, the procedure proceeds to step S12083, while if N' < M, the procedure returns to step S12081.
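  • steps S12081 and S12082 amount to the substitution sketched below; the handling of the N' < M branch is an assumed reading (further integrated sets are substituted until at least M neighbors remain):

```python
# Replace the integrated feature points by their integrating feature
# point among the N neighbors, then proceed only if at least M neighbor
# feature points remain.
def substitute_integrating_point(neighbors, integrated_set, integrating_point, M):
    reduced = [p for p in neighbors if p not in integrated_set]
    reduced.append(integrating_point)
    return reduced if len(reduced) >= M else None   # None: keep substituting
```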
  • step S12083 the neighbor feature point selecting section 123 obtains one combination of M feature points which are selected from among N' feature points.
  • step S1204 similarly to step S1204 ( FIG. 7 ), appropriate ordering is performed.
  • the invariant computing section 124 computes invariants, which can be computed from f feature point coordinates, from M feature points which are ordered and stores elements in a predetermined order (step S12084).
  • step S12083 and step S12084 are repeated until processing for all combinations of M feature points which are selected from among N' feature points is performed.
  • in step S12085, it is determined whether or not computation of a feature vector for all combinations of M points which can be selected from among the N' points is completed.
  • the procedure returns to step S12083, and so neighbor feature points of a feature vector which is not computed yet are selected, and then computation of a feature vector which is not computed yet is performed.
  • step S12086 it is determined whether or not computation of a feature vector for all of integrated feature point sets which are included in neighbor feature points and through which a certain integrating feature point is generated is completed.
  • feature value addition computation processing of step S1208 or S1215 is repeated until feature value addition computation processing for an integrating feature point included among the N' feature points computed by the neighbor feature point computing section 122 is performed, so that all feature vectors are computed.
  • step S1209 As described above, after invariants are repetitively computed for all of integrated feature point sets which are included among neighbor feature points and through which a certain integrating feature point is generated as described above and so repetitive processing is finished, the procedure proceeds to step S1209.
  • the document image retrieval feature value generating system D performs feature point extraction processing through the feature point extracting unit 21 (FIG. 3) (step S21).
  • when feature point extraction processing (step S21 of FIG. 9) is finished, the retrieved image feature value computing unit 22 performs retrieved image feature value computation processing (step S22).
  • retrieved image feature value computation processing (step S22) of the retrieved image feature value computing unit 22 will be described in detail with reference to the flowchart of FIG. 10.
  • the principal feature point setting section 221 ( FIG. 3 ) installed in the retrieved image feature value computing unit 22 sets a certain feature point as a principal feature point (step S2201).
  • the neighbor feature point computing section 222 determines whether or not the principal feature point is the integrating feature point (step S2202).
  • when the principal feature point which is set is the integrating feature point, processing of step S2212 and subsequent steps is performed, while when it is not, processing of step S2203 and subsequent steps is performed.
  • the procedure proceeds to step S2203, and the neighbor feature point computing section 222 selects N feature points which are near in distance from among feature points excluding the principal feature point and the integrating feature point (step S2203).
  • the neighbor feature point selecting section 223 ( FIG. 3 ) obtains combinations of M feature points which are selected from among N feature points and selects one combination (step S2204). At this time, M feature points are appropriately ordered as described above.
  • the neighbor feature point selecting section 223 computes one cyclic permutation from an ordered combination of M points (step S2205).
  • the invariant computing section 225 computes invariants, which can be computed from f feature point coordinates, from M feature points which are ordered and stores elements in a predetermined order (step S2206).
  • step S2205 and step S2206 Processing of step S2205 and step S2206 is performed for all of cyclic permutation combinations computed from M points.
  • step S2207 it is determined whether or not computation of a feature vector for all of cyclic arrangements of M points is completed.
  • step S2208 it is determined whether or not computation of a feature vector for all combinations of M points which can be selected from among N points is completed.
  • step S2204 when computation of a feature vector for all combinations of M points which can be selected from among N points is not completed, the procedure proceeds to step S2204, and so a feature vector for a combination of M points which is not selected yet is computed.
  • step S2209 it is determined whether or not all of feature point sets through which a certain integrating feature point is generated are included among N feature points selected in step S2203 (step S2209).
  • step S2210 when all of feature point sets through which a certain integrating feature point is generated exist, feature value addition computation processing is performed (step S2210).
  • step S2211 when all of feature point sets through which a certain integrating feature point is generated are not included, the procedure proceeds to step S2211.
  • feature value addition computation processing of step S2210 will be described later with reference to the flowchart of FIG. 11.
  • step S2211 it is determined whether or not computation of a feature value for all feature points is completed, and when computation of a feature value for all feature points is completed, retrieved image feature value extraction processing (step S22) is finished.
  • step S2211 when it is determined in step S2211 that computation of a feature value for all feature points is not completed yet, the procedure returns to step S2201, and so a feature vector is computed using a feature point which is not computed yet as a principal feature point.
  • step S2202 when it is determined in step S2202 that the principal feature point which is set is the integrating feature point, the procedure proceeds to step S2212, and the following operation is performed.
  • the neighbor feature point computing section 222 ( FIG. 3 ) of the retrieved image feature value computing unit 22 selects N feature points which are near in distance from among feature points excluding a principal feature point, an integrated feature point through which the principal feature point is generated, and the integrating feature point (step S2212).
  • the neighbor feature point selecting section 223 ( FIG. 3 ) obtains combinations of M feature points which are selected from among N feature points and selects one combination (step S2213). At this time, M feature points are appropriately ordered as described above.
  • the neighbor feature point selecting section 223 ( FIG. 3 ) computes one cyclic permutation from an ordered combination of M points (step S2214).
  • the invariant computing section 225 computes invariants, which can be computed from f feature point coordinates, from the M feature points which are ordered and stores the elements in a predetermined order (step S2215).
  • step S2214 and step S2215 Processing of step S2214 and step S2215 is performed for all of cyclic permutation combinations computed from M points.
  • step S2216 it is determined whether or not computation of a feature vector for all cyclic arrangements of M points is completed.
  • step S2214 when computation of a feature vector for all cyclic arrangements of M points is not completed yet, the procedure proceeds to step S2214, and so a feature vector of a cyclic arrangement which is not computed yet is computed.
  • step S2217 it is determined whether or not computation of a feature vector for all combinations of M points which can be selected from among N points is completed.
  • step S2218 It is determined whether or not all of feature point sets through which a certain integrating feature point is generated are included among N feature points selected in step S2212 (step S2218).
  • when all of the feature point sets through which a certain integrating feature point is generated exist, feature value addition computation processing is performed (step S2219).
  • step S2211 when all of feature point sets through which a certain integrating feature point is generated are not included, the procedure proceeds to step S2211.
  • Feature value addition computation processing (step S2219) is identical to that of step S2210 and will be described later. After feature value addition computation processing is finished, the procedure proceeds to step S2211.
  • in step S2211, it is determined whether or not processing has been performed using all feature points as the principal feature point; when it has, retrieved image feature value extraction processing (step S22 of FIG. 9) is finished.
  • a feature value addition computation processing sequence of step S2210 or S2219 described above will be described with reference to the flowchart of FIG. 11.
  • the neighbor feature point selecting section 223 ( FIG. 3 ) of the retrieved image feature value computing unit 22 deletes integrated feature points through which a certain integrating feature point is generated from N neighbor feature points computed by the neighbor feature point computing section 222 and instead adds an integrating feature point computed from the integrated feature points (step S22081). At this time, a total number of neighbor feature points is denoted by N'.
  • if N' ≥ M, the procedure proceeds to step S22083, while if N' < M, the procedure returns to step S22081.
  • step S22083 the neighbor feature point selecting section 223 obtains one combination of M feature points which are selected from among N' feature points.
  • similarly to step S2204 (FIG. 10), appropriate ordering is performed.
  • the neighbor feature point selecting section 223 computes one cyclic permutation from an ordered combination of M points (step S22084).
  • the invariant computing section 225 ( FIG. 3 ) computes invariants, which can be computed from f feature point coordinates, from M feature points which are ordered and stores elements in a predetermined order (step S22085).
  • step S22084 and step S22085 Processing of step S22084 and step S22085 is performed for all cyclic permutations of combinations of M feature points which are ordered.
  • step S22086 it is determined whether or not computation of a feature vector for all of cyclic permutations is completed.
  • step S22087 when a feature vector for all of cyclic permutations is computed, the procedure proceeds to step S22087.
  • step S22087 it is determined whether or not computation of a feature vector for all combinations of M points which can be selected from among N' points is completed.
  • processing from step S22083 to step S22087 is repeated until processing for all combinations of M feature points selected from among the N' feature points is performed.
  • step S22088 it is determined whether or not computation of a feature vector for all of integrated feature point sets which are included in neighbor feature points and through which a certain integrating feature point is generated is completed.
  • processing described above is repeated until computation of a feature vector for all of integrated feature point sets through which a certain integrating feature point is generated is completed.
  • feature value addition computation processing (FIG. 11) of step S2210 or S2219 is repeated until feature value addition computation processing for an integrating feature point included among the N feature points computed by the neighbor feature point computing section 222 is performed.
  • step S2211 As described above, after invariants are repetitively computed for all of integrated feature point sets which are included among neighbor feature points and through which a certain feature point is generated as described above and so repetitive processing is finished, the procedure proceeds to step S2211.
  • as described above, a feature point which simulates the contact of connected components is generated by the integrating feature point computing section 114 and the integrating feature point computing section 214, a feature point set is computed by the neighbor feature point computing section 122 and the neighbor feature point computing section 222 such that an integrating feature point and the integrated feature points through which it is generated are not selected simultaneously, and a feature value is generated using the feature point set. Therefore, even though connected regions contact in one of a registered image and a retrieved image and are separated in the other image, feature values which correspond to each other can be acquired.
  • the retrieved image feature value computing unit 22 includes the cyclic computing section 224; however, according to a second embodiment, the cyclic computing section 224 may be omitted, as shown in FIG. 12, when rotation need not be considered for the input image.
  • alternatively, the cyclic computing section 224 may not be installed in the retrieved image feature value computing unit 22 of the retrieved image feature value computing device 2, and instead a cyclic computing section 125 which is identical to the cyclic computing section 224 may be included in the registered image feature value generating unit 12 as shown in FIG. 13 .
  • This case corresponds to a configuration in which the registered image feature value generating device 1 and the retrieved image feature value generating device 2 are switched with each other.
  • steps S2205, S2214 and S22084 In this configuration, processing of steps S2205, S2214 and S22084 is not performed; instead, the operations of steps S2205, S2214 and S22084 are performed after steps S1204, S1211 and S12083, respectively, so that processing is performed for all cyclic permutations of each ordered combination.
  • the document image retrieval feature value generating system D is introduced as part of a document image retrieving system Z.
  • the document image retrieving system Z illustrated in FIG. 14 includes a scanner A, a scanner B, a computer C1, a computer C2, a memory device M1, and a memory device M2.
  • the registered image feature value generating device 1 is installed in the computer C1
  • the retrieved image feature value generating device 2 is installed in the computer C2.
  • the registered image feature value storage device 3 which stores a feature value generated by the registered image feature value generating device 1 is installed in the memory device M1.
  • a collating device 4 which collates a feature value stored in the registered image feature value storage device 3 and a feature value generated by the retrieved image feature value generating device 2 is installed in the computer C2.
  • a feature vector accordance number storage device 5 which stores the number of times that a feature value (vector) generated in each registered document is matched to a feature vector generated in a retrieved image is installed in the memory device M2.
  • An identifying device 6 which determines a registered image to be retrieved based on a feature value accordance number of each registered image is installed in the computer C2.
  • the memory device M1 and the memory device M2 may be identical to each other, the computer C1 and the computer C2 may be identical to each other, and the scanner A and the scanner B may be also identical to each other.
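How these devices cooperate can be pictured with a small voting sketch (hypothetical names; `dist` is any vector distance, an error square sum with threshold 0.15 in the example later in this document):

```python
from collections import defaultdict

def retrieve(registered, retrieved_vectors, th, dist):
    """Collate every retrieved-image feature vector against the stored
    vectors of every registered image, count accordances (device 5), and
    pick the registered image with the most matches (device 6).

    `registered` maps an image id to its list of feature vectors.
    """
    accordance = defaultdict(int)  # feature vector accordance numbers
    for q in retrieved_vectors:
        for image_id, vectors in registered.items():
            for v in vectors:
                if dist(q, v) <= th:
                    accordance[image_id] += 1
    return max(accordance, key=accordance.get) if accordance else None
```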
  • a distance between connected regions, used as the criterion for integrating feature points, is the minimum value of the city block distance d between an arbitrary pixel of a connected region α and an arbitrary pixel of a different connected region β, and is defined by Formula 2.
  • an example in which the distance value between a pair of connected regions α and β is 2 is shown in FIG. 15 .
  • CC1 defined in Formula 2 is identical in meaning to the connected region α, and CC2 is identical in meaning to the connected region β.
  • portions marked by ○ indicate that the connected regions α and β contact each other; this occurs when the distance value between the pair of connected regions α and β by Formula 2 is 1 or 2.
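Under this definition, the following brute-force sketch (hypothetical names) computes the minimum city block distance between two labeled connected regions; the embodiment scans the label image with a filter ( FIG. 19 ) instead, but the value obtained is the same:

```python
import numpy as np

def region_distance(labels, a, b):
    """Minimum city block distance between any pixel of region `a` and any
    pixel of region `b` in a labeling image (brute force over pixel pairs)."""
    ya, xa = np.nonzero(labels == a)
    yb, xb = np.nonzero(labels == b)
    d = np.abs(xa[:, None] - xb[None, :]) + np.abs(ya[:, None] - yb[None, :])
    return int(d.min())  # 1 or 2 when the two regions contact each other
```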
  • a distance between feature points is defined by Euclidean distance.
  • in the coordinate system of an image, the lower left corner is the origin, the positive direction of the x coordinate is the right direction of the image, and the positive direction of the y coordinate is the upward direction of the image.
  • an example will be described in which the scanner A picks up images of the documents DObj1 and DObj2 as the registered images PR1A and PR2A and their feature values are computed and registered, the scanner B then picks up an image of the document DObj1 as a retrieved image PR1B, and it is retrieved (identified) which one of the images PR1A and PR2A the feature value of the image PR1B matches.
  • the registered image PR1A is a 256-level gray scale image and is illustrated in FIG. 16 .
  • the feature point extracting unit 11 When the registered image PR1A is input to the registered image feature value generating device 1, the feature point extracting unit 11 performs feature point extraction processing (step S11 of FIG. 4 ). First, a binary image is generated by binary image generating processing of step S111 illustrated in FIG. 5 .
  • binarization is performed using a predetermined threshold value TH0.
  • step S112 of FIG. 5 labeling processing by 4-connection run analysis described in Non Patent Literature 3 is performed.
  • FIG. 17 A connected region labeling image LI1 obtained through labeling processing is illustrated in FIG. 17 .
  • in FIG. 17 , the circumscribed rectangles of the label regions obtained as a result of labeling processing are overlaid.
  • FIG. 17 it is expected in FIG. 17 that one connected region exists for each alphabetic character. However, since the character "G" of the registered image PR1A illustrated in FIG. 16 blurs (in this case, the image is faintly captured), the connected region of "G" illustrated in FIG. 17 is divided into two.
  • a centroid of a connected region is obtained by connected region centroid/pixel number computation processing (step S113 of FIG. 5 ) and used as a feature point.
  • each feature point is stored together with the pixel number of its connected region, as illustrated in FIG. 18 ; the extraction flow up to this point is sketched below.
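A compact sketch of steps S111 to S113 under stated assumptions (scipy's default 4-connected labeling stands in for the run analysis of Non Patent Literature 3, and all names are hypothetical):

```python
import numpy as np
from scipy import ndimage

def extract_feature_points(gray, th0):
    """Binarize with threshold TH0, label 4-connected regions, and return a
    (centroid, pixel number) pair per connected region."""
    binary = gray < th0                # assumes dark ink on a light background
    labels, n = ndimage.label(binary)  # default structure is 4-connectivity
    points = []
    for lab in range(1, n + 1):
        ys, xs = np.nonzero(labels == lab)
        points.append(((xs.mean(), ys.mean()), xs.size))
    return labels, points
```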
  • a distance Dist1 between connected regions is obtained by inter-connected region distance computation processing (step S1141 of FIG. 6 ) of integrating feature point computation processing (step S114 of FIG. 5 ).
  • TH1 = 2 is used, and a distance is obtained by a method which can detect at least up to 2 as a distance value.
  • a distance between connected regions is obtained by scanning the overall surface of a label image using, for example, the filter of FIG. 19 , which can measure up to 3 as a distance value.
  • a distance value of 0 means that the pair is not an integration target.
  • a connected region pair which satisfies the condition is stored again in association with the feature point information ( FIG. 18 ) which is already stored, as shown in FIG. 20 .
  • since the connected region pixel number is stored in advance through connected region centroid/pixel number computation processing (step S113 of FIG. 5 ), when the centroid of an integrating feature point is obtained in integration processing (step S1142 of FIG. 6 ), (x i , y i ) and (x j , y j ) are used as the two feature point coordinates, and P i and P j are used as the connected component pixel numbers corresponding to the feature points i and j.
  • alternatively, an integrated region may be obtained from the connected region labeling image LI1 (which is, even though not shown, an image computed through connected region labeling processing of FIG. 5 ), and the centroid of the integrated region may be used.
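When the centroid is computed from the stored feature points instead of from the labeling image, the pixel-number-weighted mean of the two centroids gives the centroid of the union exactly, provided the two regions are disjoint; a minimal sketch:

```python
def integrating_feature_point(xi, yi, pi, xj, yj, pj):
    """Centroid of the union of two connected regions from their feature
    point coordinates (xi, yi), (xj, yj) and pixel numbers Pi, Pj."""
    x = (pi * xi + pj * xj) / (pi + pj)
    y = (pi * yi + pj * yj) / (pi + pj)
    return x, y
```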
  • FIG. 21 integrating feature points of connected regions obtained as a result of performing feature point extraction processing (step S11 of FIG. 4 ) are illustrated in FIG. 21 .
  • FIG. 21 in FIG. 21 , a pair of regions (for example, nos. 4 and 5) which were divided into two due to a blur is connected, and an integrating feature point (for example, no. 14) is computed.
  • Nos. 9 and 13 are connected, so that an integrating feature point no. 15 is computed, and nos. 10 and 11 are connected, so that an integrating feature point no. 16 is computed.
  • FIG. 22 an image in which a connected region number is imparted to the registered image PR1A is illustrated in FIG. 22 .
  • connected region nos. 9 and 15 are very close to each other.
  • the registered image feature value generating unit 12 performs registered image feature value extraction processing (step S12 of FIG. 4 ).
  • the principal feature point setting section 121 of the registered image feature value generating unit 12 sets one feature point as a principal feature point (step S1201).
  • a connected region no. 7 illustrated in FIG. 22 is used as a principal feature point.
  • step S1202 it is determined whether or not the principal feature point is an integrating feature point (step S1202).
  • an integration flag "1" is written for each integrating feature point, and thus it can be determined whether or not the principal feature point is an integrating feature point using the integration flag. In this case, the determination result is "No", and thus the procedure proceeds to step S1203.
  • the neighbor feature point numbers are sorted clockwise, in the direction in which a half line extending from no. 7 as an axis and passing through no. 8 moves, starting from the neighbor feature point no. 4 which is nearest to the principal feature point no. 7; a sketch of such an ordering follows.
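One way to realize such a clockwise ordering (a sketch only; the handling of the reference direction and of ties is an assumption):

```python
import math

def order_clockwise(principal, neighbors, start):
    """Order neighbor feature points clockwise around the principal feature
    point, beginning at `start` (the neighbor nearest to the principal
    point in the example of this document)."""
    px, py = principal
    a0 = math.atan2(start[1] - py, start[0] - px)

    def key(pt):
        a = math.atan2(pt[1] - py, pt[0] - px)
        # atan2 increases counterclockwise in the document's y-up coordinate
        # system, so the reversed, normalized angle sweeps clockwise
        return (a0 - a) % (2.0 * math.pi)

    return sorted(neighbors, key=key)
```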
  • step S1205 a feature vector is computed using an affine invariant.
  • an affine invariant refers to an amount, determined by Formula 4 with respect to two-dimensional position vectors p 1 , p 2 , p 3 , and p 4 .
  • Formula 4: $\mathrm{Affine\ invariant} = \dfrac{\left|\, p_3 - p_1 \quad p_4 - p_1 \,\right|}{\left|\, p_2 - p_1 \quad p_3 - p_1 \,\right|}$, where $\left|\, a \quad b \,\right|$ denotes the determinant of the 2 × 2 matrix whose columns are the vectors a and b.
  • step S1204 and step S1205 Processing of step S1204 and step S1205 is performed for all combinations of f points which are selected from among an ordered combination of M points.
  • step S1206 it is determined whether or not computation of a feature vector for all combinations of M points which can be selected from among N points is completed.
  • step S1204 when computation of a feature vector is not yet completed for even one combination of M neighbor feature points, the procedure returns to step S1204, neighbor feature points whose feature vector has not yet been computed are selected, and computation of the feature vector is performed.
  • p 1 uses a first feature point coordinate of an ordered combination, p 2 uses a second feature point coordinate, p 3 uses a third feature point coordinate, and p 4 uses a fourth feature point coordinate.
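A direct transcription of Formula 4 as reconstructed above (the division assumes p 1 , p 2 , p 3 are not collinear):

```python
def affine_invariant(p1, p2, p3, p4):
    """Determinant ratio |p3-p1 p4-p1| / |p2-p1 p3-p1|, unchanged by any
    affine transformation applied to the four points."""
    def det(u, v):
        return u[0] * v[1] - u[1] * v[0]
    def sub(u, v):
        return (u[0] - v[0], u[1] - v[1])
    return det(sub(p3, p1), sub(p4, p1)) / det(sub(p2, p1), sub(p3, p1))
```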
  • step S1204 and step S1205 Processing of step S1204 and step S1205 is repeated, so that a 15-dimensional feature value vector is generated with respect to the principal feature point no. 7 as illustrated in FIG. 24 and stored in the registered image feature value storage device 3.
  • NCM = 7 feature vectors are generated with respect to the principal feature point no. 8 through processing of steps S1204 to S1206 and stored in the registered image feature value storage device 3 (see FIG. 24 ).
  • step S1207 it is determined whether or not feature point sets through which an integrating feature point is generated exist among the neighbor feature points.
  • step S1208 among the connected regions imparted corresponding to the N points selected in step S1203, the distance between connected region nos. 4 and 5 and the distance between connected region nos. 10 and 11 are 2 ≤ TH1, and so feature value addition computation processing ( FIG. 8 ) of step S1208 is performed.
  • connected region nos. 4 and 5 are removed from a combination of N points, and an integrating feature point no. 14 ( FIG. 22 ) generated by integrating nos. 4 and 5 is added (step S12081 of FIG. 8 ).
  • step S12082 of FIG. 8 determination of step S12082 of FIG. 8 is performed. In this case, since it is determined as “Yes”, processing of step S12083 is performed. Also, in the present example, since it is always "Yes", processing of step S12082 may be skipped.
  • step S12084 a feature vector is obtained in step S12084.
  • This step is identical in an operation to step S1205.
  • step S1201 to step S1209 Processing of from step S1201 to step S1209 is repeated until processing is performed by setting all of first to sixteenth feature points as a principal feature point.
  • in this example, it is desirable that PR1A be output as the retrieval (identification) result.
  • a feature vector accordance number corresponding to each registered document which is stored in the feature vector accordance number storage device 5 is initialized to zero (0).
  • step S21 feature point extraction processing is performed through the feature point extracting unit 21 of the retrieved image feature value computing device 2. This processing is the same processing as feature point extraction processing (step S11) through the feature point extracting unit 11 of the registered image feature value computing device 1.
  • retrieved image feature value computation processing (step S22) performed through the retrieved image feature value computing unit 22 will be described.
  • Retrieved image feature value computation processing (step S22 of FIG. 9 ) is performed in cooperation with feature value collation processing performed through the feature value collating device 4 which is described later.
  • the principal feature point setting section 221 of the retrieved image feature value computing device 2 sets one feature point as a principal feature point (step S2201 of FIG. 10 ).
  • a case in which a feature point no. 6 is set as a principal feature point after feature values (vectors) are generated by setting feature points nos. 1 to 5 as principal feature points will be described.
  • step S2202 of FIG. 10 it is determined whether or not the principal feature point is the integrating feature point. Since the feature point no. 6 is not the integrating feature point ( FIG. 20 ), the procedure proceeds to step S2203.
  • seven feature points, nos. 7, 4, 3, 2, 9, 10, and 8, are selected ( FIG. 28 ).
  • feature points nos. 7, 4, 3, 2, 9, and 10 are selected. This combination is ordered similarly to the ordering method of step S1204.
  • step S2206 ( FIG. 10 ).
  • the feature value generating method is identical to the feature value generating method of step S1205.
  • step S2205 and step S2206 Since processing of step S2205 and step S2206 is performed for all of the cyclic permutations, a feature value (vector) is computed even for a differently ordered combination, for example, 4, 3, 2, 9, 10, and 7; a sketch of the rotation follows.
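All cyclic permutations of an ordered combination can be generated as follows; the rotation from (7, 4, 3, 2, 9, 10) to (4, 3, 2, 9, 10, 7) mentioned above is the i = 1 case:

```python
def cyclic_permutations(ordered):
    """All rotations of an ordered combination (step S2205)."""
    return [ordered[i:] + ordered[:i] for i in range(len(ordered))]

# cyclic_permutations((7, 4, 3, 2, 9, 10))[1] == (4, 3, 2, 9, 10, 7)
```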
  • step S2206 whenever one feature value (vector) is computed, collation processing which will be described later is performed through the feature value collating device 4 ( FIG. 14 ).
  • step S41 of FIG. 29 a distance between a certain feature vector stored in the registered image feature value storage device 3 and a feature vector generated from the retrieved image is computed (step S41), and the two are determined to be matched when the distance does not exceed a predetermined threshold value TH1 (step S42).
  • An existing criterion which can compute a distance between two vectors may be used as a distance criterion.
  • an error square sum of two vectors is used as a distance, and 0.15 is used as a value of TH1.
  • the 0-th feature vector, which is not cyclically shifted, of an ordered combination for the principal feature point no. 6 of the retrieved image PR1B which is computed in step S2206 ( FIG. 10 ) is illustrated in FIG. 30 .
  • An error square sum with a feature vector no. 8 for a principal feature point no. 7 computed from the registered image PR1A is 0.14.
  • step S42 it is determined whether or not this value exceeds the threshold value TH1.
  • step S43 when the distance does not exceed the threshold value TH1, the procedure proceeds to step S43, the two feature vectors are regarded as matched to each other, and the feature vector accordance number corresponding to the registered image (which is PR1A in this example) from which the matched feature vector was computed is increased by one (step S43).
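Steps S41 to S43 thus reduce to the following per-pair check (a sketch; the error square sum and the 0.15 threshold follow this example, so the 0.14 value above counts as a match):

```python
def collate(query, stored, th=0.15):
    """True when the error square sum of two feature vectors does not
    exceed the threshold, i.e. the vectors are regarded as matched."""
    return sum((a - b) ** 2 for a, b in zip(query, stored)) <= th
```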
  • step S2207 ( FIG. 10 ) processing of the retrieved image feature value computing device 2 is subsequently performed.
  • in the present example, collation processing is performed whenever a feature value (vector) is generated from the retrieved image, but the present embodiment is not limited to this; an identical identification (retrieval) result may be obtained by computing all feature vectors from a retrieved image in advance, storing them in an appropriate storage device, and performing collation processing for all combinations of the feature vectors generated from the retrieved image and the feature vectors generated from the registered images.
  • step S2209 ( FIG. 10 ) since the determination result is "No", the procedure proceeds to step S2211.
  • step S2211 it is determined whether or not processing for all of the feature points is completed. Since the determination result is "No", processing is continued by setting a feature point which has not yet been set as a principal feature point as the principal feature point.
  • for example, the accordance number related to the registered image PR1A, which is stored in the feature vector accordance number storage device 5 ( FIG. 14 ), becomes 6, while the accordance number related to the registered image PR2A becomes 1.
  • the identifying device 6 determines the registered image with the largest feature vector accordance number to be the identical image, with reference to the accordance numbers stored in the feature vector accordance number storage device 5 ( FIG. 14 ).
  • the registered image PR1A is output.
  • an ID or a name which is specific to an image or a combination thereof may be output.
  • Results corresponding to the J candidates with the highest feature vector accordance numbers, rather than a single identical image, may also be output; a sketch of this identification step follows.
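A sketch of the identification step, which returns either the single registered image with the largest accordance number or the top J candidates:

```python
def identify(accordance, j=1):
    """Registered image ids ranked by feature vector accordance number;
    j = 1 yields the single identical image."""
    ranked = sorted(accordance.items(), key=lambda kv: kv[1], reverse=True)
    return [image_id for image_id, _ in ranked[:j]]
```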
  • the eighth feature vector for the principal feature point no. 7 computed from the registered image PR1A is a feature value (vector) which is generated by integrating feature points.
  • this feature value (vector) is a feature vector which cannot be generated by a conventional method, and thus a feature value (vector) generated by integrating feature points according to the present example is effective in retrieving or identifying a document.
  • only connected regions which are near to each other are integrated. Since separation or contact is a phenomenon occurring between connected regions which are near to each other, a method which integrates only connected regions that are near to each other can significantly reduce the complexity required for integration, compared to a method which integrates all pairs of connected regions, and thus is effective.
  • a distance Dist1 between connected regions is used to determine whether to integrate feature points.
  • a contact of a connected component can be precisely simulated, and thus this criterion is effective compared to a case in which a distance between feature points is used as the criterion.
  • this method is not easily affected by an inclination of a document between a registered image and a retrieved image, and also does not require inclination correction of the registered image and the retrieved image, and thus is effective.
  • the present invention can be applied to a document retrieving device which retrieves a document image taken by an imaging device using an image picked up from a stored document.
  • the present invention can also be applied to a device for retrieving or identifying an object using an identifier which is designed to identify a distinctive character string on an object surface, for example, an address region image of a mail item, or to identify the object independently.
  • This retrieving or identifying device can be applied to a system which tracks an object using images taken at various places.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Processing Or Creating Images (AREA)
EP07832850.7A 2006-11-30 2007-11-30 Dokumentbild-merkmalwerterzeugungseinrichtung, dokumentbild-merkmalwerterzeugungsverfahren und dokumentbild-merkmalwerterzeugungsprogramm Withdrawn EP2093709A4 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2006324095 2006-11-30
PCT/JP2007/073156 WO2008066152A1 (fr) 2006-11-30 2007-11-30 Dispositif, procédé et programme de génération de valeur caractéristique d'image de document

Publications (2)

Publication Number Publication Date
EP2093709A1 true EP2093709A1 (de) 2009-08-26
EP2093709A4 EP2093709A4 (de) 2013-04-24

Family

ID=39467938

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07832850.7A Withdrawn EP2093709A4 (de) 2006-11-30 2007-11-30 Dokumentbild-merkmalwerterzeugungseinrichtung, dokumentbild-merkmalwerterzeugungsverfahren und dokumentbild-merkmalwerterzeugungsprogramm

Country Status (3)

Country Link
EP (1) EP2093709A4 (de)
JP (1) JP4957924B2 (de)
WO (1) WO2008066152A1 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105590082A (zh) * 2014-10-22 2016-05-18 北京拓尔思信息技术股份有限公司 文档图像识别方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010053109A1 (ja) 2008-11-10 2010-05-14 日本電気株式会社 画像照合装置、画像照合方法および画像照合用プログラム
JP5958460B2 (ja) 2011-02-23 2016-08-02 日本電気株式会社 特徴点照合装置、特徴点照合方法、および特徴点照合プログラム
WO2014061221A1 (ja) * 2012-10-18 2014-04-24 日本電気株式会社 画像部分領域抽出装置、画像部分領域抽出方法および画像部分領域抽出用プログラム

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030128876A1 (en) * 2001-12-13 2003-07-10 Kabushiki Kaisha Toshiba Pattern recognition apparatus and method therefor
WO2006092957A1 (ja) * 2005-03-01 2006-09-08 Osaka Prefecture University Public Corporation 文書・画像検索方法とそのプログラム、文書・画像登録装置および検索装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10207985A (ja) * 1997-01-27 1998-08-07 Oki Electric Ind Co Ltd 文字切り出し方法および文字切り出し装置
JP4632860B2 (ja) 2005-05-18 2011-02-16 Necエナジーデバイス株式会社 二次電池及びその製造方法

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030128876A1 (en) * 2001-12-13 2003-07-10 Kabushiki Kaisha Toshiba Pattern recognition apparatus and method therefor
WO2006092957A1 (ja) * 2005-03-01 2006-09-08 Osaka Prefecture University Public Corporation 文書・画像検索方法とそのプログラム、文書・画像登録装置および検索装置

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HUI FU ET AL: "Gaussian Mixture Modeling of Neighbor Characters for Multilingual Text Extraction in Images", IMAGE PROCESSING, 2006 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PI, 1 October 2006 (2006-10-01), pages 3321-3324, XP031049388, ISBN: 978-1-4244-0480-3 *
HUI FU ET AL: "Maximum-Minimum Similarity Training for Text Extraction", 1 January 2006 (2006-01-01), NEURAL INFORMATION PROCESSING LECTURE NOTES IN COMPUTER SCIENCE;;LNCS, SPRINGER, BERLIN, DE, PAGE(S) 268 - 277, XP019046692, ISBN: 978-3-540-46484-6 * the whole document * *
See also references of WO2008066152A1 *
TOMOHIRO NAKAI ET AL: "Use of Affine Invariants in Locally Likely Arrangement Hashing for Camera-Based Document Image Retrieval", 1 January 2006 (2006-01-01), DOCUMENT ANALYSIS SYSTEMS VII LECTURE NOTES IN COMPUTER SCIENCE;;LNCS, SPRINGER, BERLIN, DE, PAGE(S) 541 - 552, XP019028003, ISBN: 978-3-540-32140-8 * the whole document * *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105590082A (zh) * 2014-10-22 2016-05-18 北京拓尔思信息技术股份有限公司 文档图像识别方法
CN105590082B (zh) * 2014-10-22 2019-02-22 北京拓尔思信息技术股份有限公司 文档图像识别方法

Also Published As

Publication number Publication date
WO2008066152A1 (fr) 2008-06-05
JP4957924B2 (ja) 2012-06-20
EP2093709A4 (de) 2013-04-24
JPWO2008066152A1 (ja) 2010-03-11

Similar Documents

Publication Publication Date Title
CN102667810B (zh) 数字图像中的面部识别
US8687886B2 (en) Method and apparatus for document image indexing and retrieval using multi-level document image structure and local features
Türkyılmaz et al. License plate recognition system using artificial neural networks
JP2005182730A (ja) ドキュメントの自動分離
US5359671A (en) Character-recognition systems and methods with means to measure endpoint features in character bit-maps
CN112613502A (zh) 文字识别方法及装置、存储介质、计算机设备
Fabrizio et al. Textcatcher: a method to detect curved and challenging text in natural scenes
Lepsøy et al. Statistical modelling of outliers for fast visual search
JP2006338313A (ja) 類似画像検索方法,類似画像検索システム,類似画像検索プログラム及び記録媒体
Srivastava et al. Image classification using SURF and bag of LBP features constructed by clustering with fixed centers
CN106203539A (zh) 识别集装箱箱号的方法和装置
Pal et al. Grey relational analysis based keypoints selection in bag-of-features for histopathological image classification
EP2093709A1 (de) Dokumentbild-merkmalwerterzeugungseinrichtung, dokumentbild-merkmalwerterzeugungsverfahren und dokumentbild-merkmalwerterzeugungsprogramm
Sree et al. An evolutionary computing approach to solve object identification problem in image processing applications
Subramanian et al. Content‐Based Image Retrieval Using Colour, Gray, Advanced Texture, Shape Features, and Random Forest Classifier with Optimized Particle Swarm Optimization
CN104966109A (zh) 医疗化验单图像分类方法及装置
Le et al. Document retrieval based on logo spotting using key-point matching
CN114495139A (zh) 一种基于图像的作业查重系统及方法
Huang et al. Chart image classification using multiple-instance learning
Xu et al. Robust seed localization and growing with deep convolutional features for scene text detection
US6052483A (en) Methods and apparatus for classification of images using distribution maps
CN105224619B (zh) 一种适用于视频/图像局部特征的空间关系匹配方法及系统
WO2020039260A2 (en) Systems and methods for segmentation of report corpus using visual signatures
CN115984588A (zh) 图像背景相似度分析方法、装置、电子设备及存储介质
CN112132150B (zh) 文本串识别方法、装置及电子设备

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20090630

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20130321

RIC1 Information provided on ipc code assigned before grant

Ipc: G06K 9/00 20060101AFI20130315BHEP

Ipc: G06K 9/46 20060101ALI20130315BHEP

Ipc: G06F 17/30 20060101ALI20130315BHEP

17Q First examination report despatched

Effective date: 20130409

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20140602