WO2017179728A1 - Image recognition device, image recognition method, and image recognition program - Google Patents

Image recognition device, image recognition method, and image recognition program

Info

Publication number
WO2017179728A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
image recognition
actual size
feature
feature point
Prior art date
Application number
PCT/JP2017/015390
Other languages
French (fr)
Japanese (ja)
Inventor
Ikuko Tsubaki (郁子 椿)
Original Assignee
Sharp Corporation (シャープ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corporation
Publication of WO2017179728A1 publication Critical patent/WO2017179728A1/en

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • The following disclosure relates to an image recognition apparatus, an image recognition method, and an image recognition program that perform image recognition using local feature amounts of an image.
  • Image recognition is a technique for identifying what object is included in an image.
  • Within image recognition, general object recognition is a technique for identifying the category of an object.
  • Specific object recognition is a technique for searching an image database for the same object.
  • Patent Document 1 discloses an image recognition method in which feature vectors are extracted from an image of an object, the object is represented by a large number of feature vectors, and an image database is searched for an object with matching features.
  • One aspect of the present invention has been made in view of the above circumstances, and its object is to provide an image recognition apparatus, an image recognition method, and an image recognition program that are unlikely to cause erroneous recognition in image recognition.
  • In order to solve the above problem, an image recognition apparatus according to one aspect of the present invention is an image recognition apparatus that identifies an object included in a query image, and includes: a detection unit that detects, from the query image, a predetermined special object whose standard size is known, and that detects information indicating the actual size of the region corresponding to the special object in the query image; an extraction unit that extracts feature points of the query image from the query image; and an image recognition unit that identifies the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and on the feature points of the query image.
  • An image recognition method according to one aspect of the present invention is an image recognition method for identifying an object included in a query image, and includes: a detection step of detecting, from the query image, a predetermined special object whose standard size is known, and of detecting information indicating the actual size of the region corresponding to the special object in the query image; an extraction step of extracting feature points of the query image from the query image; and an image recognition step of identifying the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and on the feature points of the query image.
  • FIG. 1 is a block diagram showing a schematic configuration example of an image recognition apparatus according to the first embodiment of the present invention. FIG. 2 shows an image containing a face image. FIG. 3 illustrates the feature point actual size. FIGS. 4 and 5 are flowcharts explaining the flow of processing in the image recognition apparatus according to the first embodiment. FIG. 6 is a block diagram showing a schematic configuration example of an image recognition apparatus according to the second embodiment of the present invention. FIGS. 7 and 8 are flowcharts explaining the flow of processing in the image recognition apparatus according to the second embodiment.
  • FIG. 1 is a block diagram showing a schematic configuration example of an image recognition apparatus 10 according to the first embodiment of the present invention.
  • As shown in FIG. 1, the image recognition apparatus 10 includes: a learning image input unit 11 that inputs learning images containing images of objects to be compared; a special object detection unit 12 that detects special objects; a feature amount extraction unit 13 that extracts feature amounts from an image; an actual size calculation unit 14 that calculates feature point actual sizes; a storage unit 15 that stores a database; a query image input unit 16 that inputs a query image containing an image of the object to be recognized; a feature amount matching unit 17 that matches feature amounts; an object ID selection unit 18 that selects an object ID using the matching results; and an output unit 19 that outputs the selected object ID.
  • The actual size calculation unit 14, the feature amount matching unit 17, and the object ID selection unit 18 may be collectively referred to as the image recognition unit.
  • The image recognition apparatus 10 performs object recognition by processing in two phases: a learning phase and an identification phase.
  • In the learning phase, the image recognition apparatus 10 inputs a plurality of learning images and creates a database.
  • In the identification phase, the image recognition apparatus 10 inputs a query image and identifies, from among the learning images input in the learning phase, an image similar to the query image.
  • The learning phase is described first, followed by the identification phase.
  • In the learning phase, the learning image input unit 11 captures learning images from the outside and outputs them to the special object detection unit 12 and the feature amount extraction unit 13.
  • Each learning image is assigned an object ID in advance; the learning image input unit 11 also captures this object ID from the outside and outputs it to the feature amount extraction unit 13.
  • The learning image input unit 11 may capture images from image files recorded in a recording device, or images acquired over a network.
  • The images captured by the learning image input unit 11 may be still images or moving images.
  • When a captured image is a moving image, the learning image input unit 11 decomposes it into frame images and outputs the frames sequentially to the special object detection unit 12 and the feature amount extraction unit 13, as sketched below.
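
For illustration, a minimal sketch of this frame decomposition step in Python, assuming OpenCV; the file name and function name are hypothetical, not from the patent.

```python
import cv2

def decompose_into_frames(path):
    """Yield the frames of a moving image one by one, as the learning
    image input unit would before passing them downstream."""
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of the video
        yield frame
    cap.release()

for frame in decompose_into_frames("learning_video.mp4"):
    pass  # each frame would go to the special object detection and feature extraction units
```
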
  • The special object detection unit 12 first performs special object detection on each image received from the learning image input unit 11.
  • A special object is an object for which a dedicated image recognition method exists, such as a human face or a car license plate, and whose actual size is roughly fixed.
  • Special object detection is a process that detects whether a special object designated in advance is included in the image and detects information indicating the actual size of the detected special object region.
  • As an example, suppose a face is designated as the special object. The special object detection unit 12 then detects whether a face is included in the image and detects the face region size as the special object region size.
  • FIG. 2 shows an image containing a face image, an example of a special object.
  • The image A1 shown in FIG. 2 contains an image of the subject's face A2.
  • The face region size is a value indicating the size, within the whole image A1, of the region of the face A2: for example, the width (A4 in FIG. 2) or the height, or both, in pixels, of the region A3 that encloses the face A2 with straight lines on all four sides.
  • For simplicity of explanation, the face region size is taken below to be the number of pixels of the width A4 of the region A3. Such face detection can be performed using, for example, the Viola-Jones method.
  • Next, the special object detection unit 12 calculates the special object resolution from the detected special object region size.
  • The special object resolution is a value indicating how many millimeters of the special object's surface each pixel in the special object region represents, and is calculated by dividing the actual size of the special object by the special object region size. Since the true size of the individual object is unknown, a standard size known in advance is used: for a face, for example, an average face width of 160 mm, so a face region 320 pixels wide yields a resolution of 0.5 mm/pixel. A sketch of this step follows.
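
A minimal sketch of the detection-and-resolution step in Python, assuming OpenCV's Haar cascade face detector as the Viola-Jones implementation; the 160 mm standard width comes from the example above, and the function name is hypothetical.

```python
import cv2

STANDARD_FACE_WIDTH_MM = 160.0  # standard size assumed known in advance

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_special_objects(gray_image):
    """Return (resolution_mm_per_px, center_xy) for each detected face."""
    results = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray_image):
        resolution = STANDARD_FACE_WIDTH_MM / w  # e.g. 160 mm / 320 px = 0.5 mm/px
        center = (x + w / 2.0, y + h / 2.0)
        results.append((resolution, center))
    return results  # an empty list plays the role of the "no special object" signal
```
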
  • The special object detection unit 12 outputs the obtained special object resolution and the center coordinates of the special object region to the actual size calculation unit 14 as information indicating the actual size of the special object region.
  • When multiple special objects are detected, the special object detection unit 12 calculates the special object resolution for each special object region and outputs each obtained resolution and the center coordinates of its region to the actual size calculation unit 14.
  • When no special object is detected, the special object detection unit 12 outputs a signal indicating that no special object exists to the actual size calculation unit 14.
  • The feature amount extraction unit 13 extracts local feature amounts from each learning image received from the learning image input unit 11.
  • The following describes the case where SIFT (Scale-Invariant Feature Transform) is used to extract the local feature amounts.
  • First, the feature amount extraction unit 13 detects feature points using SIFT and obtains the coordinates of each feature point.
  • When detecting the feature points, the feature amount extraction unit 13 also calculates the scale of each feature point.
  • The scale is a value indicating at which resolution the feature point was obtained.
  • Next, the feature amount extraction unit 13 calculates a SIFT feature amount for each detected feature point.
  • The SIFT feature amount is a 128-dimensional vector obtained from the luminance gradients in the region around the feature point.
  • The feature amount extraction unit 13 outputs the SIFT feature amount, coordinates, and scale of each feature point to the actual size calculation unit 14.
  • The feature amount extraction unit 13 also outputs the object ID received from the learning image input unit 11 to the actual size calculation unit 14. A sketch of this extraction step follows.
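
A minimal sketch of the extraction step, assuming OpenCV's SIFT implementation; treating the keypoint diameter kp.size as the scale is an assumption of this sketch.

```python
import cv2

def extract_features(gray_image):
    """Return (descriptor, (x, y), scale) triples for each feature point."""
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray_image, None)
    if descriptors is None:
        return []  # no feature points were found
    # Each descriptor is the 128-dimensional SIFT vector; kp.pt gives the
    # coordinates and kp.size stands in for the scale here.
    return [(desc, kp.pt, kp.size) for kp, desc in zip(keypoints, descriptors)]
```
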
  • The actual size calculation unit 14 calculates the feature point actual size of each feature point using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13.
  • The feature point actual size is a value indicating the size, on the actual subject, of the texture that characterizes the feature point.
  • For example, the feature point actual size is obtained by multiplying the number of pixels in the line segment connecting the center point of the special object region to the feature point by the special object resolution.
  • Alternatively, the line segment may start from any point within the special object region other than its center point; the number of pixels in that segment multiplied by the special object resolution may also serve as the feature point actual size.
  • FIG. 3 is a diagram for explaining the feature point actual size.
  • In FIG. 3, taking the face region as the special object region B1, the center point of the special object region B1 is the point B3.
  • A feature point B2 can be detected using SIFT from the pixels included in the special object region B1.
  • The length of the line segment B4 connecting the point B3 to the detected feature point B2 gives the feature point actual size of the feature point B2.
  • As described above, since the face region size is taken to be the number of pixels of the width A4 of the region A3, the length of the line segment B4 is likewise calculated as a horizontal size.
  • Even when the face region size is measured vertically instead of horizontally, the feature point actual size can be obtained in the same way by substituting the vertical direction for the horizontal one. Furthermore, even when the face region size is a two-dimensional size in the horizontal and vertical directions, the feature point actual size can be obtained in the same way by calculating the horizontal and vertical sizes separately.
  • In this way, the image recognition apparatus 10 can detect a special object whose actual size can be estimated even when the actual size of the texture is unknown, and use that size to estimate the feature point actual size.
  • In this embodiment, the feature amount extraction unit 13 obtains the feature points with SIFT.
  • SIFT feature points carry the notion of scale; to account for it, the actual size calculation unit 14 multiplies the feature point actual size obtained above by the scale and takes the product as the final feature point actual size.
  • When the special object detection unit 12 detects multiple special objects, the actual size calculation unit 14 compares the coordinates of each feature point received from the feature amount extraction unit 13 with the center coordinates of each special object region received from the special object detection unit 12, and uses the special object resolution of the nearest region to calculate the feature point actual size.
  • When it receives the signal indicating that no special object exists, the actual size calculation unit 14 sets the feature point actual size to 0. A sketch of this calculation follows.
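
A minimal sketch of the calculation, under the same assumptions as above; Euclidean distance is used for brevity, whereas the simplified example in the text measures the horizontal size only.

```python
import math

def feature_point_actual_size(feature_xy, scale, special_objects):
    """special_objects: list of (resolution_mm_per_px, center_xy) pairs."""
    if not special_objects:
        return 0.0  # no special object was detected
    # Use the special object region whose center is nearest to the feature point.
    resolution, center = min(special_objects,
                             key=lambda ro: math.dist(ro[1], feature_xy))
    # Pixel distance to the region center, converted to mm, then scaled.
    return math.dist(center, feature_xy) * resolution * scale
```
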
  • Finally, the actual size calculation unit 14 outputs the feature amount and the feature point actual size of each feature point to the storage unit 15.
  • The actual size calculation unit 14 also outputs the object ID received from the feature amount extraction unit 13 to the storage unit 15.
  • The storage unit 15 stores a database.
  • The database consists of fields (columns) holding a feature point ID, an object ID, a feature amount, and a feature point actual size; one record (row) holds the data for one feature point.
  • When the storage unit 15 receives the feature amount, feature point actual size, and object ID for one feature point from the actual size calculation unit 14, it creates a feature point ID and stores it in the feature point ID field.
  • The feature point ID is an arbitrary value that differs for each feature point.
  • The feature amount, feature point actual size, object ID, and feature point ID are then stored in their respective fields. Note that the object IDs of the multiple feature points obtained from one learning image all have the same value. A sketch of such a schema follows.
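
A minimal sketch of one possible realization of this database, assuming SQLite; the table and column names are hypothetical, and the 128-dimensional descriptor is serialized to bytes for storage.

```python
import sqlite3

conn = sqlite3.connect("features.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS feature_points (
        feature_point_id INTEGER PRIMARY KEY AUTOINCREMENT,
        object_id        INTEGER NOT NULL,
        descriptor       BLOB    NOT NULL,  -- 128-dimensional SIFT vector
        actual_size      REAL    NOT NULL   -- feature point actual size
    )
""")

def store_feature_point(object_id, descriptor, actual_size):
    """descriptor: a NumPy array; tobytes() serializes it for the BLOB field."""
    conn.execute(
        "INSERT INTO feature_points (object_id, descriptor, actual_size) "
        "VALUES (?, ?, ?)",
        (object_id, descriptor.tobytes(), actual_size))
    conn.commit()
```
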
  • An example of the flow of processing in the learning phase is as follows (see FIG. 4). In step S1, the learning image input unit 11 inputs a learning image and an object ID.
  • In step S2, the special object detection unit 12 detects special objects.
  • In step S3, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
  • In step S4, the actual size calculation unit 14 calculates the feature point actual size of each feature point and stores it in the storage unit 15 together with the feature point ID, the object ID, and the feature amount.
  • In the learning phase, the image recognition apparatus 10 performs this processing for each learning image.
  • In the identification phase, the query image input unit 16 captures a query image from the outside and outputs it to the special object detection unit 12.
  • The query image input unit 16 may capture an image file recorded in a recording device as the query image, or an image acquired over a network.
  • The special object detection unit 12 performs the same processing as in the learning phase on the image received from the query image input unit 16 and outputs the special object resolution to the actual size calculation unit 14.
  • When multiple special objects are detected, the special object detection unit 12 calculates the special object resolution for each special object region and outputs each obtained resolution to the actual size calculation unit 14 as information indicating the actual size of the special object region.
  • When no special object is detected, the special object detection unit 12 outputs a signal indicating that no special object exists to the actual size calculation unit 14.
  • The feature amount extraction unit 13 performs the same processing as in the learning phase on the query image to calculate the feature amounts, and outputs the feature amount, coordinates, and scale of each feature point to the actual size calculation unit 14.
  • The actual size calculation unit 14 performs the same processing as in the learning phase, using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13, to calculate the feature point actual sizes. It then outputs the feature amount and feature point actual size of each feature point to the feature amount matching unit 17.
  • The feature amount matching unit 17 matches the feature amount of each feature point of the query image, received from the actual size calculation unit 14, against the feature amounts in the database of the storage unit 15, that is, the feature amounts of the feature points detected from the learning images.
  • The matching proceeds as follows.
  • First, the feature amount matching unit 17 compares the feature point actual size of each feature point of the query image with those of all feature points in the database, and extracts from the database, as candidate feature points, those whose feature point actual size differs from that of the query feature point by no more than a predetermined threshold in absolute value.
  • When the feature point actual size of a query feature point is 0, that is, when the special object detection unit 12 detected no special object, the feature amount matching unit 17 treats all feature points in the database as candidates. For each feature point of the query image, it then finds the candidate feature point whose feature amount is closest and casts one vote for that feature point's object ID.
  • The object ID selection unit 18 selects the object ID with the largest number of votes from the per-object-ID vote counts received from the feature amount matching unit 17 and outputs it to the output unit 19 as the recognition result.
  • The output unit 19 outputs the object ID received from the object ID selection unit 18 to the outside of the image recognition apparatus 10. A sketch of the matching, voting, and selection follows.
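
A minimal sketch of these steps, assuming the database contents are held in NumPy arrays; db_descriptors, db_sizes, db_object_ids, and threshold are hypothetical names.

```python
from collections import Counter
import numpy as np

def identify(query_features, db_descriptors, db_sizes, db_object_ids, threshold):
    """query_features: (128-dim descriptor, actual size) pairs for the query image."""
    votes = Counter()
    for desc, size in query_features:
        if size == 0.0:
            # No special object was detected: consider every database feature point.
            candidates = np.arange(len(db_descriptors))
        else:
            # Keep only feature points whose actual size is within the threshold.
            candidates = np.flatnonzero(np.abs(db_sizes - size) <= threshold)
        if candidates.size == 0:
            continue
        # The nearest candidate in descriptor space votes for its object ID.
        dists = np.linalg.norm(db_descriptors[candidates] - desc, axis=1)
        best = candidates[int(np.argmin(dists))]
        votes[db_object_ids[best]] += 1
    return votes.most_common(1)[0][0] if votes else None
```
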
  • An example of the flow of processing in the identification phase is as follows (see FIG. 5). In step S21, the query image input unit 16 inputs a query image.
  • In step S22, the special object detection unit 12 detects special objects.
  • In step S23, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
  • In step S24, the actual size calculation unit 14 calculates the feature point actual size of each feature point.
  • In step S25, the feature amount matching unit 17 extracts candidate feature points from the storage unit 15 for each feature point of the query image.
  • In step S26, for each feature point of the query image, the feature amount matching unit 17 finds the candidate feature point with the smallest feature amount distance and votes for that feature point's object ID.
  • In step S27, the object ID selection unit 18 selects the object ID with the largest number of votes in step S26 as the recognition result.
  • In step S28, the output unit 19 outputs the recognition result.
  • If the feature point actual size were not taken into consideration, the feature point with the smallest distance would be selected from all feature points in the database. A feature point whose feature amount has a completely different size on the actual subject could then be selected, casting a vote for an object ID other than the one that should receive it.
  • In this embodiment, by contrast, candidate feature points are extracted based on the feature point actual size, and the feature point with the smallest distance is selected from among those candidates, which makes it harder to select feature points of completely different actual size. As a result, the object ID selection unit 18 is less likely to select an incorrect object ID.
  • In this embodiment, the feature amount extraction unit 13 determines the feature point coordinates by detecting feature points.
  • However, the present invention is not limited to this, and the feature point coordinates may be determined by other methods.
  • For example, the feature amount extraction unit 13 may arrange the feature point coordinates in advance at grid positions and extract the feature amounts at those coordinates.
  • In this embodiment, the SIFT feature amount is used as the feature amount.
  • However, the present invention is not limited to this, and other local feature amounts such as SURF or HOG may be used.
  • Also, in this embodiment, the feature amount of each feature point is stored in the storage unit 15 as-is.
  • However, the present invention is not limited to this, and vector-quantized data obtained with the Bag-of-Features method may be recorded as the feature amount.
  • In that case, the clustering in the Bag-of-Features process is performed by incorporating the feature point actual size together with the feature amount, as in the sketch below.
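
One plausible reading of this, sketched with scikit-learn's k-means: append the actual size as an extra dimension of each descriptor before clustering, so the codebook separates feature points of different real-world size. The weighting factor is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, actual_sizes, n_words=1000, size_weight=1.0):
    """descriptors: (N, 128) array; actual_sizes: length-N sequence."""
    augmented = np.hstack([descriptors,
                           size_weight * np.asarray(actual_sizes)[:, None]])
    return KMeans(n_clusters=n_words, n_init=10).fit(augmented)
```
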
  • Also, in this embodiment, the feature amount matching unit 17 finds, among the candidate feature points, the one that minimizes the feature amount distance to each feature point of the query image.
  • However, a candidate feature point whose distance is approximately the minimum may be found instead.
  • For example, the feature amount matching unit 17 can perform an approximate nearest neighbor search using ANN (Approximate Nearest Neighbor), which shortens the computation time; a sketch of the idea follows.
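
The patent names the ANN library; as a stand-in with the same effect, here is a sketch using SciPy's KD-tree, whose eps parameter permits an approximate answer. The dummy data is illustrative only, and a KD-tree is just one of several index structures usable for 128-dimensional descriptors.

```python
import numpy as np
from scipy.spatial import cKDTree

db_descriptors = np.random.rand(10000, 128)  # stand-in for the database descriptors
tree = cKDTree(db_descriptors)

query = np.random.rand(128)
dist, idx = tree.query(query, k=1, eps=0.5)  # eps > 0 allows an approximate neighbor
```
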
  • Also, in this embodiment, the output unit 19 outputs only the recognition result.
  • However, the feature point actual size may be output together with the recognition result: either the feature point actual sizes of all feature points of the query image, or a statistic derived from them, such as their average value.
  • As described above, erroneous recognition in image recognition can be reduced by considering the size of the feature amount on the actual subject.
  • FIG. 6 is a block diagram illustrating a schematic configuration example of the image recognition apparatus 30 according to the second embodiment of the present invention.
  • As shown in FIG. 6, the image recognition apparatus 30 according to this embodiment includes a learning image input unit 11, a special object detection unit 12, a depth structure estimation unit 31 that estimates the depth structure of an image, a feature amount extraction unit 13, an actual size calculation unit 32 that calculates feature point actual sizes, a storage unit 15, a query image input unit 16, a feature amount matching unit 17, an object ID selection unit 18, and an output unit 19.
  • In the learning phase, the depth structure estimation unit 31 receives each learning image from the special object detection unit 12 and estimates its depth structure.
  • The depth structure consists of depth values indicating, for each pixel, the distance between the subject and the camera.
  • The depth structure can be estimated using, for example, the technique disclosed in Japanese Patent Application Laid-Open No. 2005-151534.
  • In that technique, three types of basic depth models are combined, with a composition ratio determined from the high-frequency component evaluation values of the luminance signal of the non-stereo image computed by evaluation units for the top and bottom of the screen; the R signal of the non-stereo image is then superimposed on the combined basic depth model to obtain the final depth estimation data.
  • The depth structure estimation unit 31 outputs the estimated depth structure to the actual size calculation unit 32.
  • The actual size calculation unit 32 first calculates the feature point actual size of each feature point in the same manner as the actual size calculation unit 14 of the first embodiment, using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13. It then corrects the feature point actual size using the depth structure received from the depth structure estimation unit 31. The correction multiplies the feature point actual size by p/q, where p is the depth value at the center point of the special object region used to calculate that actual size and q is the depth value at the feature point.
  • This corrects the change in size within the captured image that occurs when the special object and a feature point are imaged at different distances.
  • The correction is not limited to this example and may be performed by other methods, such as subtracting (q - p). A sketch of the p/q correction follows.
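
A minimal sketch of the p/q correction; the zero guard is an assumption added for safety, not part of the patent.

```python
def correct_actual_size(actual_size, p, q):
    """p: depth value at the center of the special object region used for the
    size; q: depth value at the feature point itself."""
    if q == 0:
        return actual_size  # guard against a degenerate depth value
    return actual_size * p / q
```
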
  • An example of the flow of processing in the learning phase of this embodiment is as follows (see FIG. 7). In step S41, the learning image input unit 11 inputs a learning image and an object ID.
  • In step S42, the special object detection unit 12 detects special objects.
  • In step S43, the depth structure estimation unit 31 estimates the depth structure.
  • In step S44, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
  • In step S45, the actual size calculation unit 32 calculates the feature point actual size of each feature point and corrects it using the depth structure.
  • The actual size calculation unit 32 then stores the corrected feature point actual size in the storage unit 15 together with the feature point ID, the object ID, and the feature amount.
  • In the learning phase, the image recognition apparatus 30 performs this processing for each learning image.
  • In the identification phase, the depth structure estimation unit 31 receives the query image from the special object detection unit 12, performs the same processing as in the learning phase, and estimates the depth structure of the query image.
  • The depth structure estimation unit 31 outputs the estimated depth structure to the actual size calculation unit 32.
  • The actual size calculation unit 32 performs the same processing as in the learning phase, using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13, to calculate the feature point actual sizes. It then corrects the feature point actual sizes using the depth structure received from the depth structure estimation unit 31, as in the learning phase, and outputs the feature amount and the corrected feature point actual size of each feature point to the feature amount matching unit 17.
  • An example of the flow of processing in the identification phase of this embodiment is as follows (see FIG. 8). In step S51, a query image is input through the query image input unit 16 to the special object detection unit 12 and the feature amount extraction unit 13.
  • In step S52, the special object detection unit 12 detects special objects.
  • In step S53, the depth structure estimation unit 31 estimates the depth structure.
  • In step S54, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
  • In step S55, the actual size calculation unit 32 calculates the feature point actual size of each feature point and corrects it using the depth structure.
  • In step S56, the feature amount matching unit 17 extracts candidate feature points from the storage unit 15 using the corrected feature point actual size of each feature point of the query image.
  • In step S57, for each feature point of the query image, the feature amount matching unit 17 finds the candidate feature point with the smallest feature amount distance and votes for that feature point's object ID.
  • In step S58, the object ID selection unit 18 selects the object ID with the largest number of votes in step S57 as the recognition result.
  • Finally, the output unit 19 outputs the recognition result.
  • In this way, the actual size calculation unit 32 can correct the change in size within the captured image that occurs when the special object and a feature point are imaged at different distances. Erroneous recognition can therefore be reduced further.
  • The control blocks of the image recognition apparatuses 10 and 30 (in particular, the special object detection unit 12, the feature amount extraction unit 13, the actual size calculation units 14 and 32, the feature amount matching unit 17, the output unit 19, and the depth structure estimation unit 31) may be realized by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).
  • In the latter case, the image recognition apparatuses 10 and 30 include a CPU that executes the instructions of the program, which is the software realizing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") in which the program and various data are recorded so as to be readable by the computer (or CPU); and a RAM (Random Access Memory) into which the program is loaded.
  • The object of one embodiment of the present invention is achieved by the computer (or CPU) reading the program from the recording medium and executing it.
  • As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.
  • The program may also be supplied to the computer via any transmission medium capable of transmitting it, such as a communication network or a broadcast wave.
  • One embodiment of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
  • The image recognition apparatuses 10 and 30 according to aspect 1 of the present invention are image recognition apparatuses that select an image similar to a query image from among learning images, and include: a special object detection unit 12 that detects, from the query image and from each learning image, a special object determined in advance; a feature amount extraction unit 13 that extracts the feature amounts of feature points detected from the query image and the feature amounts of feature points detected from each learning image; actual size calculation units 14 and 32 that calculate the feature point actual sizes corresponding to the special objects and feature points detected from the query image and from each learning image; and a feature amount matching unit 17 that compares the feature amounts of the feature points of the query image with the feature amounts of the feature points of each learning image. The feature amount matching unit 17 compares the feature amounts of those feature points of the query image and of each learning image whose feature point actual sizes are similar.
  • In the image recognition apparatuses 10 and 30 according to aspect 2 of the present invention, in aspect 1, the actual size calculation units 14 and 32 may calculate each feature point actual size from the number of pixels from a predetermined point in the special object region to the feature point, the special object resolution calculated by the special object detection unit 12, which indicates the actual size per pixel in the special object region, and the scale of the feature point calculated by the feature amount extraction unit 13.
  • In the image recognition apparatuses 10 and 30 according to aspect 3 of the present invention, in aspect 2, the predetermined point may be either the center of the special object region or another point within the special object region.
  • This allows a reference point suited to the special object to be selected.
  • The image recognition apparatus 30 according to aspect 4 of the present invention may, in any one of aspects 1 to 3, further include a depth structure estimation unit 31 that estimates the depth structure of the query image and of each learning image, and the actual size calculation unit 32 may correct the feature point actual sizes using the depth structure.
  • The image recognition apparatuses 10 and 30 according to aspect 5 of the present invention may further include an output unit 19 that outputs the result of selecting an image similar to the query image from among the learning images, together with the feature point actual sizes.
  • This makes it possible to know the size, on the subject, of an object included in the image.
  • An image recognition method according to one aspect of the present invention is an image recognition method for selecting an image similar to a query image from among learning images, and includes: a special object detection step of detecting, from the query image and from each learning image, a special object determined in advance; a feature amount extraction step of extracting the feature amounts of feature points detected from the query image and the feature amounts of feature points detected from each learning image; a size calculation step of calculating the feature point actual sizes corresponding to the special objects and feature points detected from the query image and from each learning image; and a feature amount matching step of comparing the feature amounts of the feature points of the query image with the feature amounts of the feature points of each learning image. In the feature amount matching step, the feature amounts of those feature points of the query image and of each learning image whose feature point actual sizes are similar are compared.
  • The image recognition apparatuses 10 and 30 according to each aspect of the present invention may be realized by a computer.
  • In that case, an image recognition program that realizes the image recognition apparatuses 10 and 30 by the computer, by causing the computer to operate as each unit (software element) of the image recognition apparatuses 10 and 30, and a computer-readable recording medium on which the program is recorded also fall within the scope of one aspect of the present invention.
  • 10, 30 Image recognition apparatus; 12 Special object detection unit (detection unit); 13 Feature amount extraction unit (extraction unit); 14, 32 Actual size calculation unit (image recognition unit); 17 Feature amount matching unit (image recognition unit); 19 Output unit; 31 Depth structure estimation unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Provided is an image recognition device (10) comprising: a detection unit (12) that detects, from a query image, a predetermined particular object whose standard size is known, and detects information indicating the actual size of the region in the query image corresponding to the particular object; an extraction unit (13) that extracts feature points of the query image from the query image; and image recognition units (14, 17, 18) that identify the object included in the query image on the basis of the information indicating the actual size of the region in the query image corresponding to the particular object and of the feature points of the query image.

Description

Image recognition apparatus, image recognition method, and image recognition program

The following disclosure relates to an image recognition apparatus, an image recognition method, and an image recognition program that perform image recognition using local feature amounts of an image.

Image recognition is a technique for identifying what object is included in an image. Within image recognition, general object recognition is a technique for identifying the category of an object, and specific object recognition is a technique for searching an image database for the same object.

Patent Document 1 discloses an image recognition method in which feature vectors are extracted from an image of an object, the object is represented by a large number of feature vectors, and an image database is searched for an object with matching features.

International Publication No. 2008/026414

Conventional image recognition methods, including the technique described in Patent Document 1, use scale-invariant local feature amounts, which yield the same feature amount even when the scale of the image changes. However, such methods have the problem that feature amounts that are of completely different sizes on the actual subjects, but become similar when the scale is changed, are erroneously matched to one another. As a result, different objects whose textures are partially similar may be erroneously recognized as the same object.
One aspect of the present invention has been made in view of the above circumstances, and its object is to provide an image recognition apparatus, an image recognition method, and an image recognition program that are unlikely to cause erroneous recognition in image recognition.

In order to solve the above problem, an image recognition apparatus according to one aspect of the present invention is an image recognition apparatus that identifies an object included in a query image, and includes: a detection unit that detects, from the query image, a predetermined special object whose standard size is known, and that detects information indicating the actual size of the region corresponding to the special object in the query image; an extraction unit that extracts feature points of the query image from the query image; and an image recognition unit that identifies the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and on the feature points of the query image.

An image recognition method according to one aspect of the present invention is an image recognition method for identifying an object included in a query image, and includes: a detection step of detecting, from the query image, a predetermined special object whose standard size is known, and of detecting information indicating the actual size of the region corresponding to the special object in the query image; an extraction step of extracting feature points of the query image from the query image; and an image recognition step of identifying the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and on the feature points of the query image.

According to one aspect of the present invention, erroneous recognition in image recognition can be reduced.

FIG. 1 is a block diagram showing a schematic configuration example of an image recognition apparatus according to the first embodiment of the present invention. FIG. 2 shows an image containing a face image. FIG. 3 illustrates the feature point actual size. FIGS. 4 and 5 are flowcharts explaining the flow of processing in the image recognition apparatus according to the first embodiment. FIG. 6 is a block diagram showing a schematic configuration example of an image recognition apparatus according to the second embodiment of the present invention. FIGS. 7 and 8 are flowcharts explaining the flow of processing in the image recognition apparatus according to the second embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, parts having the same function are denoted by the same reference numerals, and repeated description is omitted.
(First embodiment)

FIG. 1 is a block diagram showing a schematic configuration example of the image recognition apparatus 10 according to the first embodiment of the present invention. As shown in FIG. 1, the image recognition apparatus 10 of this embodiment includes: a learning image input unit 11 that inputs learning images containing images of objects to be compared; a special object detection unit 12 that detects special objects; a feature amount extraction unit 13 that extracts feature amounts from an image; an actual size calculation unit 14 that calculates feature point actual sizes; a storage unit 15 that stores a database; a query image input unit 16 that inputs a query image containing an image of the object to be recognized; a feature amount matching unit 17 that matches feature amounts; an object ID selection unit 18 that selects an object ID using the matching results; and an output unit 19 that outputs the selected object ID. The actual size calculation unit 14, the feature amount matching unit 17, and the object ID selection unit 18 may be collectively referred to as the image recognition unit.
The image recognition apparatus 10 performs object recognition by processing in two phases: a learning phase and an identification phase. In the learning phase, the image recognition apparatus 10 inputs a plurality of learning images and creates a database. In the identification phase, the image recognition apparatus 10 inputs a query image and identifies, from among the learning images input in the learning phase, an image similar to the query image. The learning phase is described first, followed by the identification phase.

In the learning phase, the learning image input unit 11 captures learning images from the outside and outputs them to the special object detection unit 12 and the feature amount extraction unit 13. Each learning image is assigned an object ID in advance; the learning image input unit 11 also captures this object ID from the outside and outputs it to the feature amount extraction unit 13. The learning image input unit 11 may capture images from image files recorded in a recording device, or images acquired over a network. The images captured by the learning image input unit 11 may be still images or moving images. When a captured image is a moving image, the learning image input unit 11 decomposes it into frame images and outputs the frames sequentially to the special object detection unit 12 and the feature amount extraction unit 13.

The special object detection unit 12 first performs special object detection on each image received from the learning image input unit 11. A special object is an object for which a dedicated image recognition method exists, such as a human face or a car license plate, and whose actual size is roughly fixed. Special object detection is a process that detects whether a special object designated in advance is included in the image and detects information indicating the actual size of the detected special object region.

As an example, the following describes the case where a face is designated as the special object. When the special object is a face, the special object detection unit 12 detects whether a face is included in the image and detects the face region size as the special object region size.

FIG. 2 shows an image containing a face image, an example of a special object. The image A1 shown in FIG. 2 contains an image of the subject's face A2. The face region size is a value indicating the size, within the whole image A1, of the region of the face A2: for example, the width (A4 in FIG. 2) or the height, or both, in pixels, of the region A3 that encloses the face A2 with straight lines on all four sides. For simplicity of explanation, the face region size is taken below to be the number of pixels of the width A4 of the region A3. Such face detection can be performed using, for example, the Viola-Jones method.

Next, the special object detection unit 12 calculates the special object resolution from the detected special object region size. The special object resolution is a value indicating how many millimeters of the special object's surface each pixel in the special object region represents, and is calculated by dividing the actual size of the special object by the special object region size. Since the actual size of the particular special object is unknown, a standard size known in advance is used. For example, when the special object is a face, an average face width of 160 mm is used. With this value, if the face region size (the width of the face region) is 320 pixels, the special object resolution is calculated as 0.5 mm/pixel (= 160 mm / 320 pixels). The special object detection unit 12 outputs the obtained special object resolution and the center coordinates of the special object region to the actual size calculation unit 14 as information indicating the actual size of the special object region. When multiple special objects are detected, the special object detection unit 12 calculates the special object resolution for each special object region and outputs each obtained resolution and the center coordinates of its region to the actual size calculation unit 14. When no special object is detected, the special object detection unit 12 outputs a signal indicating that no special object exists to the actual size calculation unit 14.
The feature amount extraction unit 13 extracts local feature amounts from each learning image received from the learning image input unit 11. The following describes the case where SIFT (Scale-Invariant Feature Transform) is used to extract the local feature amounts. First, the feature amount extraction unit 13 detects feature points using SIFT and obtains the coordinates of each feature point. When detecting the feature points, the feature amount extraction unit 13 also calculates the scale of each feature point. The scale is a value indicating at which resolution the feature point was obtained. Next, the feature amount extraction unit 13 calculates a SIFT feature amount for each detected feature point. The SIFT feature amount is a 128-dimensional vector obtained from the luminance gradients in the region around the feature point. The feature amount extraction unit 13 outputs the SIFT feature amount, coordinates, and scale of each feature point to the actual size calculation unit 14, together with the object ID received from the learning image input unit 11.

The actual size calculation unit 14 calculates the feature point actual size of each feature point using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13. The feature point actual size is a value indicating the size, on the actual subject, of the texture that characterizes the feature point.

The feature point actual size is obtained as follows. For example, the number of pixels in the line segment connecting the center point of the special object region to the feature point is multiplied by the special object resolution. Alternatively, the line segment may start from any point within the special object region other than its center point; the number of pixels in that segment multiplied by the special object resolution may also serve as the feature point actual size.

FIG. 3 is a diagram for explaining the feature point actual size. In FIG. 3, taking the face region as the special object region B1, the center point of the special object region B1 is the point B3. A feature point B2 can be detected using SIFT from the pixels included in the special object region B1. In this embodiment, the length of the line segment B4 connecting the point B3 to the detected feature point B2 gives the feature point actual size of the feature point B2. As described above, since the face region size is taken to be the number of pixels of the width A4 of the region A3, the length of the line segment B4 is likewise calculated as a horizontal size. Even when the face region size is measured vertically instead of horizontally, the feature point actual size can be obtained in the same way by substituting the vertical direction for the horizontal one. Furthermore, even when the face region size is a two-dimensional size in the horizontal and vertical directions, the feature point actual size can be obtained in the same way by calculating the horizontal and vertical sizes separately.

In this way, the image recognition apparatus 10 can detect a special object whose actual size can be estimated even when the actual size of the texture is unknown, and use that size to estimate the feature point actual size. In this embodiment, the feature amount extraction unit 13 obtains the feature points with SIFT. SIFT feature points carry the notion of scale; to account for it, the actual size calculation unit 14 multiplies the feature point actual size obtained above by the scale and takes the product as the final feature point actual size.
Next, consider the case where the special object detection unit 12 detects multiple special objects. In this case, the actual size calculation unit 14 compares the coordinates of each feature point received from the feature amount extraction unit 13 with the center coordinates of each special object region received from the special object detection unit 12, and uses the special object resolution of the nearest region to calculate the feature point actual size.

When it receives the signal indicating that no special object exists from the special object detection unit 12, the actual size calculation unit 14 sets the feature point actual size to 0.

Finally, the actual size calculation unit 14 outputs the feature amount and the feature point actual size of each feature point to the storage unit 15, together with the object ID received from the feature amount extraction unit 13.

The storage unit 15 stores a database. The database consists of fields (columns) holding a feature point ID, an object ID, a feature amount, and a feature point actual size; one record (row) holds the data for one feature point. When the storage unit 15 receives the feature amount, feature point actual size, and object ID for one feature point from the actual size calculation unit 14, it creates a feature point ID and stores it in the feature point ID field. The feature point ID is an arbitrary value that differs for each feature point. The feature amount, feature point actual size, object ID, and feature point ID are then stored in their respective fields. Note that the object IDs of the multiple feature points obtained from one learning image all have the same value.
 An example of the processing flow of the image recognition method in the learning phase will be described with reference to FIG. 4.
 First, in step S1, the learning image input unit 11 receives a learning image and an object ID.
 Next, in step S2, the special object detection unit 12 detects a special object.
 Then, in step S3, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
 Finally, in step S4, the actual size calculation unit 14 calculates the feature point actual size of each feature point and stores it in the storage unit 15 together with the feature point ID, the object ID, and the feature amount. In the learning phase, the image recognition device 10 performs this processing for each learning image.
 Next, the identification phase will be described. In the identification phase, the query image input unit 16 takes in a query image from the outside and outputs it to the special object detection unit 12. The query image input unit 16 may take in an image file recorded in a recording device as the query image, or may take in an image acquired over a network as the query image.
 The special object detection unit 12 performs the same processing on the image received from the query image input unit 16 as in the learning phase, and outputs the special object resolution to the actual size calculation unit 14. When a plurality of special objects are detected, the special object detection unit 12 calculates the special object resolution for each special object region and outputs each obtained special object resolution to the actual size calculation unit 14 as information indicating the actual size of the corresponding special object region. When no special object is detected, the special object detection unit 12 outputs a signal indicating that no special object exists to the actual size calculation unit 14.
 The feature amount extraction unit 13 performs the same processing on the query image as in the learning phase and calculates the feature amounts. It then outputs the feature amount, coordinates, and scale of each feature point to the actual size calculation unit 14.
 The actual size calculation unit 14 performs the same processing as in the learning phase using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13, and calculates the feature point actual size. It then outputs the feature amount and the feature point actual size of each feature point to the feature amount matching unit 17.
 The feature amount matching unit 17 matches the feature amount of each feature point of the query image received from the actual size calculation unit 14 against the feature amounts in the database of the storage unit 15; these are the feature amounts of the feature points detected from the learning images. Matching proceeds as follows. First, the feature amount matching unit 17 compares the feature point actual size of the query feature point with those of all feature points in the database, and extracts from the database, as candidate feature points, those whose actual-size difference from the query feature point has an absolute value within a predetermined threshold. If the feature point actual size of the query feature point is 0, that is, if the special object detection unit 12 detected no special object, the feature amount matching unit 17 extracts all feature points in the database as candidate feature points. Likewise, if the query feature point's actual size is not 0 but the actual size of some database feature point is 0, that database feature point is also extracted as a candidate. Next, the feature amount matching unit 17 computes the distance between the feature amount (vector) of the query feature point and that of each candidate feature point, selects the candidate with the smallest distance, looks up its object ID, and casts one vote for that object. The feature amount matching unit 17 performs this procedure for each feature point of the query image, and outputs the number of votes for each object ID to the object ID selection unit 18.
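 A sketch of this matching and voting procedure, assuming the record layout sketched earlier, a brute-force scan, and Euclidean distance (the description requires only some feature distance):

```python
from collections import Counter
import math

def match_and_vote(query_points, db, size_threshold):
    # query_points: list of (feature_vector, actual_size) for the query image.
    # db: records with .feature, .actual_size and .object_id attributes,
    # e.g. the FeatureRecord sketched earlier.
    votes = Counter()
    for feat, size in query_points:
        if size == 0:
            # no special object in the query: all records are candidates
            candidates = db
        else:
            candidates = [r for r in db
                          if r.actual_size == 0
                          or abs(r.actual_size - size) <= size_threshold]
        if not candidates:
            continue
        # nearest candidate by feature distance gets one vote for its object
        best = min(candidates, key=lambda r: math.dist(r.feature, feat))
        votes[best.object_id] += 1
    return votes
```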
 The object ID selection unit 18 selects, from the vote counts for the object IDs received from the feature amount matching unit 17, the object ID with the largest number of votes, and outputs it to the output unit 19 as the recognition result. The output unit 19 outputs the object ID received from the object ID selection unit 18 to the outside of the image recognition device 10.
 An example of the processing flow of the image recognition method in the identification phase will be described with reference to FIG. 5.
 First, in step S21, the query image input unit 16 receives a query image.
 Next, in step S22, the special object detection unit 12 detects a special object.
 Then, in step S23, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
 Next, in step S24, the actual size calculation unit 14 calculates the feature point actual size of each feature point.
 Then, in step S25, the feature amount matching unit 17 extracts candidate feature points from the storage unit 15 for each feature point of the query image.
 Next, in step S26, the feature amount matching unit 17 finds, for each feature point of the query image, the candidate feature point with the smallest feature amount distance, and votes for that feature point's object ID.
 Then, in step S27, the object ID selection unit 18 selects the object ID with the largest number of votes in step S26 as the recognition result.
 Finally, in step S28, the output unit 19 outputs the recognition result.
 The effect of reducing erroneous recognition in the present embodiment is as follows. A conventional feature matching mechanism does not consider the feature point actual size and therefore selects the minimum-distance feature point from among all feature points in the database. As a result, a feature point whose feature has a completely different size on the actual subject may be selected, and a vote may be cast for an object ID different from the one that should receive it. According to the present embodiment, candidate feature points are extracted based on the feature point actual size and the minimum-distance feature point is selected from among those candidates, so feature points whose features have a completely different size on the actual subject are unlikely to be selected. Consequently, the object ID selection unit 18 is unlikely to select an incorrect object ID.
 In the above example, the feature amount extraction unit 13 determines the coordinates of feature points by feature point detection, but this is not limiting, and the coordinates may be determined by other methods. For example, the feature amount extraction unit 13 may place feature point coordinates at grid positions in advance and extract feature amounts at those coordinates.
 In the above example, SIFT features are used as the feature amounts, but other local features such as SURF or HOG may be used instead.
 In the above example, the storage unit 15 stores the feature amount of each feature point as-is, but vector-quantized data obtained with the Bag-of-Features approach may be recorded as the feature amounts instead. In that case, clustering is performed with the feature point actual size incorporated together with the feature amount in the Bag-of-Features clustering step.
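 One possible way to fold the feature point actual size into the clustering is sketched below, under the assumption that appending it as a weighted extra dimension before k-means is acceptable; the description does not prescribe the scheme, and size_weight is an assumed balancing parameter:

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed choice of clusterer

def build_vocabulary(descriptors, actual_sizes, n_words=256, size_weight=1.0):
    # Append the (weighted) feature point actual size as an extra dimension
    # of each descriptor, then cluster the augmented vectors with k-means.
    X = np.hstack([np.asarray(descriptors, dtype=float),
                   size_weight * np.asarray(actual_sizes, dtype=float)[:, None]])
    return KMeans(n_clusters=n_words, n_init=10).fit(X)
```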
 In the above example, the feature amount matching unit 17 finds, from among the candidate feature points, the one whose feature distance to the query feature point is minimal, but this is not limiting; a candidate feature point whose distance is approximately minimal may be found instead. For example, the feature amount matching unit 17 can perform an approximate nearest neighbor search using ANN (Approximate Nearest Neighbor), which shortens the computation time.
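 An approximate nearest neighbor query might look like the following sketch; the choice of SciPy's KD-tree and its eps parameter is our assumption, not the ANN library the description refers to:

```python
import numpy as np
from scipy.spatial import cKDTree

def approximate_nearest(query_feature, candidate_features, eps=0.5):
    # cKDTree's `eps` returns a neighbor whose distance is within a factor
    # (1 + eps) of the true minimum, giving an approximate nearest neighbor
    # search that trades a little accuracy for speed.
    tree = cKDTree(np.asarray(candidate_features, dtype=float))
    distance, index = tree.query(np.asarray(query_feature, dtype=float),
                                 k=1, eps=eps)
    return index, distance
```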
 In the above example, the output unit 19 outputs only the recognition result, but the feature point actual size may be output together with the recognition result. The feature point actual sizes of all feature points of the query image may be output, or a statistically derived value such as their average may be output instead.
 The above example can be used for both specific object recognition and general object recognition.
 As described above, according to the image recognition device 10 of the present embodiment, erroneous recognition in image recognition can be reduced by taking into account the size of a feature on the actual subject.
 (Second Embodiment)
 FIG. 6 is a block diagram showing a schematic configuration example of an image recognition device 30 according to the second embodiment of the present invention. As shown in FIG. 6, the image recognition device 30 of the present embodiment includes a learning image input unit 11, a special object detection unit 12, a depth structure estimation unit 31 that estimates the depth structure of an image, a feature amount extraction unit 13, an actual size calculation unit 32 that calculates the feature point actual size, a storage unit 15, a query image input unit 16, a feature amount matching unit 17, an object ID selection unit 18, and an output unit 19.
 In the learning phase, the depth structure estimation unit 31 receives a learning image from the special object detection unit 12 and estimates the depth structure of the learning image. The depth structure consists of a depth value for each pixel indicating the distance between the subject and the camera. It can be estimated using, for example, the technique disclosed in Japanese Unexamined Patent Application Publication No. 2005-151534. In that technique, a composition ratio is determined according to the high-frequency component evaluation values of the luminance signal of a non-stereo image, computed by a high-frequency component evaluation unit for the upper part of the screen and one for the lower part, and three types of basic depth models are combined according to that ratio. The combined basic depth model is then superimposed on the R signal of the non-stereo image to obtain the final depth estimation data. The depth structure estimation unit 31 outputs the estimated depth structure to the actual size calculation unit 32.
 The actual size calculation unit 32 first calculates the feature point actual size of each feature point using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13, in the same manner as the actual size calculation unit 14 of the first embodiment. It then corrects the feature point actual size using the depth structure received from the depth structure estimation unit 31. The correction is preferably performed by multiplying the feature point actual size by p/q, where p is the depth value at the center point of the special object region used in calculating the feature point actual size and q is the depth value at the feature point. This corrects the change in apparent size in the captured image that occurs when the special object and a feature point lie at different depths, that is, at different distances from the camera at the time of capture. The correction is not limited to this example and may be performed by other methods, such as subtracting (q - p).
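 The p/q correction can be sketched as follows; the guard against a zero depth value is our addition and not part of the description:

```python
def depth_corrected_size(actual_size, p, q):
    # p: depth value at the center point of the special object region used
    #    for the actual size calculation; q: depth value at the feature point.
    # Multiplying by p / q rescales the size as if the feature point lay at
    # the special object's depth.
    return actual_size * (p / q) if q else actual_size
```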
 An example of the processing flow of the image recognition method in the learning phase will be described with reference to FIG. 7.
 First, in step S41, the learning image input unit 11 receives a learning image and an object ID.
 Next, in step S42, the special object detection unit 12 detects a special object.
 Then, in step S43, the depth structure estimation unit 31 estimates the depth structure.
 Next, in step S44, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
 Finally, in step S45, the actual size calculation unit 32 calculates the feature point actual size of each feature point and corrects it using the depth structure.
 The actual size calculation unit 32 then stores the corrected feature point actual size in the storage unit 15 together with the feature point ID, the object ID, and the feature amount. In the learning phase, the image recognition device 30 performs this processing for each learning image.
 Next, the identification phase will be described. In the identification phase, the depth structure estimation unit 31 receives the query image from the special object detection unit 12, performs the same processing as in the learning phase to estimate the depth structure of the query image, and outputs the estimated depth structure to the actual size calculation unit 32.
 The actual size calculation unit 32 first performs the same processing as in the learning phase using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13, and calculates the feature point actual size. It then corrects the feature point actual size using the depth structure received from the depth structure estimation unit 31, as in the learning phase, and outputs the feature amount and the corrected feature point actual size of each feature point to the feature amount matching unit 17.
 An example of the processing flow of the image recognition method in the identification phase will be described with reference to FIG. 8.
 First, in step S51, a query image is input through the query image input unit 16 to the special object detection unit 12 and the feature amount extraction unit 13.
 Next, in step S52, the special object detection unit 12 detects a special object.
 Then, in step S53, the depth structure estimation unit 31 estimates the depth structure.
 Next, in step S54, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
 Then, in step S55, the actual size calculation unit 32 calculates the feature point actual size of each feature point and corrects it using the depth structure.
 Next, in step S56, the feature amount matching unit 17 extracts candidate feature points from the storage unit 15 for each feature point of the query image using the corrected feature point actual size.
 Then, in step S57, the feature amount matching unit 17 finds, for each feature point of the query image, the candidate feature point with the smallest feature amount distance, and votes for that feature point's object ID.
 Next, in step S58, the object ID selection unit 18 selects the object ID with the largest number of votes in step S57 as the recognition result. Finally, in step S59, the output unit 19 outputs the recognition result.
 As described above, according to the image recognition device 30 of the present embodiment, the actual size calculation unit 32 can correct the change in apparent size in the captured image that occurs when the special object and a feature point are captured at different distances from the camera. Erroneous recognition can therefore be reduced further.
 (Implementation by Software)
 The control blocks of the image recognition devices 10 and 30 (in particular, the special object detection unit 12, the feature amount extraction unit 13, the actual size calculation units 14 and 32, the feature amount matching unit 17, the output unit 19, and the depth structure estimation unit 31) may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).
 In the latter case, the image recognition device 30 includes a CPU that executes the instructions of a program, which is software realizing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The object of one aspect of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may be supplied to the computer via any transmission medium capable of transmitting it, such as a communication network or broadcast waves. Note that one aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
 [Summary]
 An image recognition device 10, 30 according to Aspect 1 of the present invention is an image recognition device 10, 30 that selects an image similar to a query image from learning images, comprising: a special object detection unit 12 that detects, from the query image and each learning image, a special object defined in advance; a feature amount extraction unit 13 that extracts the feature amounts of feature points detected from the query image and the feature amounts of feature points detected from each learning image; an actual size calculation unit 14, 32 that calculates a feature point actual size according to the special object and the feature points detected from the query image, and a feature point actual size according to the special object and the feature points detected from each learning image; and a feature amount matching unit 17 that compares the feature amounts of the feature points of the query image with the feature amounts of the feature points of each learning image, wherein the feature amount matching unit 17 compares the feature amounts of the feature points of the query image with the feature amounts of those feature points of each learning image whose feature point actual sizes are similar.
 According to the above configuration, erroneous recognition in image recognition can be reduced.
 In the image recognition device 10, 30 according to Aspect 2 of the present invention, in Aspect 1 above, the actual size calculation unit 14, 32 may calculate each feature point actual size using the number of pixels from a predetermined point in the region of the special object to each of the feature points, the special object resolution calculated by the special object detection unit 12, which indicates the actual size per pixel in the region of the special object, and the scale of the feature points calculated by the feature amount extraction unit 13.
 According to the above configuration, both the actual size per pixel in the region of the special object and the scale of the feature points are used, so erroneous recognition is unlikely to occur even when the scale of the image changes.
 In the image recognition device 10, 30 according to Aspect 3 of the present invention, in Aspect 2 above, the predetermined point may be the center of the region of the special object or any other point within the region of the special object.
 According to the above configuration, a reference point suited to the special object can be selected.
 The image recognition device 30 according to Aspect 4 of the present invention, in any one of Aspects 1 to 3 above, may further include a depth structure estimation unit 31 that estimates the depth structure of the query image and each learning image, and the actual size calculation unit 32 may correct the feature point actual size using the depth structure.
 According to the above configuration, erroneous recognition can be further reduced.
 The image recognition device 10, 30 according to Aspect 5 of the present invention, in any one of Aspects 1 to 4 above, may further include an output unit 19 that outputs the result of selecting an image similar to the query image from the learning images, together with the feature point actual size.
 According to the above configuration, the size on the actual subject of an object included in the image can be known.
 An image recognition method according to Aspect 6 of the present invention is an image recognition method for selecting an image similar to a query image from learning images, comprising: a special object detection step of detecting, from the query image and each learning image, a special object defined in advance; a feature amount extraction step of extracting the feature amounts of feature points detected from the query image and the feature amounts of feature points detected from each learning image; a size calculation step of calculating a feature point actual size according to the special object and the feature points detected from the query image, and a feature point actual size according to the special object and the feature points detected from each learning image; and a feature amount matching step of comparing the feature amounts of the feature points of the query image with the feature amounts of the feature points of each learning image, wherein, in the feature amount matching step, the feature amounts of the feature points of the query image are compared with the feature amounts of those feature points of each learning image whose feature point actual sizes are similar.
 According to the above configuration, the same effects as those of the image recognition device according to Aspect 1 can be obtained.
 The image recognition devices 10, 30 according to the aspects of the present invention may be realized by a computer. In that case, an image recognition program that realizes the image recognition device 10, 30 on the computer by causing the computer to operate as the units (software elements) included in the image recognition device 10, 30, and a computer-readable recording medium on which the program is recorded, also fall within the scope of one aspect of the present invention.
 The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims; embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of one aspect of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.
 (Cross-Reference of Related Applications)
 This application claims the benefit of priority over Japanese Patent Application No. 2016-081433 filed on April 14, 2016, the entire contents of which are incorporated herein by reference.
 10, 30 Image recognition device
 12 Special object detection unit (detection unit)
 13 Feature amount extraction unit (extraction unit)
 14, 32 Actual size calculation unit (image recognition unit)
 17 Feature amount matching unit (image recognition unit)
 19 Output unit
 31 Depth structure estimation unit

Claims (11)

  1. An image recognition device for identifying an object included in a query image, comprising:
     a detection unit that detects, from the query image, a predetermined special object whose standard size is known, and detects information indicating the actual size of a region corresponding to the special object in the query image;
     an extraction unit that extracts feature points of the query image from the query image; and
     an image recognition unit that identifies the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and the feature points of the query image.
  2. The image recognition device according to claim 1, wherein the image recognition unit selects the object included in the query image from objects included in learning images.
  3. The image recognition device according to claim 2, wherein the image recognition unit extracts candidate feature points from the feature points of each learning image based on the information indicating the actual size of the region corresponding to the special object in the query image, and selects the object included in the query image from the objects included in the learning images by matching the feature points of the query image against the candidate feature points.
  4. The image recognition device according to claim 3, wherein
     the detection unit detects, from each learning image, information indicating the actual size of a region corresponding to the special object in that learning image,
     the extraction unit extracts the feature points of each learning image from that learning image, and
     the image recognition unit
     calculates the feature point actual sizes of the feature points of each learning image based on the information indicating the actual size of the region corresponding to the special object in that learning image,
     calculates the feature point actual sizes of the feature points of the query image based on the information indicating the actual size of the region corresponding to the special object in the query image, and
     extracts the candidate feature points from the feature points of each learning image by comparing the feature point actual sizes of the feature points of each learning image with the feature point actual sizes of the feature points of the query image.
  5. The image recognition device according to claim 4, wherein the image recognition unit calculates each feature point actual size using the number of pixels from a predetermined point in the region of the special object to each of the feature points, a special object resolution indicating the actual size per pixel in the region of the special object, and the scale of the feature points.
  6. The image recognition device according to claim 5, wherein the predetermined point is the center of the region of the special object or any other point within the region of the special object.
  7. The image recognition device according to any one of claims 4 to 6, further comprising a depth structure estimation unit that estimates a depth structure of the query image and each learning image, wherein the image recognition unit corrects the feature point actual size using the depth structure.
  8. The image recognition device according to any one of claims 4 to 7, further comprising an output unit that outputs the result of identifying the object included in the query image and the feature point actual size.
  9. An image recognition method for identifying an object included in a query image, comprising:
     a detection step of detecting, from the query image, a predetermined special object whose standard size is known, and detecting information indicating the actual size of a region corresponding to the special object in the query image;
     an extraction step of extracting feature points of the query image from the query image; and
     an image recognition step of identifying the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and the feature points of the query image.
  10. An image recognition program for causing a computer to function as the image recognition device according to claim 1, the program causing the computer to function as the detection unit, the extraction unit, and the image recognition unit.
  11. A computer-readable recording medium on which the image recognition program according to claim 10 is recorded.
PCT/JP2017/015390 2016-04-14 2017-04-14 Image recognition device, image recognition method, and image recognition program WO2017179728A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016-081433 2016-04-14
JP2016081433 2016-04-14

Publications (1)

Publication Number Publication Date
WO2017179728A1 true WO2017179728A1 (en) 2017-10-19

Family

ID=60041780

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/015390 WO2017179728A1 (en) 2016-04-14 2017-04-14 Image recognition device, image recognition method, and image recognition program

Country Status (1)

Country Link
WO (1) WO2017179728A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020085805A (en) * 2018-11-30 2020-06-04 Arithmer Inc. Dimension data calculation device, program, method, product manufacturing device, and product manufacturing system
US11922649B2 (en) 2018-11-30 2024-03-05 Arithmer Inc. Measurement data calculation apparatus, product manufacturing apparatus, information processing apparatus, silhouette image generating apparatus, and terminal apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002222423A (en) * 2001-01-25 2002-08-09 Fujitsu Ltd Method and device for object recognition
JP2010271861A (en) * 2009-05-20 2010-12-02 Canon Inc Object identification device and object identification method
WO2015099016A1 (en) * 2013-12-26 2015-07-02 日本電気株式会社 Image processing device, subject identification method and program
JP2015191626A (en) * 2014-03-28 2015-11-02 富士重工業株式会社 Outside-vehicle environment recognition device


Similar Documents

Publication Title
Aldoma et al. Multimodal cue integration through hypotheses verification for rgb-d object recognition and 6dof pose estimation
US9984280B2 (en) Object recognition system using left and right images and method
JP4479478B2 (en) Pattern recognition method and apparatus
Sarfraz et al. Head Pose Estimation in Face Recognition Across Pose Scenarios.
JP6544900B2 (en) Object identification device, object identification method and program
Bak et al. Improving person re-identification by viewpoint cues
CN101147159A (en) Fast method of object detection by statistical template matching
JP5936561B2 (en) Object classification based on appearance and context in images
CN108399374B (en) Method and apparatus for selecting candidate fingerprint images for fingerprint identification
US10127681B2 (en) Systems and methods for point-based image alignment
US20140093142A1 (en) Information processing apparatus, information processing method, and information processing program
Ding et al. Recognition of hand-gestures using improved local binary pattern
WO2013181695A1 (en) Biometric verification
JP2009129237A (en) Image processing apparatus and its method
CN113297963A (en) Multi-person posture estimation method and device, electronic equipment and readable storage medium
CN106709915B (en) Image resampling operation detection method
KR101789979B1 (en) Method for calculating hausdorff distance based on gradient orientation information
KR20190018274A (en) Method and apparatus for recognizing a subject existed in an image based on temporal movement or spatial movement of a feature point of the image
Alavi et al. Multi-shot person re-identification via relational stein divergence
WO2017179728A1 (en) Image recognition device, image recognition method, and image recognition program
CN111476070A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN105190689A (en) Image processing including adjoin feature based object detection, and/or bilateral symmetric object segmentation
US7113637B2 (en) Apparatus and methods for pattern recognition based on transform aggregation
JP5755516B2 (en) Object shape estimation device
JP2015007919A (en) Program, apparatus, and method of realizing high accuracy geometric inspection for images different in point of view

Legal Events

Code Title Description
NENP Non-entry into the national phase (ref country code: DE)
121 Ep: the epo has been informed by wipo that ep was designated in this application (ref document number: 17782541; country of ref document: EP; kind code of ref document: A1)
122 Ep: pct application non-entry in european phase (ref document number: 17782541; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: JP)