WO2017179728A1 - Image recognition device, image recognition method, and image recognition program - Google Patents

Image recognition device, image recognition method, and image recognition program

Info

Publication number
WO2017179728A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
image recognition
actual size
feature
feature point
Prior art date
Application number
PCT/JP2017/015390
Other languages
French (fr)
Japanese (ja)
Inventor
Ikuko Tsubaki (郁子 椿)
Original Assignee
Sharp Corporation (シャープ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corporation
Publication of WO2017179728A1 publication Critical patent/WO2017179728A1/en

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • The following disclosure relates to an image recognition apparatus, an image recognition method, and an image recognition program that perform image recognition using local feature amounts of an image.
  • Image recognition is a technique for identifying what object is included in an image.
  • Within image recognition, general object recognition is a technique for identifying the category of an object.
  • Specific object recognition is a technique for searching an image database for the same object.
  • Patent Document 1 discloses an image recognition method in which feature vectors are extracted from an image of an object, the object is represented by a large number of feature vectors, and an image database is searched for an object with matching features.
  • One aspect of the present invention has been made in view of the above circumstances, and its object is to provide an image recognition apparatus, an image recognition method, and an image recognition program that are unlikely to cause erroneous recognition in image recognition.
  • In order to solve the above problem, an image recognition apparatus according to one aspect of the present invention is an image recognition apparatus that identifies an object included in a query image, and includes: a detection unit that detects, from the query image, a predetermined special object whose standard size is known, and that detects information indicating the actual size of the region corresponding to the special object in the query image; an extraction unit that extracts feature points of the query image from the query image; and an image recognition unit that identifies the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and on the feature points of the query image.
  • An image recognition method according to one aspect of the present invention is an image recognition method for identifying an object included in a query image, and includes: a detection step of detecting, from the query image, a predetermined special object whose standard size is known, and of detecting information indicating the actual size of the region corresponding to the special object in the query image; an extraction step of extracting feature points of the query image from the query image; and an image recognition step of identifying the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and on the feature points of the query image.
  • FIG. 1 is a block diagram showing a schematic configuration example of an image recognition apparatus according to the first embodiment of the present invention. FIG. 2 shows an image containing a face image. FIG. 3 illustrates the feature point actual size. FIGS. 4 and 5 are flowcharts explaining the flow of processing in the image recognition apparatus according to the first embodiment. FIG. 6 is a block diagram showing a schematic configuration example of an image recognition apparatus according to the second embodiment of the present invention. FIGS. 7 and 8 are flowcharts explaining the flow of processing in the image recognition apparatus according to the second embodiment.
  • FIG. 1 is a block diagram showing a schematic configuration example of an image recognition apparatus 10 according to the first embodiment of the present invention.
  • As shown in FIG. 1, the image recognition apparatus 10 includes: a learning image input unit 11 that inputs learning images containing images of objects to be compared; a special object detection unit 12 that detects special objects; a feature amount extraction unit 13 that extracts feature amounts from an image; an actual size calculation unit 14 that calculates feature point actual sizes; a storage unit 15 that stores a database; a query image input unit 16 that inputs a query image containing an image of the object to be recognized; a feature amount matching unit 17 that matches feature amounts; an object ID selection unit 18 that selects an object ID using the matching results; and an output unit 19 that outputs the selected object ID.
  • The actual size calculation unit 14, the feature amount matching unit 17, and the object ID selection unit 18 may be collectively referred to as the image recognition unit.
  • The image recognition apparatus 10 performs object recognition by processing in two phases: a learning phase and an identification phase.
  • In the learning phase, the image recognition apparatus 10 inputs a plurality of learning images and creates a database.
  • In the identification phase, the image recognition apparatus 10 inputs a query image and identifies, from among the learning images input in the learning phase, an image similar to the query image.
  • The learning phase is described first, followed by the identification phase.
  • In the learning phase, the learning image input unit 11 captures learning images from the outside and outputs them to the special object detection unit 12 and the feature amount extraction unit 13.
  • Each learning image is assigned an object ID in advance; the learning image input unit 11 also captures this object ID from the outside and outputs it to the feature amount extraction unit 13.
  • The learning image input unit 11 may capture images from image files recorded in a recording device, or images acquired over a network.
  • The images captured by the learning image input unit 11 may be still images or moving images.
  • When a captured image is a moving image, the learning image input unit 11 decomposes it into frame images and outputs the frames sequentially to the special object detection unit 12 and the feature amount extraction unit 13, as sketched below.
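
For illustration, a minimal sketch of this frame decomposition step in Python, assuming OpenCV; the file name and function name are hypothetical, not from the patent.

```python
import cv2

def decompose_into_frames(path):
    """Yield the frames of a moving image one by one, as the learning
    image input unit would before passing them downstream."""
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of the video
        yield frame
    cap.release()

for frame in decompose_into_frames("learning_video.mp4"):
    pass  # each frame would go to the special object detection and feature extraction units
```
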
  • The special object detection unit 12 first performs special object detection on each image received from the learning image input unit 11.
  • A special object is an object for which a dedicated image recognition method exists, such as a human face or a car license plate, and whose actual size is roughly fixed.
  • Special object detection is a process that detects whether a special object designated in advance is included in the image and detects information indicating the actual size of the detected special object region.
  • As an example, suppose a face is designated as the special object. The special object detection unit 12 then detects whether a face is included in the image and detects the face region size as the special object region size.
  • FIG. 2 shows an image containing a face image, an example of a special object.
  • The image A1 shown in FIG. 2 contains an image of the subject's face A2.
  • The face region size is a value indicating the size, within the whole image A1, of the region of the face A2: for example, the width (A4 in FIG. 2) or the height, or both, in pixels, of the region A3 that encloses the face A2 with straight lines on all four sides.
  • For simplicity of explanation, the face region size is taken below to be the number of pixels of the width A4 of the region A3. Such face detection can be performed using, for example, the Viola-Jones method.
  • Next, the special object detection unit 12 calculates the special object resolution from the detected special object region size.
  • The special object resolution is a value indicating how many millimeters of the special object's surface each pixel in the special object region represents, and is calculated by dividing the actual size of the special object by the special object region size. Since the true size of the individual object is unknown, a standard size known in advance is used: for a face, for example, an average face width of 160 mm, so a face region 320 pixels wide yields a resolution of 0.5 mm/pixel. A sketch of this step follows.
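
A minimal sketch of the detection-and-resolution step in Python, assuming OpenCV's Haar cascade face detector as the Viola-Jones implementation; the 160 mm standard width comes from the example above, and the function name is hypothetical.

```python
import cv2

STANDARD_FACE_WIDTH_MM = 160.0  # standard size assumed known in advance

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_special_objects(gray_image):
    """Return (resolution_mm_per_px, center_xy) for each detected face."""
    results = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray_image):
        resolution = STANDARD_FACE_WIDTH_MM / w  # e.g. 160 mm / 320 px = 0.5 mm/px
        center = (x + w / 2.0, y + h / 2.0)
        results.append((resolution, center))
    return results  # an empty list plays the role of the "no special object" signal
```
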
  • The special object detection unit 12 outputs the obtained special object resolution and the center coordinates of the special object region to the actual size calculation unit 14 as information indicating the actual size of the special object region.
  • When multiple special objects are detected, the special object detection unit 12 calculates the special object resolution for each special object region and outputs each obtained resolution and the center coordinates of its region to the actual size calculation unit 14.
  • When no special object is detected, the special object detection unit 12 outputs a signal indicating that no special object exists to the actual size calculation unit 14.
  • The feature amount extraction unit 13 extracts local feature amounts from each learning image received from the learning image input unit 11.
  • The following describes the case where SIFT (Scale-Invariant Feature Transform) is used to extract the local feature amounts.
  • First, the feature amount extraction unit 13 detects feature points using SIFT and obtains the coordinates of each feature point.
  • When detecting the feature points, the feature amount extraction unit 13 also calculates the scale of each feature point.
  • The scale is a value indicating at which resolution the feature point was obtained.
  • Next, the feature amount extraction unit 13 calculates a SIFT feature amount for each detected feature point.
  • The SIFT feature amount is a 128-dimensional vector obtained from the luminance gradients in the region around the feature point.
  • The feature amount extraction unit 13 outputs the SIFT feature amount, coordinates, and scale of each feature point to the actual size calculation unit 14.
  • The feature amount extraction unit 13 also outputs the object ID received from the learning image input unit 11 to the actual size calculation unit 14. A sketch of this extraction step follows.
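
A minimal sketch of the extraction step, assuming OpenCV's SIFT implementation; treating the keypoint diameter kp.size as the scale is an assumption of this sketch.

```python
import cv2

def extract_features(gray_image):
    """Return (descriptor, (x, y), scale) triples for each feature point."""
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray_image, None)
    if descriptors is None:
        return []  # no feature points were found
    # Each descriptor is the 128-dimensional SIFT vector; kp.pt gives the
    # coordinates and kp.size stands in for the scale here.
    return [(desc, kp.pt, kp.size) for kp, desc in zip(keypoints, descriptors)]
```
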
  • The actual size calculation unit 14 calculates the feature point actual size of each feature point using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13.
  • The feature point actual size is a value indicating the size, on the actual subject, of the texture that characterizes the feature point.
  • For example, the feature point actual size is obtained by multiplying the number of pixels in the line segment connecting the center point of the special object region to the feature point by the special object resolution.
  • Alternatively, the line segment may start from any point within the special object region other than its center point; the number of pixels in that segment multiplied by the special object resolution may also serve as the feature point actual size.
  • FIG. 3 is a diagram for explaining the feature point actual size.
  • In FIG. 3, taking the face region as the special object region B1, the center point of the special object region B1 is the point B3.
  • A feature point B2 can be detected using SIFT from the pixels included in the special object region B1.
  • The length of the line segment B4 connecting the point B3 to the detected feature point B2 gives the feature point actual size of the feature point B2.
  • As described above, since the face region size is taken to be the number of pixels of the width A4 of the region A3, the length of the line segment B4 is likewise calculated as a horizontal size.
  • Even when the face region size is measured vertically instead of horizontally, the feature point actual size can be obtained in the same way by substituting the vertical direction for the horizontal one. Furthermore, even when the face region size is a two-dimensional size in the horizontal and vertical directions, the feature point actual size can be obtained in the same way by calculating the horizontal and vertical sizes separately.
  • In this way, the image recognition apparatus 10 can detect a special object whose actual size can be estimated even when the actual size of the texture is unknown, and use that size to estimate the feature point actual size.
  • In this embodiment, the feature amount extraction unit 13 obtains the feature points with SIFT.
  • SIFT feature points carry the notion of scale; to account for it, the actual size calculation unit 14 multiplies the feature point actual size obtained above by the scale and takes the product as the final feature point actual size.
  • When the special object detection unit 12 detects multiple special objects, the actual size calculation unit 14 compares the coordinates of each feature point received from the feature amount extraction unit 13 with the center coordinates of each special object region received from the special object detection unit 12, and uses the special object resolution of the nearest region to calculate the feature point actual size.
  • When it receives the signal indicating that no special object exists, the actual size calculation unit 14 sets the feature point actual size to 0. A sketch of this calculation follows.
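
A minimal sketch of the calculation, under the same assumptions as above; Euclidean distance is used for brevity, whereas the simplified example in the text measures the horizontal size only.

```python
import math

def feature_point_actual_size(feature_xy, scale, special_objects):
    """special_objects: list of (resolution_mm_per_px, center_xy) pairs."""
    if not special_objects:
        return 0.0  # no special object was detected
    # Use the special object region whose center is nearest to the feature point.
    resolution, center = min(special_objects,
                             key=lambda ro: math.dist(ro[1], feature_xy))
    # Pixel distance to the region center, converted to mm, then scaled.
    return math.dist(center, feature_xy) * resolution * scale
```
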
  • Finally, the actual size calculation unit 14 outputs the feature amount and the feature point actual size of each feature point to the storage unit 15.
  • The actual size calculation unit 14 also outputs the object ID received from the feature amount extraction unit 13 to the storage unit 15.
  • The storage unit 15 stores a database.
  • The database consists of fields (columns) holding a feature point ID, an object ID, a feature amount, and a feature point actual size; one record (row) holds the data for one feature point.
  • When the storage unit 15 receives the feature amount, feature point actual size, and object ID for one feature point from the actual size calculation unit 14, it creates a feature point ID and stores it in the feature point ID field.
  • The feature point ID is an arbitrary value that differs for each feature point.
  • The feature amount, feature point actual size, object ID, and feature point ID are then stored in their respective fields. Note that the object IDs of the multiple feature points obtained from one learning image all have the same value. A sketch of such a schema follows.
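
A minimal sketch of one possible realization of this database, assuming SQLite; the table and column names are hypothetical, and the 128-dimensional descriptor is serialized to bytes for storage.

```python
import sqlite3

conn = sqlite3.connect("features.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS feature_points (
        feature_point_id INTEGER PRIMARY KEY AUTOINCREMENT,
        object_id        INTEGER NOT NULL,
        descriptor       BLOB    NOT NULL,  -- 128-dimensional SIFT vector
        actual_size      REAL    NOT NULL   -- feature point actual size
    )
""")

def store_feature_point(object_id, descriptor, actual_size):
    """descriptor: a NumPy array; tobytes() serializes it for the BLOB field."""
    conn.execute(
        "INSERT INTO feature_points (object_id, descriptor, actual_size) "
        "VALUES (?, ?, ?)",
        (object_id, descriptor.tobytes(), actual_size))
    conn.commit()
```
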
  • An example of the flow of processing in the learning phase is as follows (see FIG. 4). In step S1, the learning image input unit 11 inputs a learning image and an object ID.
  • In step S2, the special object detection unit 12 detects special objects.
  • In step S3, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
  • In step S4, the actual size calculation unit 14 calculates the feature point actual size of each feature point and stores it in the storage unit 15 together with the feature point ID, the object ID, and the feature amount.
  • In the learning phase, the image recognition apparatus 10 performs this processing for each learning image.
  • In the identification phase, the query image input unit 16 captures a query image from the outside and outputs it to the special object detection unit 12.
  • The query image input unit 16 may capture an image file recorded in a recording device as the query image, or an image acquired over a network.
  • The special object detection unit 12 performs the same processing as in the learning phase on the image received from the query image input unit 16 and outputs the special object resolution to the actual size calculation unit 14.
  • When multiple special objects are detected, the special object detection unit 12 calculates the special object resolution for each special object region and outputs each obtained resolution to the actual size calculation unit 14 as information indicating the actual size of the special object region.
  • When no special object is detected, the special object detection unit 12 outputs a signal indicating that no special object exists to the actual size calculation unit 14.
  • The feature amount extraction unit 13 performs the same processing as in the learning phase on the query image to calculate the feature amounts, and outputs the feature amount, coordinates, and scale of each feature point to the actual size calculation unit 14.
  • The actual size calculation unit 14 performs the same processing as in the learning phase, using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13, to calculate the feature point actual sizes. It then outputs the feature amount and feature point actual size of each feature point to the feature amount matching unit 17.
  • The feature amount matching unit 17 matches the feature amount of each feature point of the query image, received from the actual size calculation unit 14, against the feature amounts in the database of the storage unit 15, that is, the feature amounts of the feature points detected from the learning images.
  • The matching proceeds as follows.
  • First, the feature amount matching unit 17 compares the feature point actual size of each feature point of the query image with those of all feature points in the database, and extracts from the database, as candidate feature points, those whose feature point actual size differs from that of the query feature point by no more than a predetermined threshold in absolute value.
  • When the feature point actual size of a query feature point is 0, that is, when the special object detection unit 12 detected no special object, the feature amount matching unit 17 treats all feature points in the database as candidates. For each feature point of the query image, it then finds the candidate feature point whose feature amount is closest and casts one vote for that feature point's object ID.
  • The object ID selection unit 18 selects the object ID with the largest number of votes from the per-object-ID vote counts received from the feature amount matching unit 17 and outputs it to the output unit 19 as the recognition result.
  • The output unit 19 outputs the object ID received from the object ID selection unit 18 to the outside of the image recognition apparatus 10. A sketch of the matching, voting, and selection follows.
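
A minimal sketch of these steps, assuming the database contents are held in NumPy arrays; db_descriptors, db_sizes, db_object_ids, and threshold are hypothetical names.

```python
from collections import Counter
import numpy as np

def identify(query_features, db_descriptors, db_sizes, db_object_ids, threshold):
    """query_features: (128-dim descriptor, actual size) pairs for the query image."""
    votes = Counter()
    for desc, size in query_features:
        if size == 0.0:
            # No special object was detected: consider every database feature point.
            candidates = np.arange(len(db_descriptors))
        else:
            # Keep only feature points whose actual size is within the threshold.
            candidates = np.flatnonzero(np.abs(db_sizes - size) <= threshold)
        if candidates.size == 0:
            continue
        # The nearest candidate in descriptor space votes for its object ID.
        dists = np.linalg.norm(db_descriptors[candidates] - desc, axis=1)
        best = candidates[int(np.argmin(dists))]
        votes[db_object_ids[best]] += 1
    return votes.most_common(1)[0][0] if votes else None
```
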
  • An example of the flow of processing in the identification phase is as follows (see FIG. 5). In step S21, the query image input unit 16 inputs a query image.
  • In step S22, the special object detection unit 12 detects special objects.
  • In step S23, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
  • In step S24, the actual size calculation unit 14 calculates the feature point actual size of each feature point.
  • In step S25, the feature amount matching unit 17 extracts candidate feature points from the storage unit 15 for each feature point of the query image.
  • In step S26, for each feature point of the query image, the feature amount matching unit 17 finds the candidate feature point with the smallest feature amount distance and votes for that feature point's object ID.
  • In step S27, the object ID selection unit 18 selects the object ID with the largest number of votes in step S26 as the recognition result.
  • In step S28, the output unit 19 outputs the recognition result.
  • If the feature point actual size were not taken into consideration, the feature point with the smallest distance would be selected from all feature points in the database. A feature point whose feature amount has a completely different size on the actual subject could then be selected, casting a vote for an object ID other than the one that should receive it.
  • In this embodiment, by contrast, candidate feature points are extracted based on the feature point actual size, and the feature point with the smallest distance is selected from among those candidates, which makes it harder to select feature points of completely different actual size. As a result, the object ID selection unit 18 is less likely to select an incorrect object ID.
  • In this embodiment, the feature amount extraction unit 13 determines the feature point coordinates by detecting feature points.
  • However, the present invention is not limited to this, and the feature point coordinates may be determined by other methods.
  • For example, the feature amount extraction unit 13 may arrange the feature point coordinates in advance at grid positions and extract the feature amounts at those coordinates.
  • In this embodiment, the SIFT feature amount is used as the feature amount.
  • However, the present invention is not limited to this, and other local feature amounts such as SURF or HOG may be used.
  • Also, in this embodiment, the feature amount of each feature point is stored in the storage unit 15 as-is.
  • However, the present invention is not limited to this, and vector-quantized data obtained with the Bag-of-Features method may be recorded as the feature amount.
  • In that case, the clustering in the Bag-of-Features process is performed by incorporating the feature point actual size together with the feature amount, as in the sketch below.
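
One plausible reading of this, sketched with scikit-learn's k-means: append the actual size as an extra dimension of each descriptor before clustering, so the codebook separates feature points of different real-world size. The weighting factor is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, actual_sizes, n_words=1000, size_weight=1.0):
    """descriptors: (N, 128) array; actual_sizes: length-N sequence."""
    augmented = np.hstack([descriptors,
                           size_weight * np.asarray(actual_sizes)[:, None]])
    return KMeans(n_clusters=n_words, n_init=10).fit(augmented)
```
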
  • Also, in this embodiment, the feature amount matching unit 17 finds, among the candidate feature points, the one that minimizes the feature amount distance to each feature point of the query image.
  • However, a candidate feature point whose distance is approximately the minimum may be found instead.
  • For example, the feature amount matching unit 17 can perform an approximate nearest neighbor search using ANN (Approximate Nearest Neighbor), which shortens the computation time; a sketch of the idea follows.
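
The patent names the ANN library; as a stand-in with the same effect, here is a sketch using SciPy's KD-tree, whose eps parameter permits an approximate answer. The dummy data is illustrative only, and a KD-tree is just one of several index structures usable for 128-dimensional descriptors.

```python
import numpy as np
from scipy.spatial import cKDTree

db_descriptors = np.random.rand(10000, 128)  # stand-in for the database descriptors
tree = cKDTree(db_descriptors)

query = np.random.rand(128)
dist, idx = tree.query(query, k=1, eps=0.5)  # eps > 0 allows an approximate neighbor
```
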
  • Also, in this embodiment, the output unit 19 outputs only the recognition result.
  • However, the feature point actual size may be output together with the recognition result: either the feature point actual sizes of all feature points of the query image, or a statistic derived from them, such as their average value.
  • As described above, erroneous recognition in image recognition can be reduced by considering the size of the feature amount on the actual subject.
  • FIG. 6 is a block diagram illustrating a schematic configuration example of the image recognition apparatus 30 according to the second embodiment of the present invention.
  • As shown in FIG. 6, the image recognition apparatus 30 according to this embodiment includes a learning image input unit 11, a special object detection unit 12, a depth structure estimation unit 31 that estimates the depth structure of an image, a feature amount extraction unit 13, an actual size calculation unit 32 that calculates feature point actual sizes, a storage unit 15, a query image input unit 16, a feature amount matching unit 17, an object ID selection unit 18, and an output unit 19.
  • In the learning phase, the depth structure estimation unit 31 receives each learning image from the special object detection unit 12 and estimates its depth structure.
  • The depth structure consists of depth values indicating, for each pixel, the distance between the subject and the camera.
  • The depth structure can be estimated using, for example, the technique disclosed in Japanese Patent Application Laid-Open No. 2005-151534.
  • In that technique, three types of basic depth models are combined, with a composition ratio determined from the high-frequency component evaluation values of the luminance signal of the non-stereo image computed by evaluation units for the top and bottom of the screen; the R signal of the non-stereo image is then superimposed on the combined basic depth model to obtain the final depth estimation data.
  • The depth structure estimation unit 31 outputs the estimated depth structure to the actual size calculation unit 32.
  • The actual size calculation unit 32 first calculates the feature point actual size of each feature point in the same manner as the actual size calculation unit 14 of the first embodiment, using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13. It then corrects the feature point actual size using the depth structure received from the depth structure estimation unit 31. The correction multiplies the feature point actual size by p/q, where p is the depth value at the center point of the special object region used to calculate that actual size and q is the depth value at the feature point.
  • This corrects the change in size within the captured image that occurs when the special object and a feature point are imaged at different distances.
  • The correction is not limited to this example and may be performed by other methods, such as subtracting (q - p). A sketch of the p/q correction follows.
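
A minimal sketch of the p/q correction; the zero guard is an assumption added for safety, not part of the patent.

```python
def correct_actual_size(actual_size, p, q):
    """p: depth value at the center of the special object region used for the
    size; q: depth value at the feature point itself."""
    if q == 0:
        return actual_size  # guard against a degenerate depth value
    return actual_size * p / q
```
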
  • An example of the flow of processing in the learning phase of this embodiment is as follows (see FIG. 7). In step S41, the learning image input unit 11 inputs a learning image and an object ID.
  • In step S42, the special object detection unit 12 detects special objects.
  • In step S43, the depth structure estimation unit 31 estimates the depth structure.
  • In step S44, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
  • In step S45, the actual size calculation unit 32 calculates the feature point actual size of each feature point and corrects it using the depth structure.
  • The actual size calculation unit 32 then stores the corrected feature point actual size in the storage unit 15 together with the feature point ID, the object ID, and the feature amount.
  • In the learning phase, the image recognition apparatus 30 performs this processing for each learning image.
  • In the identification phase, the depth structure estimation unit 31 receives the query image from the special object detection unit 12, performs the same processing as in the learning phase, and estimates the depth structure of the query image.
  • The depth structure estimation unit 31 outputs the estimated depth structure to the actual size calculation unit 32.
  • The actual size calculation unit 32 performs the same processing as in the learning phase, using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13, to calculate the feature point actual sizes. It then corrects the feature point actual sizes using the depth structure received from the depth structure estimation unit 31, as in the learning phase, and outputs the feature amount and the corrected feature point actual size of each feature point to the feature amount matching unit 17.
  • An example of the flow of processing in the identification phase of this embodiment is as follows (see FIG. 8). In step S51, a query image is input through the query image input unit 16 to the special object detection unit 12 and the feature amount extraction unit 13.
  • In step S52, the special object detection unit 12 detects special objects.
  • In step S53, the depth structure estimation unit 31 estimates the depth structure.
  • In step S54, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
  • In step S55, the actual size calculation unit 32 calculates the feature point actual size of each feature point and corrects it using the depth structure.
  • In step S56, the feature amount matching unit 17 extracts candidate feature points from the storage unit 15 using the corrected feature point actual size of each feature point of the query image.
  • In step S57, for each feature point of the query image, the feature amount matching unit 17 finds the candidate feature point with the smallest feature amount distance and votes for that feature point's object ID.
  • In step S58, the object ID selection unit 18 selects the object ID with the largest number of votes in step S57 as the recognition result.
  • Finally, the output unit 19 outputs the recognition result.
  • In this way, the actual size calculation unit 32 can correct the change in size within the captured image that occurs when the special object and a feature point are imaged at different distances. Erroneous recognition can therefore be reduced further.
  • The control blocks of the image recognition apparatuses 10 and 30 (in particular, the special object detection unit 12, the feature amount extraction unit 13, the actual size calculation units 14 and 32, the feature amount matching unit 17, the output unit 19, and the depth structure estimation unit 31) may be realized by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).
  • In the latter case, the image recognition apparatuses 10 and 30 include a CPU that executes the instructions of the program, which is the software realizing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") in which the program and various data are recorded so as to be readable by the computer (or CPU); and a RAM (Random Access Memory) into which the program is loaded.
  • The object of one embodiment of the present invention is achieved by the computer (or CPU) reading the program from the recording medium and executing it.
  • As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.
  • The program may also be supplied to the computer via any transmission medium capable of transmitting it, such as a communication network or a broadcast wave.
  • One embodiment of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
  • The image recognition apparatuses 10 and 30 according to aspect 1 of the present invention are image recognition apparatuses that select an image similar to a query image from among learning images, and include: a special object detection unit 12 that detects, from the query image and from each learning image, a special object determined in advance; a feature amount extraction unit 13 that extracts the feature amounts of feature points detected from the query image and the feature amounts of feature points detected from each learning image; actual size calculation units 14 and 32 that calculate the feature point actual sizes corresponding to the special objects and feature points detected from the query image and from each learning image; and a feature amount matching unit 17 that compares the feature amounts of the feature points of the query image with the feature amounts of the feature points of each learning image. The feature amount matching unit 17 compares the feature amounts of those feature points of the query image and of each learning image whose feature point actual sizes are similar.
  • In the image recognition apparatuses 10 and 30 according to aspect 2 of the present invention, in aspect 1, the actual size calculation units 14 and 32 may calculate each feature point actual size from the number of pixels from a predetermined point in the special object region to the feature point, the special object resolution calculated by the special object detection unit 12, which indicates the actual size per pixel in the special object region, and the scale of the feature point calculated by the feature amount extraction unit 13.
  • In the image recognition apparatuses 10 and 30 according to aspect 3 of the present invention, in aspect 2, the predetermined point may be either the center of the special object region or another point within the special object region.
  • This allows a reference point suited to the special object to be selected.
  • The image recognition apparatus 30 according to aspect 4 of the present invention may, in any one of aspects 1 to 3, further include a depth structure estimation unit 31 that estimates the depth structure of the query image and of each learning image, and the actual size calculation unit 32 may correct the feature point actual sizes using the depth structure.
  • The image recognition apparatuses 10 and 30 according to aspect 5 of the present invention may further include an output unit 19 that outputs the result of selecting an image similar to the query image from among the learning images, together with the feature point actual sizes.
  • This makes it possible to know the size, on the subject, of an object included in the image.
  • An image recognition method according to one aspect of the present invention is an image recognition method for selecting an image similar to a query image from among learning images, and includes: a special object detection step of detecting, from the query image and from each learning image, a special object determined in advance; a feature amount extraction step of extracting the feature amounts of feature points detected from the query image and the feature amounts of feature points detected from each learning image; a size calculation step of calculating the feature point actual sizes corresponding to the special objects and feature points detected from the query image and from each learning image; and a feature amount matching step of comparing the feature amounts of the feature points of the query image with the feature amounts of the feature points of each learning image. In the feature amount matching step, the feature amounts of those feature points of the query image and of each learning image whose feature point actual sizes are similar are compared.
  • The image recognition apparatuses 10 and 30 according to each aspect of the present invention may be realized by a computer.
  • In that case, an image recognition program that realizes the image recognition apparatuses 10 and 30 by the computer, by causing the computer to operate as each unit (software element) of the image recognition apparatuses 10 and 30, and a computer-readable recording medium on which the program is recorded also fall within the scope of one aspect of the present invention.
  • 10, 30 Image recognition apparatus; 12 Special object detection unit (detection unit); 13 Feature amount extraction unit (extraction unit); 14, 32 Actual size calculation unit (image recognition unit); 17 Feature amount matching unit (image recognition unit); 19 Output unit; 31 Depth structure estimation unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Provided is an image recognition device (10) comprising: a detection unit (12) that detects, from a query image, a predetermined particular object whose standard size is known, and detects information indicating the actual size of the region in the query image corresponding to the particular object; an extraction unit (13) that extracts feature points of the query image from the query image; and image recognition units (14, 17, 18) that identify the object included in the query image on the basis of the information indicating the actual size of the region in the query image corresponding to the particular object and of the feature points of the query image.

Description

Image recognition apparatus, image recognition method, and image recognition program

The following disclosure relates to an image recognition apparatus, an image recognition method, and an image recognition program that perform image recognition using local feature amounts of an image.

Image recognition is a technique for identifying what object is included in an image. Within image recognition, general object recognition is a technique for identifying the category of an object, and specific object recognition is a technique for searching an image database for the same object.

Patent Document 1 discloses an image recognition method in which feature vectors are extracted from an image of an object, the object is represented by a large number of feature vectors, and an image database is searched for an object with matching features.

International Publication No. 2008/026414

Conventional image recognition methods, including the technique described in Patent Document 1, use scale-invariant local feature amounts, which yield the same feature amount even when the scale of the image changes. However, such methods have the problem that feature amounts that are of completely different sizes on the actual subjects, but become similar when the scale is changed, are erroneously matched to one another. As a result, different objects whose textures are partially similar may be erroneously recognized as the same object.
One aspect of the present invention has been made in view of the above circumstances, and its object is to provide an image recognition apparatus, an image recognition method, and an image recognition program that are unlikely to cause erroneous recognition in image recognition.

In order to solve the above problem, an image recognition apparatus according to one aspect of the present invention is an image recognition apparatus that identifies an object included in a query image, and includes: a detection unit that detects, from the query image, a predetermined special object whose standard size is known, and that detects information indicating the actual size of the region corresponding to the special object in the query image; an extraction unit that extracts feature points of the query image from the query image; and an image recognition unit that identifies the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and on the feature points of the query image.

An image recognition method according to one aspect of the present invention is an image recognition method for identifying an object included in a query image, and includes: a detection step of detecting, from the query image, a predetermined special object whose standard size is known, and of detecting information indicating the actual size of the region corresponding to the special object in the query image; an extraction step of extracting feature points of the query image from the query image; and an image recognition step of identifying the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and on the feature points of the query image.

According to one aspect of the present invention, erroneous recognition in image recognition can be reduced.

FIG. 1 is a block diagram showing a schematic configuration example of an image recognition apparatus according to the first embodiment of the present invention. FIG. 2 shows an image containing a face image. FIG. 3 illustrates the feature point actual size. FIGS. 4 and 5 are flowcharts explaining the flow of processing in the image recognition apparatus according to the first embodiment. FIG. 6 is a block diagram showing a schematic configuration example of an image recognition apparatus according to the second embodiment of the present invention. FIGS. 7 and 8 are flowcharts explaining the flow of processing in the image recognition apparatus according to the second embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, parts having the same function are denoted by the same reference numerals, and repeated description is omitted.
(First embodiment)

FIG. 1 is a block diagram showing a schematic configuration example of the image recognition apparatus 10 according to the first embodiment of the present invention. As shown in FIG. 1, the image recognition apparatus 10 of this embodiment includes: a learning image input unit 11 that inputs learning images containing images of objects to be compared; a special object detection unit 12 that detects special objects; a feature amount extraction unit 13 that extracts feature amounts from an image; an actual size calculation unit 14 that calculates feature point actual sizes; a storage unit 15 that stores a database; a query image input unit 16 that inputs a query image containing an image of the object to be recognized; a feature amount matching unit 17 that matches feature amounts; an object ID selection unit 18 that selects an object ID using the matching results; and an output unit 19 that outputs the selected object ID. The actual size calculation unit 14, the feature amount matching unit 17, and the object ID selection unit 18 may be collectively referred to as the image recognition unit.
The image recognition apparatus 10 performs object recognition by processing in two phases: a learning phase and an identification phase. In the learning phase, the image recognition apparatus 10 inputs a plurality of learning images and creates a database. In the identification phase, the image recognition apparatus 10 inputs a query image and identifies, from among the learning images input in the learning phase, an image similar to the query image. The learning phase is described first, followed by the identification phase.

In the learning phase, the learning image input unit 11 captures learning images from the outside and outputs them to the special object detection unit 12 and the feature amount extraction unit 13. Each learning image is assigned an object ID in advance; the learning image input unit 11 also captures this object ID from the outside and outputs it to the feature amount extraction unit 13. The learning image input unit 11 may capture images from image files recorded in a recording device, or images acquired over a network. The images captured by the learning image input unit 11 may be still images or moving images. When a captured image is a moving image, the learning image input unit 11 decomposes it into frame images and outputs the frames sequentially to the special object detection unit 12 and the feature amount extraction unit 13.

The special object detection unit 12 first performs special object detection on each image received from the learning image input unit 11. A special object is an object for which a dedicated image recognition method exists, such as a human face or a car license plate, and whose actual size is roughly fixed. Special object detection is a process that detects whether a special object designated in advance is included in the image and detects information indicating the actual size of the detected special object region.

As an example, the following describes the case where a face is designated as the special object. When the special object is a face, the special object detection unit 12 detects whether a face is included in the image and detects the face region size as the special object region size.

FIG. 2 shows an image containing a face image, an example of a special object. The image A1 shown in FIG. 2 contains an image of the subject's face A2. The face region size is a value indicating the size, within the whole image A1, of the region of the face A2: for example, the width (A4 in FIG. 2) or the height, or both, in pixels, of the region A3 that encloses the face A2 with straight lines on all four sides. For simplicity of explanation, the face region size is taken below to be the number of pixels of the width A4 of the region A3. Such face detection can be performed using, for example, the Viola-Jones method.

Next, the special object detection unit 12 calculates the special object resolution from the detected special object region size. The special object resolution is a value indicating how many millimeters of the special object's surface each pixel in the special object region represents, and is calculated by dividing the actual size of the special object by the special object region size. Since the actual size of the particular special object is unknown, a standard size known in advance is used. For example, when the special object is a face, an average face width of 160 mm is used. With this value, if the face region size (the width of the face region) is 320 pixels, the special object resolution is calculated as 0.5 mm/pixel (= 160 mm / 320 pixels). The special object detection unit 12 outputs the obtained special object resolution and the center coordinates of the special object region to the actual size calculation unit 14 as information indicating the actual size of the special object region. When multiple special objects are detected, the special object detection unit 12 calculates the special object resolution for each special object region and outputs each obtained resolution and the center coordinates of its region to the actual size calculation unit 14. When no special object is detected, the special object detection unit 12 outputs a signal indicating that no special object exists to the actual size calculation unit 14.
The feature amount extraction unit 13 extracts local feature amounts from each learning image received from the learning image input unit 11. The following describes the case where SIFT (Scale-Invariant Feature Transform) is used to extract the local feature amounts. First, the feature amount extraction unit 13 detects feature points using SIFT and obtains the coordinates of each feature point. When detecting the feature points, the feature amount extraction unit 13 also calculates the scale of each feature point. The scale is a value indicating at which resolution the feature point was obtained. Next, the feature amount extraction unit 13 calculates a SIFT feature amount for each detected feature point. The SIFT feature amount is a 128-dimensional vector obtained from the luminance gradients in the region around the feature point. The feature amount extraction unit 13 outputs the SIFT feature amount, coordinates, and scale of each feature point to the actual size calculation unit 14, together with the object ID received from the learning image input unit 11.

The actual size calculation unit 14 calculates the feature point actual size of each feature point using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13. The feature point actual size is a value indicating the size, on the actual subject, of the texture that characterizes the feature point.

The feature point actual size is obtained as follows. For example, the number of pixels in the line segment connecting the center point of the special object region to the feature point is multiplied by the special object resolution. Alternatively, the line segment may start from any point within the special object region other than its center point; the number of pixels in that segment multiplied by the special object resolution may also serve as the feature point actual size.

FIG. 3 is a diagram for explaining the feature point actual size. In FIG. 3, taking the face region as the special object region B1, the center point of the special object region B1 is the point B3. A feature point B2 can be detected using SIFT from the pixels included in the special object region B1. In this embodiment, the length of the line segment B4 connecting the point B3 to the detected feature point B2 gives the feature point actual size of the feature point B2. As described above, since the face region size is taken to be the number of pixels of the width A4 of the region A3, the length of the line segment B4 is likewise calculated as a horizontal size. Even when the face region size is measured vertically instead of horizontally, the feature point actual size can be obtained in the same way by substituting the vertical direction for the horizontal one. Furthermore, even when the face region size is a two-dimensional size in the horizontal and vertical directions, the feature point actual size can be obtained in the same way by calculating the horizontal and vertical sizes separately.

In this way, the image recognition apparatus 10 can detect a special object whose actual size can be estimated even when the actual size of the texture is unknown, and use that size to estimate the feature point actual size. In this embodiment, the feature amount extraction unit 13 obtains the feature points with SIFT. SIFT feature points carry the notion of scale; to account for it, the actual size calculation unit 14 multiplies the feature point actual size obtained above by the scale and takes the product as the final feature point actual size.
Next, consider the case where the special object detection unit 12 detects multiple special objects. In this case, the actual size calculation unit 14 compares the coordinates of each feature point received from the feature amount extraction unit 13 with the center coordinates of each special object region received from the special object detection unit 12, and uses the special object resolution of the nearest region to calculate the feature point actual size.

When it receives the signal indicating that no special object exists from the special object detection unit 12, the actual size calculation unit 14 sets the feature point actual size to 0.

Finally, the actual size calculation unit 14 outputs the feature amount and the feature point actual size of each feature point to the storage unit 15, together with the object ID received from the feature amount extraction unit 13.

The storage unit 15 stores a database. The database consists of fields (columns) holding a feature point ID, an object ID, a feature amount, and a feature point actual size; one record (row) holds the data for one feature point. When the storage unit 15 receives the feature amount, feature point actual size, and object ID for one feature point from the actual size calculation unit 14, it creates a feature point ID and stores it in the feature point ID field. The feature point ID is an arbitrary value that differs for each feature point. The feature amount, feature point actual size, object ID, and feature point ID are then stored in their respective fields. Note that the object IDs of the multiple feature points obtained from one learning image all have the same value.
 An example of the processing flow of the image recognition method in the learning phase will be described with reference to FIG. 4.
 First, in step S1, the learning image input unit 11 receives a learning image and an object ID.
 Next, in step S2, the special object detection unit 12 detects a special object.
 Then, in step S3, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
 Finally, in step S4, the actual size calculation unit 14 calculates the feature point actual size of each feature point and stores it in the storage unit 15 together with the feature point ID, the object ID, and the feature amount. In the learning phase, the image recognition device 10 performs this processing for each learning image.
 Next, the identification phase will be described. In the identification phase, the query image input unit 16 takes in a query image from the outside and outputs it to the special object detection unit 12. The query image input unit 16 may take in an image file recorded in a recording device as the query image, or may take in an image acquired over a network as the query image.
 The special object detection unit 12 performs the same processing on the image received from the query image input unit 16 as in the learning phase, and outputs the special object resolution to the actual size calculation unit 14. When a plurality of special objects are detected, the special object detection unit 12 calculates the special object resolution for each special object region and outputs each obtained special object resolution to the actual size calculation unit 14 as information indicating the actual size of the corresponding special object region. When no special object is detected, the special object detection unit 12 outputs a signal indicating that no special object exists to the actual size calculation unit 14.
 The feature amount extraction unit 13 performs the same processing on the query image as in the learning phase and calculates the feature amounts. It then outputs the feature amount, coordinates, and scale of each feature point to the actual size calculation unit 14.
 The actual size calculation unit 14 performs the same processing as in the learning phase using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13, and calculates the feature point actual size. It then outputs the feature amount and the feature point actual size of each feature point to the feature amount matching unit 17.
 The feature amount matching unit 17 matches the feature amount of each feature point of the query image received from the actual size calculation unit 14 against the feature amounts in the database of the storage unit 15; these are the feature amounts of the feature points detected from the learning images. Matching proceeds as follows. First, the feature amount matching unit 17 compares the feature point actual size of the query feature point with those of all feature points in the database, and extracts from the database, as candidate feature points, those whose actual-size difference from the query feature point has an absolute value within a predetermined threshold. If the feature point actual size of the query feature point is 0, that is, if the special object detection unit 12 detected no special object, the feature amount matching unit 17 extracts all feature points in the database as candidate feature points. Likewise, if the query feature point's actual size is not 0 but the actual size of some database feature point is 0, that database feature point is also extracted as a candidate. Next, the feature amount matching unit 17 computes the distance between the feature amount (vector) of the query feature point and that of each candidate feature point, selects the candidate with the smallest distance, looks up its object ID, and casts one vote for that object. The feature amount matching unit 17 performs this procedure for each feature point of the query image, and outputs the number of votes for each object ID to the object ID selection unit 18.
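 A sketch of this matching and voting procedure, assuming the record layout sketched earlier, a brute-force scan, and Euclidean distance (the description requires only some feature distance):

```python
from collections import Counter
import math

def match_and_vote(query_points, db, size_threshold):
    # query_points: list of (feature_vector, actual_size) for the query image.
    # db: records with .feature, .actual_size and .object_id attributes,
    # e.g. the FeatureRecord sketched earlier.
    votes = Counter()
    for feat, size in query_points:
        if size == 0:
            # no special object in the query: all records are candidates
            candidates = db
        else:
            candidates = [r for r in db
                          if r.actual_size == 0
                          or abs(r.actual_size - size) <= size_threshold]
        if not candidates:
            continue
        # nearest candidate by feature distance gets one vote for its object
        best = min(candidates, key=lambda r: math.dist(r.feature, feat))
        votes[best.object_id] += 1
    return votes
```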
 The object ID selection unit 18 selects, from the vote counts for the object IDs received from the feature amount matching unit 17, the object ID with the largest number of votes, and outputs it to the output unit 19 as the recognition result. The output unit 19 outputs the object ID received from the object ID selection unit 18 to the outside of the image recognition device 10.
 An example of the processing flow of the image recognition method in the identification phase will be described with reference to FIG. 5.
 First, in step S21, the query image input unit 16 receives a query image.
 Next, in step S22, the special object detection unit 12 detects a special object.
 Then, in step S23, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
 Next, in step S24, the actual size calculation unit 14 calculates the feature point actual size of each feature point.
 Then, in step S25, the feature amount matching unit 17 extracts candidate feature points from the storage unit 15 for each feature point of the query image.
 Next, in step S26, the feature amount matching unit 17 finds, for each feature point of the query image, the candidate feature point with the smallest feature amount distance, and votes for that feature point's object ID.
 Then, in step S27, the object ID selection unit 18 selects the object ID with the largest number of votes in step S26 as the recognition result.
 Finally, in step S28, the output unit 19 outputs the recognition result.
 The effect of reducing erroneous recognition in the present embodiment is as follows. A conventional feature matching mechanism does not consider the feature point actual size and therefore selects the minimum-distance feature point from among all feature points in the database. As a result, a feature point whose feature has a completely different size on the actual subject may be selected, and a vote may be cast for an object ID different from the one that should receive it. According to the present embodiment, candidate feature points are extracted based on the feature point actual size and the minimum-distance feature point is selected from among those candidates, so feature points whose features have a completely different size on the actual subject are unlikely to be selected. Consequently, the object ID selection unit 18 is unlikely to select an incorrect object ID.
 In the above example, the feature amount extraction unit 13 determines the coordinates of feature points by feature point detection, but this is not limiting, and the coordinates may be determined by other methods. For example, the feature amount extraction unit 13 may place feature point coordinates at grid positions in advance and extract feature amounts at those coordinates.
 In the above example, SIFT features are used as the feature amounts, but other local features such as SURF or HOG may be used instead.
 In the above example, the storage unit 15 stores the feature amount of each feature point as-is, but vector-quantized data obtained with the Bag-of-Features approach may be recorded as the feature amounts instead. In that case, clustering is performed with the feature point actual size incorporated together with the feature amount in the Bag-of-Features clustering step.
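 One possible way to fold the feature point actual size into the clustering is sketched below, under the assumption that appending it as a weighted extra dimension before k-means is acceptable; the description does not prescribe the scheme, and size_weight is an assumed balancing parameter:

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed choice of clusterer

def build_vocabulary(descriptors, actual_sizes, n_words=256, size_weight=1.0):
    # Append the (weighted) feature point actual size as an extra dimension
    # of each descriptor, then cluster the augmented vectors with k-means.
    X = np.hstack([np.asarray(descriptors, dtype=float),
                   size_weight * np.asarray(actual_sizes, dtype=float)[:, None]])
    return KMeans(n_clusters=n_words, n_init=10).fit(X)
```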
 In the above example, the feature amount matching unit 17 finds, from among the candidate feature points, the one whose feature distance to the query feature point is minimal, but this is not limiting; a candidate feature point whose distance is approximately minimal may be found instead. For example, the feature amount matching unit 17 can perform an approximate nearest neighbor search using ANN (Approximate Nearest Neighbor), which shortens the computation time.
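 An approximate nearest neighbor query might look like the following sketch; the choice of SciPy's KD-tree and its eps parameter is our assumption, not the ANN library the description refers to:

```python
import numpy as np
from scipy.spatial import cKDTree

def approximate_nearest(query_feature, candidate_features, eps=0.5):
    # cKDTree's `eps` returns a neighbor whose distance is within a factor
    # (1 + eps) of the true minimum, giving an approximate nearest neighbor
    # search that trades a little accuracy for speed.
    tree = cKDTree(np.asarray(candidate_features, dtype=float))
    distance, index = tree.query(np.asarray(query_feature, dtype=float),
                                 k=1, eps=eps)
    return index, distance
```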
 In the above example, the output unit 19 outputs only the recognition result, but the feature point actual size may be output together with the recognition result. The feature point actual sizes of all feature points of the query image may be output, or a statistically derived value such as their average may be output instead.
 The above example can be used for both specific object recognition and general object recognition.
 As described above, according to the image recognition device 10 of the present embodiment, erroneous recognition in image recognition can be reduced by taking into account the size of a feature on the actual subject.
 (Second Embodiment)
 FIG. 6 is a block diagram showing a schematic configuration example of an image recognition device 30 according to the second embodiment of the present invention. As shown in FIG. 6, the image recognition device 30 of the present embodiment includes a learning image input unit 11, a special object detection unit 12, a depth structure estimation unit 31 that estimates the depth structure of an image, a feature amount extraction unit 13, an actual size calculation unit 32 that calculates the feature point actual size, a storage unit 15, a query image input unit 16, a feature amount matching unit 17, an object ID selection unit 18, and an output unit 19.
 In the learning phase, the depth structure estimation unit 31 receives a learning image from the special object detection unit 12 and estimates the depth structure of the learning image. The depth structure consists of a depth value for each pixel indicating the distance between the subject and the camera. It can be estimated using, for example, the technique disclosed in Japanese Unexamined Patent Application Publication No. 2005-151534. In that technique, a composition ratio is determined according to the high-frequency component evaluation values of the luminance signal of a non-stereo image, computed by a high-frequency component evaluation unit for the upper part of the screen and one for the lower part, and three types of basic depth models are combined according to that ratio. The combined basic depth model is then superimposed on the R signal of the non-stereo image to obtain the final depth estimation data. The depth structure estimation unit 31 outputs the estimated depth structure to the actual size calculation unit 32.
 The actual size calculation unit 32 first calculates the feature point actual size of each feature point using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13, in the same manner as the actual size calculation unit 14 of the first embodiment. It then corrects the feature point actual size using the depth structure received from the depth structure estimation unit 31. The correction is preferably performed by multiplying the feature point actual size by p/q, where p is the depth value at the center point of the special object region used in calculating the feature point actual size and q is the depth value at the feature point. This corrects the change in apparent size in the captured image that occurs when the special object and a feature point lie at different depths, that is, at different distances from the camera at the time of capture. The correction is not limited to this example and may be performed by other methods, such as subtracting (q - p).
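 The p/q correction can be sketched as follows; the guard against a zero depth value is our addition and not part of the description:

```python
def depth_corrected_size(actual_size, p, q):
    # p: depth value at the center point of the special object region used
    #    for the actual size calculation; q: depth value at the feature point.
    # Multiplying by p / q rescales the size as if the feature point lay at
    # the special object's depth.
    return actual_size * (p / q) if q else actual_size
```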
 An example of the processing flow of the image recognition method in the learning phase will be described with reference to FIG. 7.
 First, in step S41, the learning image input unit 11 receives a learning image and an object ID.
 Next, in step S42, the special object detection unit 12 detects a special object.
 Then, in step S43, the depth structure estimation unit 31 estimates the depth structure.
 Next, in step S44, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
 Finally, in step S45, the actual size calculation unit 32 calculates the feature point actual size of each feature point and corrects it using the depth structure.
 The actual size calculation unit 32 then stores the corrected feature point actual size in the storage unit 15 together with the feature point ID, the object ID, and the feature amount. In the learning phase, the image recognition device 30 performs this processing for each learning image.
 Next, the identification phase will be described. In the identification phase, the depth structure estimation unit 31 receives the query image from the special object detection unit 12, performs the same processing as in the learning phase to estimate the depth structure of the query image, and outputs the estimated depth structure to the actual size calculation unit 32.
 The actual size calculation unit 32 first performs the same processing as in the learning phase using the special object resolution received from the special object detection unit 12 and the scale received from the feature amount extraction unit 13, and calculates the feature point actual size. It then corrects the feature point actual size using the depth structure received from the depth structure estimation unit 31, as in the learning phase, and outputs the feature amount and the corrected feature point actual size of each feature point to the feature amount matching unit 17.
 An example of the processing flow of the image recognition method in the identification phase will be described with reference to FIG. 8.
 First, in step S51, a query image is input through the query image input unit 16 to the special object detection unit 12 and the feature amount extraction unit 13.
 Next, in step S52, the special object detection unit 12 detects a special object.
 Then, in step S53, the depth structure estimation unit 31 estimates the depth structure.
 Next, in step S54, the feature amount extraction unit 13 detects feature points and extracts the feature amount of each feature point.
 Then, in step S55, the actual size calculation unit 32 calculates the feature point actual size of each feature point and corrects it using the depth structure.
 Next, in step S56, the feature amount matching unit 17 extracts candidate feature points from the storage unit 15 for each feature point of the query image using the corrected feature point actual size.
 Then, in step S57, the feature amount matching unit 17 finds, for each feature point of the query image, the candidate feature point with the smallest feature amount distance, and votes for that feature point's object ID.
 Next, in step S58, the object ID selection unit 18 selects the object ID with the largest number of votes in step S57 as the recognition result. Finally, in step S59, the output unit 19 outputs the recognition result.
 As described above, according to the image recognition device 30 of the present embodiment, the actual size calculation unit 32 can correct the change in apparent size in the captured image that occurs when the special object and a feature point are captured at different distances from the camera. Erroneous recognition can therefore be reduced further.
 (Implementation by Software)
 The control blocks of the image recognition devices 10 and 30 (in particular, the special object detection unit 12, the feature amount extraction unit 13, the actual size calculation units 14 and 32, the feature amount matching unit 17, the output unit 19, and the depth structure estimation unit 31) may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).
 In the latter case, the image recognition device 30 includes a CPU that executes the instructions of a program, which is software realizing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The object of one aspect of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may be supplied to the computer via any transmission medium capable of transmitting it, such as a communication network or broadcast waves. Note that one aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
 [Summary]
 An image recognition device 10, 30 according to Aspect 1 of the present invention is an image recognition device 10, 30 that selects an image similar to a query image from learning images, comprising: a special object detection unit 12 that detects, from the query image and each learning image, a special object defined in advance; a feature amount extraction unit 13 that extracts the feature amounts of feature points detected from the query image and the feature amounts of feature points detected from each learning image; an actual size calculation unit 14, 32 that calculates a feature point actual size according to the special object and the feature points detected from the query image, and a feature point actual size according to the special object and the feature points detected from each learning image; and a feature amount matching unit 17 that compares the feature amounts of the feature points of the query image with the feature amounts of the feature points of each learning image, wherein the feature amount matching unit 17 compares the feature amounts of the feature points of the query image with the feature amounts of those feature points of each learning image whose feature point actual sizes are similar.
 According to the above configuration, erroneous recognition in image recognition can be reduced.
 In the image recognition device 10, 30 according to Aspect 2 of the present invention, in Aspect 1 above, the actual size calculation unit 14, 32 may calculate each feature point actual size using the number of pixels from a predetermined point in the region of the special object to each of the feature points, the special object resolution calculated by the special object detection unit 12, which indicates the actual size per pixel in the region of the special object, and the scale of the feature points calculated by the feature amount extraction unit 13.
 According to the above configuration, both the actual size per pixel in the region of the special object and the scale of the feature points are used, so erroneous recognition is unlikely to occur even when the scale of the image changes.
 In the image recognition device 10, 30 according to Aspect 3 of the present invention, in Aspect 2 above, the predetermined point may be the center of the region of the special object or any other point within the region of the special object.
 According to the above configuration, a reference point suited to the special object can be selected.
 The image recognition device 30 according to Aspect 4 of the present invention, in any one of Aspects 1 to 3 above, may further include a depth structure estimation unit 31 that estimates the depth structure of the query image and each learning image, and the actual size calculation unit 32 may correct the feature point actual size using the depth structure.
 According to the above configuration, erroneous recognition can be further reduced.
 The image recognition device 10, 30 according to Aspect 5 of the present invention, in any one of Aspects 1 to 4 above, may further include an output unit 19 that outputs the result of selecting an image similar to the query image from the learning images, together with the feature point actual size.
 According to the above configuration, the size on the actual subject of an object included in the image can be known.
 An image recognition method according to Aspect 6 of the present invention is an image recognition method for selecting an image similar to a query image from learning images, comprising: a special object detection step of detecting, from the query image and each learning image, a special object defined in advance; a feature amount extraction step of extracting the feature amounts of feature points detected from the query image and the feature amounts of feature points detected from each learning image; a size calculation step of calculating a feature point actual size according to the special object and the feature points detected from the query image, and a feature point actual size according to the special object and the feature points detected from each learning image; and a feature amount matching step of comparing the feature amounts of the feature points of the query image with the feature amounts of the feature points of each learning image, wherein, in the feature amount matching step, the feature amounts of the feature points of the query image are compared with the feature amounts of those feature points of each learning image whose feature point actual sizes are similar.
 According to the above configuration, the same effects as those of the image recognition device according to Aspect 1 can be obtained.
 The image recognition devices 10, 30 according to the aspects of the present invention may be realized by a computer. In that case, an image recognition program that realizes the image recognition device 10, 30 on the computer by causing the computer to operate as the units (software elements) included in the image recognition device 10, 30, and a computer-readable recording medium on which the program is recorded, also fall within the scope of one aspect of the present invention.
 The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims; embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of one aspect of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.
 (Cross-Reference of Related Applications)
 This application claims the benefit of priority over Japanese Patent Application No. 2016-081433 filed on April 14, 2016, the entire contents of which are incorporated herein by reference.
 10, 30 Image recognition device
 12 Special object detection unit (detection unit)
 13 Feature amount extraction unit (extraction unit)
 14, 32 Actual size calculation unit (image recognition unit)
 17 Feature amount matching unit (image recognition unit)
 19 Output unit
 31 Depth structure estimation unit

Claims (11)

  1. An image recognition device for identifying an object included in a query image, comprising:
     a detection unit that detects, from the query image, a predetermined special object whose standard size is known, and detects information indicating the actual size of a region corresponding to the special object in the query image;
     an extraction unit that extracts feature points of the query image from the query image; and
     an image recognition unit that identifies the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and the feature points of the query image.
  2. The image recognition device according to claim 1, wherein the image recognition unit selects the object included in the query image from objects included in learning images.
  3. The image recognition device according to claim 2, wherein the image recognition unit extracts candidate feature points from the feature points of each learning image based on the information indicating the actual size of the region corresponding to the special object in the query image, and selects the object included in the query image from the objects included in the learning images by matching the feature points of the query image against the candidate feature points.
  4. The image recognition device according to claim 3, wherein
     the detection unit detects, from each learning image, information indicating the actual size of a region corresponding to the special object in that learning image,
     the extraction unit extracts the feature points of each learning image from that learning image, and
     the image recognition unit
     calculates the feature point actual sizes of the feature points of each learning image based on the information indicating the actual size of the region corresponding to the special object in that learning image,
     calculates the feature point actual sizes of the feature points of the query image based on the information indicating the actual size of the region corresponding to the special object in the query image, and
     extracts the candidate feature points from the feature points of each learning image by comparing the feature point actual sizes of the feature points of each learning image with the feature point actual sizes of the feature points of the query image.
  5. The image recognition device according to claim 4, wherein the image recognition unit calculates each feature point actual size using the number of pixels from a predetermined point in the region of the special object to each of the feature points, a special object resolution indicating the actual size per pixel in the region of the special object, and the scale of the feature points.
  6. The image recognition device according to claim 5, wherein the predetermined point is the center of the region of the special object or any other point within the region of the special object.
  7. The image recognition device according to any one of claims 4 to 6, further comprising a depth structure estimation unit that estimates a depth structure of the query image and each learning image, wherein the image recognition unit corrects the feature point actual size using the depth structure.
  8. The image recognition device according to any one of claims 4 to 7, further comprising an output unit that outputs the result of identifying the object included in the query image and the feature point actual size.
  9. An image recognition method for identifying an object included in a query image, comprising:
     a detection step of detecting, from the query image, a predetermined special object whose standard size is known, and detecting information indicating the actual size of a region corresponding to the special object in the query image;
     an extraction step of extracting feature points of the query image from the query image; and
     an image recognition step of identifying the object included in the query image based on the information indicating the actual size of the region corresponding to the special object in the query image and the feature points of the query image.
  10. An image recognition program for causing a computer to function as the image recognition device according to claim 1, the program causing the computer to function as the detection unit, the extraction unit, and the image recognition unit.
  11. A computer-readable recording medium on which the image recognition program according to claim 10 is recorded.
PCT/JP2017/015390 2016-04-14 2017-04-14 Image recognition device, image recognition method, and image recognition program WO2017179728A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016-081433 2016-04-14
JP2016081433 2016-04-14

Publications (1)

Publication Number Publication Date
WO2017179728A1 true WO2017179728A1 (en) 2017-10-19

Family

ID=60041780

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/015390 WO2017179728A1 (en) 2016-04-14 2017-04-14 Image recognition device, image recognition method, and image recognition program

Country Status (1)

Country Link
WO (1) WO2017179728A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020085805A (en) * 2018-11-30 2020-06-04 Arithmer Inc. Dimension data calculation device, program, method, product manufacturing device, and product manufacturing system
US11922649B2 (en) 2018-11-30 2024-03-05 Arithmer Inc. Measurement data calculation apparatus, product manufacturing apparatus, information processing apparatus, silhouette image generating apparatus, and terminal apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002222423A (en) * 2001-01-25 2002-08-09 Fujitsu Ltd Method and device for object recognition
JP2010271861A (en) * 2009-05-20 2010-12-02 Canon Inc Object identification device and object identification method
WO2015099016A1 (en) * 2013-12-26 2015-07-02 日本電気株式会社 Image processing device, subject identification method and program
JP2015191626A (en) * 2014-03-28 2015-11-02 富士重工業株式会社 Outside-vehicle environment recognition device


Similar Documents

Publication Title
Aldoma et al. Multimodal cue integration through hypotheses verification for rgb-d object recognition and 6dof pose estimation
US9984280B2 (en) Object recognition system using left and right images and method
JP4479478B2 (en) Pattern recognition method and apparatus
Sarfraz et al. Head Pose Estimation in Face Recognition Across Pose Scenarios.
JP6544900B2 (en) Object identification device, object identification method and program
Bak et al. Improving person re-identification by viewpoint cues
CN101147159A (en) Fast method of object detection by statistical template matching
JP5936561B2 (en) Object classification based on appearance and context in images
CN108399374B (en) Method and apparatus for selecting candidate fingerprint images for fingerprint identification
US10127681B2 (en) Systems and methods for point-based image alignment
US20140093142A1 (en) Information processing apparatus, information processing method, and information processing program
Ding et al. Recognition of hand-gestures using improved local binary pattern
WO2013181695A1 (en) Biometric verification
JP2009129237A (en) Image processing apparatus and its method
CN113297963A (en) Multi-person posture estimation method and device, electronic equipment and readable storage medium
CN106709915B (en) Image resampling operation detection method
KR101789979B1 (en) Method for calculating hausdorff distance based on gradient orientation information
KR20190018274A (en) Method and apparatus for recognizing a subject existed in an image based on temporal movement or spatial movement of a feature point of the image
Alavi et al. Multi-shot person re-identification via relational stein divergence
WO2017179728A1 (en) Image recognition device, image recognition method, and image recognition program
CN111476070A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN105190689A (en) Image processing including adjoin feature based object detection, and/or bilateral symmetric object segmentation
US7113637B2 (en) Apparatus and methods for pattern recognition based on transform aggregation
JP5755516B2 (en) Object shape estimation device
JP2015007919A (en) Program, apparatus, and method of realizing high accuracy geometric inspection for images different in point of view

Legal Events

Code Title Description
NENP Non-entry into the national phase (ref country code: DE)
121 Ep: the epo has been informed by wipo that ep was designated in this application (ref document number: 17782541; country of ref document: EP; kind code of ref document: A1)
122 Ep: pct application non-entry in european phase (ref document number: 17782541; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: JP)