CN112633081B - Specific object identification method in complex scene - Google Patents

Specific object identification method in complex scene

Info

Publication number
CN112633081B
Authority
CN
China
Prior art keywords
image
scene
feature
reference image
pyramid
Prior art date
Legal status
Active
Application number
CN202011406594.8A
Other languages
Chinese (zh)
Other versions
CN112633081A (en)
Inventor
梁磊
王琳
Current Assignee
Shenzhen Laike Computer Technology Co ltd
Original Assignee
Shenzhen Laike Computer Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Laike Computer Technology Co ltd
Priority to CN202011406594.8A
Publication of CN112633081A
Application granted
Publication of CN112633081B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36 Indoor scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for identifying a specific object in a complex scene, belonging to the field of digital image processing. The method comprises the following steps: acquiring a reference image of the specific object and a scene image containing the object; establishing an image pyramid of the reference image; performing Gaussian filtering on the scene image, extracting feature points and matching them against each layer of the reference image pyramid, and determining, from the number of matched feature points, the pyramid layer whose scale is closest to that of the scene image; based on that layer number, applying extended interpolation to the scene image to raise its resolution; performing Gaussian filtering on the reference image and matching its feature points against the interpolated scene image, which reduces the mismatches produced during cross-layer matching; and finally obtaining the position information of the specific object in the scene. The method is not only fast, but also markedly increases the number of correctly matched feature point pairs, so that the object is identified correctly.

Description

Specific object identification method in complex scene
Technical Field
The invention relates to the field of digital image processing, in particular to a specific object identification method in a complex scene.
Background
Specific object recognition is widely applied in the fields of industrial automation and intelligent robots. Specific object recognition means recognizing a particular object (e.g., my cup) in a scene image, whereas the corresponding general object recognition means recognizing a class of objects (e.g., cups) in a scene image. At present, in the field of digital image processing, local invariant features are generally used to identify a specific object in a complex scene, while deep learning methods are used to identify general objects in a complex scene.
The design idea of local invariant features is that an image is composed of different types of target regions whose parameters, such as color, brightness and spatial distribution, differ, and that each target region has a specific range of influence, i.e., it affects only a local part of the image. These local structures are highly representative because of the rich image information they contain. On one hand, they are not easily affected by external changes such as translation, rotation, scaling, scale, viewpoint, illumination, blur and compression; on the other hand, they largely avoid the weakness of traditional global features, which are easily disturbed by complex backgrounds or noise. In addition, compared with the blindness of traditional global features, localized feature processing better matches the actual nature of image data and human vision, searching for useful information locally and purposefully. Local invariant features therefore have great advantages in stability, repeatability and distinctiveness.
Local invariant features divide into feature corners and feature blobs. For recognizing specific objects in a complex scene, blob-based methods are robust but computationally expensive and cannot meet the real-time requirements of industrial automation, intelligent robotics and similar fields; corner-based methods are computationally efficient but less robust, yielding few correctly matched feature point pairs and inaccurate localization.
The first cause of inaccurate localization is the scale-space problem: corner-based methods simulate scale change between images by building an image pyramid, so many mismatches arise during cross-scale matching. The second is the image-resolution problem: the resolution of the image pyramid decreases layer by layer, which strongly affects the number of detected keypoints, especially for the neighborhood-based corner detectors used in corner recognition. The third is specific to the task itself: unlike image feature matching applications (e.g., wide-baseline matching), the reference image of a specific object is only a part of the scene image, so feature points of other objects in the scene interfere with the matching result.
Disclosure of Invention
The invention aims to provide a method for identifying a specific object with high robustness and high calculation efficiency in a complex scene, so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme:
a specific object identification method under a complex scene comprises the following specific steps:
(1) inputting a reference image of a specific object and a scene image containing the object;
(2) establishing the image pyramid P_r(x, y, n) of the reference image I_r(x, y):
P_r(x, y, n) = F_s(n)[I_r(x, y)], n = 0, 1, 2, ..., N (formula 1)
where F_s(n) denotes bilinear interpolation with scale factor s(n), and N is the total number of layers of the image pyramid;
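For illustration, a minimal Python/OpenCV sketch of step (2) follows. Because formula 2 survives only as an image in this text, the per-layer scale law s(n) = 1/S_init^n is an assumption inferred from the ORB-style values S_init = 1.2 and N = 7 given later; the function name is likewise illustrative.

```python
import cv2

def build_reference_pyramid(ref_img, s_init=1.2, num_layers=7):
    """Build the reference image pyramid P_r(x, y, n) with bilinear
    interpolation F_s(n), for n = 0, 1, ..., N."""
    pyramid = []
    for n in range(num_layers + 1):
        # Assumed scale law: s(n) = 1 / s_init**n (formula 2 is an
        # image in the original patent text).
        scale = 1.0 / (s_init ** n)
        layer = cv2.resize(ref_img, None, fx=scale, fy=scale,
                           interpolation=cv2.INTER_LINEAR)
        pyramid.append(layer)
    return pyramid
```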
(3) extracting the local invariant feature corners C_r(N_r,n) of each layer of the image pyramid P_r(x, y, n) and generating the corresponding feature descriptors D_r(N_r,n):
(formula 3: rendered as an image in the original document)
where C_r(N_r,n) denotes the feature corners of the nth-layer reference image pyramid, D_r(N_r,n) denotes its feature descriptors, and N_r,n is the number of features in the nth layer of the reference image pyramid;
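A sketch of step (3), assuming ORB corners and binary descriptors as the embodiment names them; the feature cap of 2000 and the choice to disable ORB's internal pyramid (nlevels=1, since the patent builds its own) are illustrative assumptions.

```python
import cv2

def extract_pyramid_features(pyramid):
    """Extract corners C_r(N_r,n) and descriptors D_r(N_r,n) for
    every pyramid layer."""
    orb = cv2.ORB_create(nfeatures=2000, nlevels=1)
    return [orb.detectAndCompute(layer, None) for layer in pyramid]
```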
(4) according to N_r,n, calculating the number of scene image features to be matched against the reference image pyramid:
(formula 4: rendered as an image in the original document)
where N_os,n is the number of scene image features matched against the nth-layer reference image pyramid, R_r,n is the image resolution of the nth-layer reference image pyramid, and R_s is the resolution of the scene image;
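Formula 4 is not recoverable from this extraction, so the sketch below shows only the limiting operation itself: given a target count N_os,n (however it is computed), the strongest scene keypoints are retained. Response-based selection is an assumed criterion and the helper name is illustrative.

```python
def limit_keypoints(kps, desc, n_keep):
    """Keep the n_keep strongest keypoints and their descriptors
    (kps: list of cv2.KeyPoint, desc: NumPy uint8 array)."""
    order = sorted(range(len(kps)),
                   key=lambda i: kps[i].response, reverse=True)[:n_keep]
    kept_kps = [kps[i] for i in order]
    kept_desc = None if desc is None else desc[order]
    return kept_kps, kept_desc
```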
(5) filtering the scene image I_s(x, y) with a Gaussian filter G(x, y, σ), then extracting the local invariant feature corners C_s(N_s) of the filtered scene image and generating the corresponding feature descriptors D_s(N_s):
(formula 5: rendered as an image in the original document)
where σ is the filter kernel parameter and N_s is the number of features of the filtered scene image;
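A hedged sketch of step (5), using the σ = 0.3 the patent fixes later; the 3x3 kernel size and the feature cap are assumptions.

```python
import cv2

def scene_features(scene_img, sigma=0.3):
    """Filter the scene image with G(x, y, sigma), then extract
    corners C_s(N_s) and descriptors D_s(N_s)."""
    filtered = cv2.GaussianBlur(scene_img, (3, 3), sigma)
    orb = cv2.ORB_create(nfeatures=5000, nlevels=1)
    return orb.detectAndCompute(filtered, None)
```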
(6) according to the N_os,n obtained in step (4), limiting the number of scene image feature descriptors D_s(N_s) to obtain D_s(N_os,n), then matching them against the feature descriptors of the corresponding layer of the reference image pyramid to obtain the matched feature point pairs k(n) for each layer of the scale space:
(formula 6: rendered as an image in the original document)
(7) according to the initial scale factor S_init of the reference image pyramid, assigning different weights to k(n); the maximum determines the corresponding matching layer number c:
(formula 7: rendered as an image in the original document)
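Steps (6) and (7) can be sketched together as below. Hamming-distance brute-force matching follows the embodiment; the weighting of formula 7 is not recoverable here, so uniform weights (a plain argmax over k(n)) are assumed.

```python
import cv2

def best_matching_layer(pyr_features, scene_desc, weights=None):
    """Match the limited scene descriptors against each pyramid
    layer, count matches k(n), and pick the layer c with the
    largest weighted count."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    k = [len(matcher.match(desc, scene_desc)) for _, desc in pyr_features]
    if weights is None:
        weights = [1.0] * len(k)  # assumed uniform weighting
    c = max(range(len(k)), key=lambda n: weights[n] * k[n])
    return c, k
```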
(8) calculating the scale factor of bilinear interpolation from the corresponding matching layer number c (the scale-factor expression is rendered as an image in the original), then using this scale factor to apply extended interpolation to the scene image, obtaining the extended scene image E_s(x, y):
(formula 8: rendered as an image in the original document)
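Under the scale law assumed above, step (8) reduces to a single bilinear up-sampling; the factor S_init^c, chosen to undo the matched layer's reduction, is an assumption since formula 8 survives only as an image.

```python
import cv2

def extend_scene(scene_img, c, s_init=1.2):
    """Extended interpolation: up-sample the scene image to the
    scale of the matched pyramid layer."""
    scale = s_init ** c  # assumed form of the interpolation factor
    return cv2.resize(scene_img, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_LINEAR)
```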
(9) extracting the local invariant feature corners C′_s(N′_s) of E_s(x, y) and generating the corresponding feature descriptors D′_s(N′_s):
(formula 9: rendered as an image in the original document)
where N′_s is the number of features of the extended scene image;
(10) filtering the reference image I_r(x, y) with a Gaussian filter, then extracting the local invariant feature corners C′_r(N′_r) of the filtered reference image and generating the corresponding feature descriptors D′_r(N′_r):
(formula 10: rendered as an image in the original document)
where N′_r is the number of features of the filtered reference image;
(11) according to N′_r, calculating the number of scene image features to be matched against the filtered reference image:
(formula 11: rendered as an image in the original document)
where N′_os is the number of scene image features matched against the filtered reference image, α is a coefficient factor, β_r and β_s are the information entropies of the reference image and the scene image respectively, and R_r is the resolution of the reference image;
(12) according to the N′_os obtained in step (11), limiting the number of feature descriptors D′_s(N′_s) to obtain D′_s(N′_os), then matching them against the filtered reference image feature descriptors D′_r(N′_r) to obtain the matched feature point pairs k′:
(formula 12: rendered as an image in the original document)
(13) calculating the geometric transformation of the reference target within the scene image from the matched feature point pairs k′, thereby obtaining the position information of the target in the scene and completing the identification.
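Step (13) can be sketched with a RANSAC homography, a standard but assumed concrete choice, since the patent says only "geometric transformation"; the reference image's corners are then projected into the scene to mark the object's position.

```python
import cv2
import numpy as np

def locate_object(ref_kps, scene_kps, matches, ref_shape):
    """Estimate the transformation from matched pairs k' and return
    the object's outline (four corners) in the scene image."""
    src = np.float32([ref_kps[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([scene_kps[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = ref_shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, H)
```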
As a further scheme of the invention: the scale factor s(n) in formula 1 is calculated as follows:
(formula 2: rendered as an image in the original document)
where S_init is the initialization constant of the scale factor.
As a further scheme of the invention: the initialization constant S_init in formula 2 takes the value 1.2.
As a further scheme of the invention: the value of N in formula 1 is 7.
As a further scheme of the invention: the value of the filter kernel parameter σ in formula 5 is 0.3.
As a further scheme of the invention: the value of the coefficient factor α in formula 11 is 2.
Compared with the prior art, the invention has the beneficial effects that:
the specific object identification method has high calculation efficiency, has high robustness in natural and complex scenes, and can accurately and quickly identify the specific object.
Drawings
FIG. 1 is a reference image of a specific object to be identified in an embodiment of the present invention.
FIG. 2 is an image of a scene containing an object to be identified in accordance with an embodiment of the present invention.
Fig. 3 is an effect diagram of marking matched pairs of feature points after object identification by using the method of the present invention in the embodiment of the present invention.
FIG. 4 is a diagram illustrating the effect of marking the position of an object after the object is identified by the method of the present invention according to an embodiment of the present invention.
Fig. 5 is a flow chart of object recognition using the method of the present invention in an embodiment of the present invention.
Detailed Description
The technical solution of the present patent is described in further detail below with reference to the accompanying drawings and specific embodiments.
A specific object identification method under a complex scene comprises the following specific steps:
(1) inputting a reference image of a specific object and a scene image containing the object;
(2) establishing the image pyramid P_r(x, y, n) of the reference image I_r(x, y):
P_r(x, y, n) = F_s(n)[I_r(x, y)], n = 0, 1, 2, ..., N (formula 1)
(formula 2: rendered as an image in the original document)
where F_s(n) denotes bilinear interpolation with scale factor s(n), S_init = 1.2 is the initialization constant of the scale factor, and N = 7 is the total number of layers of the image pyramid;
(3) extracting the local invariant feature corners C_r(N_r,n) of each layer of the image pyramid P_r(x, y, n) and generating the corresponding feature descriptors D_r(N_r,n):
(formula 3: rendered as an image in the original document)
where C_r(N_r,n) denotes the feature corners of the nth-layer reference image pyramid, D_r(N_r,n) denotes its feature descriptors, and N_r,n is the number of features in the nth layer of the reference image pyramid;
(4) according to N_r,n, calculating the number of scene image features to be matched against the reference image pyramid:
(formula 4: rendered as an image in the original document)
where N_os,n is the number of scene image features matched against the nth-layer reference image pyramid, R_r,n is the image resolution of the nth-layer reference image pyramid, and R_s is the resolution of the scene image;
(5) filtering the scene image I_s(x, y) with a Gaussian filter G(x, y, σ) whose filter kernel parameter σ = 0.3, then extracting the local invariant feature corners C_s(N_s) of the filtered scene image and generating the corresponding feature descriptors D_s(N_s):
(formula 5: rendered as an image in the original document)
where N_s is the number of features of the filtered scene image;
(6) according to the N_os,n obtained in step (4), limiting the number of scene image feature descriptors D_s(N_s) to obtain D_s(N_os,n), then matching them against the feature descriptors of the corresponding layer of the reference image pyramid to obtain the matched feature point pairs k(n) for each layer of the scale space:
(formula 6: rendered as an image in the original document)
(7) according to the initial scale factor S_init of the reference image pyramid, assigning different weights to k(n); the maximum determines the corresponding matching layer number c:
(formula 7: rendered as an image in the original document)
(8) calculating the scale factor of bilinear interpolation from the corresponding matching layer number c (the scale-factor expression is rendered as an image in the original), then using this scale factor to apply extended interpolation to the scene image, obtaining the extended scene image E_s(x, y):
(formula 8: rendered as an image in the original document)
(9) extracting the local invariant feature corners C′_s(N′_s) of E_s(x, y) and generating the corresponding feature descriptors D′_s(N′_s):
(formula 9: rendered as an image in the original document)
where N′_s is the number of features of the extended scene image;
(10) filtering the reference image I_r(x, y) with a Gaussian filter whose filter kernel parameter σ = 0.3, then extracting the local invariant feature corners C′_r(N′_r) of the filtered reference image and generating the corresponding feature descriptors D′_r(N′_r):
(formula 10: rendered as an image in the original document)
where N′_r is the number of features of the filtered reference image;
(11) according to N′_r, calculating the number of scene image features to be matched against the filtered reference image:
(formula 11: rendered as an image in the original document)
where N′_os is the number of scene image features matched against the filtered reference image, α = 2 is a coefficient factor, β_r and β_s are the information entropies of the reference image and the scene image respectively, and R_r is the resolution of the reference image;
(12) according to the N′_os obtained in step (11), limiting the number of feature descriptors D′_s(N′_s) to obtain D′_s(N′_os), then matching them against the filtered reference image feature descriptors D′_r(N′_r) to obtain the matched feature point pairs k′:
(formula 12: rendered as an image in the original document)
(13) calculating the geometric transformation of the reference target within the scene image from the matched feature point pairs k′, thereby obtaining the position information of the target in the scene and completing the identification.
In an embodiment, the method of the present invention is used to identify the specific object shown in fig. 1 within the scene image shown in fig. 2, as follows:
(1) the captured reference image of a particular object (as shown in fig. 1) and an image of a scene containing the object (as shown in fig. 2) are input.
(2) Establishing the image pyramid P_r(x, y, n) of the reference image I_r(x, y):
P_r(x, y, n) = F_s(n)[I_r(x, y)], n = 0, 1, 2, ..., N (formula 1)
(formula 2: rendered as an image in the original document)
where F_s(n) denotes bilinear interpolation with scale factor s(n), S_init = 1.2 is the initialization constant of the scale factor, and N = 7 is the total number of layers of the image pyramid.
(3) Extracting the ORB feature corners C_r(N_r,n) of each layer of the image pyramid P_r(x, y, n) and generating the corresponding ORB binary feature descriptors D_r(N_r,n):
(formula 3: rendered as an image in the original document)
where C_r(N_r,n) denotes the feature corners of the nth-layer reference image pyramid, D_r(N_r,n) denotes its feature descriptors, and N_r,n is the number of features in the nth layer of the reference image pyramid.
(4) Since the scene image contains more objects than the reference image, the number of feature points in the scene image is much greater than that in the reference image, so the number of scene image feature points to be matched against the reference image pyramid must be limited. According to N_r,n, the number of scene image features to be matched against the reference image pyramid is calculated as:
(formula 4: rendered as an image in the original document)
where N_os,n is the number of scene image features matched against the nth-layer reference image pyramid, R_r,n is the image resolution of the nth-layer reference image pyramid, and R_s is the resolution of the scene image.
(5) The scene image I_s(x, y) is filtered with a Gaussian filter G(x, y, σ) whose filter kernel parameter σ = 0.3 to reduce the interference of other parts of the scene image with the matching result; then the ORB feature corners C_s(N_s) of the filtered scene image are extracted and the corresponding ORB binary feature descriptors D_s(N_s) are generated:
(formula 5: rendered as an image in the original document)
where N_s is the number of features of the filtered scene image.
(6) According to the N_os,n obtained in step (4), the number of scene image feature descriptors D_s(N_s) is limited to obtain D_s(N_os,n); feature matching is then performed by computing the Hamming distance to the feature descriptors of the corresponding layer of the reference image pyramid, yielding the matched feature point pairs k(n) for each layer of the scale space:
(formula 6: rendered as an image in the original document)
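Concretely, the Hamming distance that a brute-force matcher (e.g., cv2.BFMatcher with cv2.NORM_HAMMING) evaluates for each ORB descriptor pair is the number of differing bits; a minimal sketch:

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance between two ORB binary descriptors
    (uint8 arrays of 32 bytes = 256 bits)."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())
```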
(7) According to the initial scale factor S_init of the reference image pyramid, different weights are assigned to k(n); the maximum determines the corresponding matching layer number c:
(formula 7: rendered as an image in the original document)
(8) The scale factor of bilinear interpolation is calculated from the corresponding matching layer number c (the scale-factor expression is rendered as an image in the original), and this scale factor is then used to apply extended interpolation to the scene image, obtaining the extended scene image E_s(x, y):
(formula 8: rendered as an image in the original document)
(9) The ORB feature corners C′_s(N′_s) of E_s(x, y) are extracted and the corresponding ORB binary feature descriptors D′_s(N′_s) are generated:
(formula 9: rendered as an image in the original document)
where N′_s is the number of features of the extended scene image.
(10) The extended interpolation of the scene image in step (8) causes a loss of image detail; therefore the reference image I_r(x, y) is filtered with a Gaussian filter whose filter kernel parameter σ = 0.3 to simulate this loss, after which the ORB feature corners C′_r(N′_r) of the filtered reference image are extracted and the corresponding ORB binary feature descriptors D′_r(N′_r) are generated:
(formula 10: rendered as an image in the original document)
where N′_r is the number of features of the filtered reference image.
(11) The limit on the number of scene image feature points depends not only on the image resolution but also on the texture information of the object to be identified: an object with little texture is easily disturbed by other objects in the scene image. Therefore, to increase the number of correctly matched feature point pairs, the information entropy of the images is taken into account when calculating, from N′_r, the number of scene image features to be matched against the filtered reference image:
(formula 11: rendered as an image in the original document)
where N′_os is the number of scene image features matched against the filtered reference image, α = 2 is a coefficient factor, β_r and β_s are the information entropies of the reference image and the scene image respectively, and R_r is the resolution of the reference image.
(12) According to the N′_os obtained in step (11), the number of feature descriptors D′_s(N′_s) is limited to obtain D′_s(N′_os); feature matching is then performed by computing the Hamming distance to the filtered reference image feature descriptors D′_r(N′_r), yielding the matched feature point pairs k′:
(formula 12: rendered as an image in the original document)
As shown in fig. 3, the matched feature points are marked in the reference image and the scene image respectively, and matched feature point pairs are connected by straight lines.
(13) The geometric transformation of the reference object in the scene image is calculated from the matched feature point pairs k′, giving the position information of the object in the scene and completing the identification, as shown in fig. 4.
In practical applications, the reference image of the object to be recognized is stored in advance: its image pyramid is built and its features are extracted and saved beforehand, so steps (2), (3) and (4) consume no computing resources at recognition time. Although the method adds a step for finding the corresponding matching layer compared with the conventional local invariant corner feature method, it still holds the advantage in computational efficiency, because the conventional approach must extract and match feature points on every layer of the image pyramids of both the reference image and the scene image (for example, a 7-layer reference pyramid against a 7-layer scene pyramid requires 49 matching passes), whereas the proposed method obtains a result with a single matching pass at the corresponding matching layer. More importantly, it reduces the mismatched points produced during cross-layer matching and markedly increases the number of correctly matched feature point pairs.
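A minimal sketch of this offline preparation, with an illustrative file name, stores each layer's descriptors and keypoint coordinates (cv2.KeyPoint objects do not serialize directly) for reuse at recognition time:

```python
import numpy as np

def save_reference_features(features, path="ref_features.npz"):
    """Persist per-layer descriptors and keypoint coordinates so
    steps (2)-(4) cost nothing when recognizing."""
    arrays = {}
    for n, (kps, desc) in enumerate(features):
        arrays["desc_%d" % n] = desc
        arrays["pts_%d" % n] = np.float32([kp.pt for kp in kps])
    np.savez(path, **arrays)
```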
The specific object identification method provided by the invention has high calculation efficiency, has high robustness in a natural and complex scene, and can correctly and quickly identify the specific object.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications may be made or equivalents may be substituted for some of the features thereof without departing from the scope of the present invention, and such modifications and substitutions should also be considered as the protection scope of the present invention.

Claims (6)

1. A specific object identification method under a complex scene is characterized by comprising the following specific steps:
(1) inputting a reference image of a specific object and a scene image containing the object;
(2) establishing the image pyramid P_r(x, y, n) of the reference image I_r(x, y):
P_r(x, y, n) = F_s(n)[I_r(x, y)], n = 0, 1, 2, ..., N (formula 1)
where F_s(n) denotes bilinear interpolation with scale factor s(n), and N is the total number of layers of the image pyramid;
(3) extracting the local invariant feature corners C_r(N_r,n) of each layer of the image pyramid P_r(x, y, n) and generating the corresponding feature descriptors D_r(N_r,n):
(formula 3: rendered as an image in the original document)
where C_r(N_r,n) denotes the feature corners of the nth-layer reference image pyramid, D_r(N_r,n) denotes its feature descriptors, and N_r,n is the number of features in the nth layer of the reference image pyramid;
(4) according to N_r,n, calculating the number of scene image features to be matched against the reference image pyramid:
(formula 4: rendered as an image in the original document)
where N_os,n is the number of scene image features matched against the nth-layer reference image pyramid, R_r,n is the image resolution of the nth-layer reference image pyramid, and R_s is the resolution of the scene image;
(5) filtering the scene image I_s(x, y) with a Gaussian filter G(x, y, σ), then extracting the local invariant feature corners C_s(N_s) of the filtered scene image and generating the corresponding feature descriptors D_s(N_s):
(formula 5: rendered as an image in the original document)
where σ is the filter kernel parameter and N_s is the number of features of the filtered scene image;
(6) according to the N_os,n obtained in step (4), limiting the number of scene image feature descriptors D_s(N_s) to obtain D_s(N_os,n), then matching them against the feature descriptors of the corresponding layer of the reference image pyramid to obtain the matched feature point pairs k(n) for each layer of the scale space:
(formula 6: rendered as an image in the original document)
(7) according to the initial scale factor S_init of the reference image pyramid, assigning different weights to k(n); the maximum determines the corresponding matching layer number c:
(formula 7: rendered as an image in the original document)
(8) calculating the scale factor of bilinear interpolation from the corresponding matching layer number c (the scale-factor expression is rendered as an image in the original), then using this scale factor to apply extended interpolation to the scene image, obtaining the extended scene image E_s(x, y):
(formula 8: rendered as an image in the original document)
(9) extracting the local invariant feature corners C′_s(N′_s) of E_s(x, y) and generating the corresponding feature descriptors D′_s(N′_s):
(formula 9: rendered as an image in the original document)
where N′_s is the number of features of the extended scene image;
(10) filtering the reference image I_r(x, y) with a Gaussian filter, then extracting the local invariant feature corners C′_r(N′_r) of the filtered reference image and generating the corresponding feature descriptors D′_r(N′_r):
(formula 10: rendered as an image in the original document)
where N′_r is the number of features of the filtered reference image;
(11) according to N′_r, calculating the number of scene image features to be matched against the filtered reference image:
(formula 11: rendered as an image in the original document)
where N′_os is the number of scene image features matched against the filtered reference image, α is a coefficient factor, β_r and β_s are the information entropies of the reference image and the scene image respectively, and R_r is the resolution of the reference image;
(12) according to the N′_os obtained in step (11), limiting the number of feature descriptors D′_s(N′_s) to obtain D′_s(N′_os), then matching them against the filtered reference image feature descriptors D′_r(N′_r) to obtain the matched feature point pairs k′:
(formula 12: rendered as an image in the original document)
(13) calculating the geometric transformation of the reference target within the scene image from the matched feature point pairs k′, thereby obtaining the position information of the target in the scene and completing the identification.
2. The method for identifying a specific object in a complex scene according to claim 1, wherein the scale factor s(n) in formula 1 is calculated as follows:
(formula 2: rendered as an image in the original document)
where S_init is the initialization constant of the scale factor.
3. The method for identifying a specific object in a complex scene according to claim 2, wherein the initialization constant S_init in formula 2 takes the value 1.2.
4. The method for identifying a specific object in a complex scene according to claim 1, wherein the value of N in formula 1 is 7.
5. The method for identifying a specific object in a complex scene according to claim 1, wherein the value of the filter kernel parameter σ in formula 5 is 0.3.
6. The method for identifying a specific object in a complex scene according to claim 1, wherein the value of the coefficient factor α in formula 11 is 2.
CN202011406594.8A 2020-12-07 2020-12-07 Specific object identification method in complex scene Active CN112633081B (en)

Priority Applications (1)

Application Number: CN202011406594.8A (CN112633081B)
Priority Date: 2020-12-07
Filing Date: 2020-12-07
Title: Specific object identification method in complex scene

Applications Claiming Priority (1)

Application Number: CN202011406594.8A (CN112633081B)
Priority Date: 2020-12-07
Filing Date: 2020-12-07
Title: Specific object identification method in complex scene

Publications (2)

Publication Number | Publication Date
CN112633081A (en) | 2021-04-09
CN112633081B (en) | 2022-07-01

Family

ID=75308045

Family Applications (1)

Application Number: CN202011406594.8A (CN112633081B, Active)
Title: Specific object identification method in complex scene

Country Status (1)

Country Link
CN (1) CN112633081B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7382897B2 (en) * 2004-04-27 2008-06-03 Microsoft Corporation Multi-image feature matching using multi-scale oriented patches
US8406507B2 (en) * 2009-01-14 2013-03-26 A9.Com, Inc. Method and system for representing image patches
CN111144360A (en) * 2019-12-31 2020-05-12 新疆联海创智信息科技有限公司 Multimode information identification method and device, storage medium and electronic equipment
CN111898428A (en) * 2020-06-23 2020-11-06 东南大学 Unmanned aerial vehicle feature point matching method based on ORB

Also Published As

Publication number Publication date
CN112633081A (en) 2021-04-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant