CN109409388B - Dual-mode deep learning descriptor construction method based on graphic primitives - Google Patents

Dual-mode deep learning descriptor construction method based on graphic primitives

Info

Publication number
CN109409388B
CN109409388B (application CN201811317282.2A)
Authority
CN
China
Prior art keywords
image
patch
primitive
training
class
Prior art date
Legal status
Active
Application number
CN201811317282.2A
Other languages
Chinese (zh)
Other versions
CN109409388A (en)
Inventor
丁新涛
左开中
汪金宝
接标
俞庆英
Current Assignee
Anhui Normal University
Original Assignee
Anhui Normal University
Priority date
Filing date
Publication date
Application filed by Anhui Normal University
Priority to CN201811317282.2A
Publication of CN109409388A
Application granted
Publication of CN109409388B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration

Abstract

The invention falls within the technical field of image registration and provides a dual-mode deep learning descriptor construction method based on graphic primitives. The method learns the attribute category of a patch image from labeled samples, learns the geometric characteristics of the patch image using graphic primitives, and fuses the attribute category and the geometric characteristics to obtain the feature vector of a local patch image, that is, a descriptor based on graphic primitives. Registration between patches is completed through the similarity of descriptor vectors, realizing classification characterization based on machine-learned descriptors. The method mainly comprises establishing a descriptor training set, constructing a multi-mode convolutional network, and training the category and geometric modes on a GPU, thereby achieving classification and registration of local patch images. The method addresses both the classification description of descriptors and their realization on a GPU.

Description

Dual-mode deep learning descriptor construction method based on graphic primitives
Technical Field
The invention belongs to the technical field of image registration and provides a dual-mode deep learning descriptor construction method based on graphic primitives.
Background
Feature matching is the mainstream approach to image registration. A classical image registration method adopts a local feature descriptor, which takes the image local region centered on a keypoint as its object and describes the region's features from the gray-scale information of the interior pixels, yielding a feature vector that expresses the local information around the image keypoint. However, classical descriptors are computationally expensive, difficult to apply in real-time systems, and unsuitable for mobile devices.
Disclosure of Invention
The embodiment of the invention provides a dual-mode deep learning descriptor construction method based on graphic primitives, aiming to solve the problems that classical descriptors are computationally expensive and difficult to apply in real-time systems.
In order to achieve the above object, the present invention provides a method for constructing a dual-mode deep learning descriptor based on graphic primitives, the method comprising the following steps:
S1, extracting keypoints p1i and p2i from image I1 and image I2 to form a keypoint set P1 and a keypoint set P2 respectively;
S2, intercepting patch images of all keypoints in the keypoint set P1 and the keypoint set P2, and forming the scale space of each keypoint's local region based on the patch images, wherein the patch images refer to N images of different sizes intercepted with a given keypoint as the center;
S3, scaling the patch images corresponding to the keypoints into the set size respectively to obtain normalized-scale patch images, and executing steps S4 and S5 simultaneously;
S4, inputting the normalized-scale patch images of the keypoints into a class detection model respectively, and outputting the classes of the normalized-scale patch images;
S5, marginalizing the normalized-scale patch images to obtain patch edge images, and inputting the patch edge images into a geometric detection model to obtain the geometric feature vectors of the patch edge images;
and S6, combining the class and the geometric feature vector of the same keypoint on patch images of different sizes to form the descriptor vector of each keypoint on the patch images of different sizes.
Further, after step S6, the method further includes:
S7, registering keypoints between image I1 and image I2, wherein for any pair of keypoints p1i and p2i in image I1 and image I2 the registration is specifically:

if there exist j1 and j2 such that

||D_j1(p1i) - D_j2(p2i)|| ≤ T,

then the keypoint p1i of image I1 and the keypoint p2i of image I2 match;

wherein T is a set distance threshold, D_j1(p1i) is the descriptor vector of the ith keypoint of image I1 at the j1-th scale, and D_j2(p2i) is the descriptor vector of the ith keypoint of image I2 at the j2-th scale.
Further, the method for constructing the class detection model used in step S4 is specifically as follows:
S31, constructing a class detection training set and a class detection verification set, both formed on the basis of classified labeled data;
S32, constructing a class classifier;
S33, training the class classifier on the class training set;
and S34, when the number of training iterations reaches a set threshold, verifying the trained class classifier on the class verification set, and when the error of the trained class classifier on the class verification set is within the allowable range or the number of iterations reaches the upper-limit threshold, stopping training, thereby forming the class detection model.
Further, the method for constructing the geometric detection model used in step S5 is specifically as follows:
S41, constructing an image primitive training set and an image primitive verification set, both formed on the basis of randomly generated combined images containing straight lines and circles;
S42, constructing a multi-dimensional primitive classifier for the image primitive training set;
S43, training the multi-dimensional primitive classifier on the image primitive training set;
and S44, when the number of training iterations reaches the set threshold, verifying the trained multi-dimensional primitive classifier on the image primitive verification set, and when the error of the trained multi-dimensional primitive classifier on the image primitive verification set is within the allowable range or the number of iterations reaches the upper-limit threshold, stopping training, thereby forming the geometric detection model.
Further, the method for constructing the class detection training set and the class detection verification set is specifically:
S311, downloading images from the target databases;
S312, selecting patch images of set sizes within the target region of each image according to the segmentation classification labels, wherein the number of selected patch images is one quarter of the product of the target region's row and column dimensions, and the frequency of the patch center coordinates obeys a two-dimensional Gaussian distribution centered at the region center;
and S314, placing the classification-labeled patch images into the class detection training set and the class detection verification set respectively according to a set proportion.
Further, the method for constructing the image primitive training set and the image primitive verification set is specifically:
S411, randomly composing combined images of a set size from straight lines and circles as primitives, and recording the number of straight lines n1, the number of circles n2, the number of intersections of lines and circles n3, the number of intersections between lines n4, and the acute angle θ at intersections between lines, on which basis the classifier vector v = (n1, n2, n3, n4, θ) is constructed;
S412, randomly adding noise to the combined image to form a primitive sample;
and S413, respectively dividing the primitive samples into an image primitive training set and an image primitive verification set based on the set proportion.
Further, patch images of different sizes are enlarged to a set size by a nearest neighbor interpolation method.
Further, the normalized-scale patch image is marginalized by a Sobel edge detection method to obtain the patch edge image.
Furthermore, a parallel network is built from dual ResNets, the normalized-scale patch image is input into the class detection model while the patch edge image is input into the geometric detection model, and the two models are multiplied at the fully connected layer to build the multiplicative detection model output.
The dual-mode deep learning descriptor construction method based on graphic primitives provided by the invention explores a GPU-computed descriptor classification method, addressing the large CPU computation burden of classical image registration methods. The method mainly comprises establishing a descriptor training set, constructing a multi-mode convolutional network, and training the category and geometric modes on a GPU, thereby achieving classification and registration of local patch images. The method addresses both the classification description of descriptors and their realization on the GPU.
Drawings
FIG. 1 is a flowchart illustrating a dual-mode deep learning descriptor construction method based on graphics primitives according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of the dual-mode construction method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention learns the attribute category of the patch image from labeled samples, learns the geometric characteristics of the patch image using graphic primitives, and fuses the attribute category and the geometric characteristics to obtain the feature vector of the local patch image, that is, a descriptor based on graphic primitives. Registration between patches is completed through the similarity of descriptor vectors, realizing classification characterization based on machine-learned descriptors.
Fig. 1 is a schematic flowchart of a method for constructing a dual-mode deep learning descriptor based on image primitives according to an embodiment of the present invention, and as can be seen from fig. 1, the method includes the following steps:
S1, extracting keypoints from image I1 and image I2 to respectively constitute a keypoint set P1 and a keypoint set P2;
keypoints of image I1 and image I2 are extracted by the OD-SIFT method. Suppose image I1 has n1 keypoints; the keypoint set P1 is expressed as P1 = {p1i | p1i = (x1i, y1i), i = 1, 2, …, n1}, where (x1i, y1i) denotes the coordinates of the ith keypoint p1i in the keypoint set P1. Suppose image I2 has n2 keypoints; the keypoint set P2 is expressed as P2 = {p2i | p2i = (x2i, y2i), i = 1, 2, …, n2}, where (x2i, y2i) denotes the coordinates of the ith keypoint p2i in the keypoint set P2.
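For concreteness, a minimal sketch of S1 follows. OD-SIFT, the patent's detector, has no public implementation, so OpenCV's standard SIFT is substituted as a stand-in; the detector choice and the n_features cap are assumptions, not the patent's prescription.

```python
import cv2

def extract_keypoint_sets(img1_path, img2_path, n_features=500):
    """Build keypoint sets P1 and P2 for images I1 and I2 (step S1).

    Standard SIFT stands in for the patent's OD-SIFT detector.
    """
    sift = cv2.SIFT_create(nfeatures=n_features)
    keypoint_sets = []
    for path in (img1_path, img2_path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        kps = sift.detect(img, None)
        # P = {p_i = (x_i, y_i)}: integer pixel coordinates of each keypoint
        keypoint_sets.append([(int(k.pt[0]), int(k.pt[1])) for k in kps])
    return keypoint_sets  # [P1, P2]
```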
S2, intercepting patch images of all keypoints in the keypoint set P1 and the keypoint set P2, and forming the scale space of each keypoint's local region based on the patch images;
taking a given keypoint as the center, N images of different scales are intercepted; these N images of different scales constitute the patch images of the keypoint.
The embodiment of the invention takes N = 7 as an example to illustrate the patch images and the local-region scale space of the keypoint p1i. With the keypoint p1i as the center, rectangular regions of 32×32, 28×28, 24×24, 20×20, 16×16, 12×12, and 8×8 pixels are intercepted in turn to form 7 patch images of different sizes, and the 7 patch images together constitute the scale space of the local region of the keypoint p1i. For p1i ∈ I1, the corresponding scale patch images are S(p1i) = {S_j(p1i) | j = 1, 2, …, 7}, wherein S_1(p1i) is the 32×32 rectangular patch region, and so on, down to S_7(p1i), the 8×8 rectangular patch region. Based on this method, the patch images and the local-region scale spaces of all keypoints in the keypoint set P1 and the keypoint set P2 can be constructed.
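A sketch of the S2 cropping follows, assuming the 7 scales of the embodiment; replicate-padding at the image border is an assumption, since the patent does not specify border handling.

```python
import cv2

PATCH_SIZES = (32, 28, 24, 20, 16, 12, 8)  # the N = 7 scales of the embodiment

def crop_scale_space(img, keypoint, sizes=PATCH_SIZES):
    """Crop the centered square patches S(p) = {S_1(p), ..., S_7(p)} (step S2)."""
    pad = max(sizes) // 2
    padded = cv2.copyMakeBorder(img, pad, pad, pad, pad, cv2.BORDER_REPLICATE)
    x, y = keypoint[0] + pad, keypoint[1] + pad  # shift into padded coordinates
    patches = []
    for s in sizes:
        half = s // 2
        patches.append(padded[y - half:y - half + s, x - half:x - half + s])
    return patches
```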
S3, scaling the patch images to the set size to obtain normalized-scale patch images, and executing steps S4 and S5;
S4, inputting the normalized-scale patch images into the class detection model respectively, and outputting the class of each normalized-scale patch image;
The patch images of different sizes are scaled to the set size by nearest-neighbor interpolation, which is specified as follows: to interpolate at a point p, let the value at p equal the value of the point closest to it. Taking 32×32 as the set size, for p1i ∈ I1 the corresponding normalized-scale patch images are N(p1i) = {N_j(p1i) | j = 1, 2, …, 7}, wherein N_1(p1i) is the 32×32 patch image, used without stretching, N_2(p1i) is the 28×28 rectangular patch region stretched into a 32×32 patch image, and so on, down to N_7(p1i), the 8×8 rectangular patch region stretched into a 32×32 patch image. N(P1) and N(P2) respectively denote all normalized-scale patch images obtained from image I1 and image I2, where N(P1) = {N(p1i) | i = 1, 2, …, n1} and N(P2) = {N(p2i) | i = 1, 2, …, n2}. Inputting N(P1) and N(P2) into the class detection model respectively yields the outputs C(P1) = {C(p1i) | i = 1, 2, …, n1} and C(P2) = {C(p2i) | i = 1, 2, …, n2}, wherein C(p1i) = {C_j1(p1i) | j1 = 1, 2, …, 7} and C_j1(p1i) denotes the class obtained on N_j1(p1i), the patch at the j1-th scale of the ith keypoint of image I1; C(p2i) is obtained based on a similar method.
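The normalization of S3 reduces to a nearest-neighbor resize; a minimal sketch:

```python
import cv2

SET_SIZE = 32  # the set size used in the embodiment

def normalize_scale_space(patches, size=SET_SIZE):
    """Stretch each S_j(p) to the set size by nearest-neighbor interpolation,
    giving N(p) = {N_1(p), ..., N_7(p)} (step S3)."""
    return [cv2.resize(p, (size, size), interpolation=cv2.INTER_NEAREST)
            for p in patches]
```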
S5, performing marginalization standardization scale patch image to obtain patch marginalized image, and inputting the patch marginalized image into a geometric detection model to obtain a geometric feature vector of the patch marginalized image;
Let E(P1) = {E(p1i) | i = 1, 2, …, n1} and E(P2) = {E(p2i) | i = 1, 2, …, n2} respectively be all patch edge images obtained from image I1 and image I2. For N(p1i), the corresponding edge images are E(p1i) = {E_j(p1i) | j = 1, 2, …, 7}, wherein E_1(p1i) is the image obtained by applying Sobel edge detection to N_1(p1i), and so on, down to E_7(p1i), the image obtained by applying Sobel edge detection to N_7(p1i). Inputting E(P1) and E(P2) into the geometric detection model respectively yields the outputs G(P1) = {G(p1i) | i = 1, 2, …, n1} and G(P2) = {G(p2i) | i = 1, 2, …, n2}, wherein G(p1i) = {G_j1(p1i) | j1 = 1, 2, …, 7} and G_j1(p1i) is the geometric feature vector obtained on E_j1(p1i), the edge image of the patch N_j1(p1i) at the j1-th scale of the ith keypoint of image I1:

G_j1(p1i) = (n1, n2, n3, n4, θ),

wherein n1 is the number of straight lines in the edge image E_j1(p1i), n2 is the number of circles, n3 is the number of intersections of lines and circles, n4 is the number of intersections between lines, and θ is the acute angle at intersections between lines; G(p2i) is obtained based on a similar method.
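A sketch of the marginalization in S5 follows; the patent names only the Sobel method, so the kernel size and the binarization threshold are assumptions.

```python
import cv2
import numpy as np

def marginalize(normalized_patches, threshold=128):
    """Apply Sobel edge detection to each N_j(p) to obtain E_j(p) (step S5)."""
    edge_patches = []
    for patch in normalized_patches:
        gx = cv2.Sobel(patch, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient
        gy = cv2.Sobel(patch, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient
        mag = cv2.magnitude(gx, gy)
        edge_patches.append((mag > threshold).astype(np.uint8) * 255)
    return edge_patches
```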
A parallel network is built from dual ResNets: the normalized-scale patch image is input into the class detection model while the patch edge image is input into the geometric detection model, and the two models are multiplied at the fully connected layer to build the multiplicative detection model output, as shown in Fig. 2.
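A PyTorch sketch of the dual-ResNet parallel network follows. The patent fixes only the dual-ResNet structure and the multiplicative fusion at the fully connected layer; the branch depth (ResNet-18), the fused feature width, the head dimensions, and the single-channel stems are all assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DualModeNet(nn.Module):
    """Parallel dual-ResNet sketch: the class branch reads the normalized
    patch N_j(p), the geometric branch reads the edge patch E_j(p), and the
    two fully connected outputs are fused by element-wise multiplication."""

    def __init__(self, num_classes=100, geo_dim=5, fused_dim=128):
        super().__init__()
        self.class_branch = models.resnet18(weights=None)
        self.geo_branch = models.resnet18(weights=None)
        for branch in (self.class_branch, self.geo_branch):
            # single-channel stems, since patches and edge maps are gray-scale
            branch.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                     padding=3, bias=False)
            branch.fc = nn.Linear(branch.fc.in_features, fused_dim)
        self.class_head = nn.Linear(fused_dim, num_classes)  # patch class
        self.geo_head = nn.Linear(fused_dim, geo_dim)  # (n1, n2, n3, n4, theta)

    def forward(self, patch, edge_patch):
        c = self.class_branch(patch)      # class-mode features
        g = self.geo_branch(edge_patch)   # geometric-mode features
        fused = c * g                     # multiplicative fusion at the FC layer
        return self.class_head(fused), self.geo_head(fused)

# example: logits, geo = DualModeNet()(torch.randn(8, 1, 32, 32),
#                                      torch.randn(8, 1, 32, 32))
```

Element-wise multiplication forces the two modes to agree: a feature survives into the fused vector only when both the class branch and the geometric branch respond to it.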
S6, combining the class and the geometric feature vector of the same keypoint on patch images of different sizes to form the descriptor vector of each keypoint on the patch images of different sizes;
the descriptor vectors are D(P1) = {D(p1i) | i = 1, 2, …, n1}, wherein D(p1i) = {D_j1(p1i) | j1 = 1, 2, …, 7} and D_j1(p1i), formed by combining C_j1(p1i) and G_j1(p1i), is the descriptor vector obtained at the j1-th scale of the ith keypoint of image I1; D(P2) is acquired by a similar method, with D(P2) = {D(p2i) | i = 1, 2, …, n2}, wherein D(p2i) = {D_j2(p2i) | j2 = 1, 2, …, 7} and D_j2(p2i) is the descriptor vector obtained at the j2-th scale of the ith keypoint of image I2. Thus, the keypoint p1i or the keypoint p2i generates 7 descriptors over its scale space.
In the embodiment of the present invention, the registration of the images is determined by the similarity of the descriptor vectors; after step S6, the method further includes:
S7, registering keypoints between image I1 and image I2, wherein for any pair of keypoints p1i and p2i in image I1 and image I2 the registration is specifically: if there exist j1 and j2 such that

||D_j1(p1i) - D_j2(p2i)|| ≤ T,

then the keypoint p1i of image I1 and the keypoint p2i of image I2 match, wherein T is a set distance threshold, D_j1(p1i) is the descriptor vector of the ith keypoint of image I1 at the j1-th scale, and D_j2(p2i) is the descriptor vector of the ith keypoint of image I2 at the j2-th scale. In the embodiment of the invention, if the descriptor vectors D_j1(p1i) and D_j2(p2i) have different lengths, the shorter vector is extended before the distance comparison.
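The matching criterion of S7 can be sketched as follows; Euclidean distance, zero-padding as the extension of the shorter vector, the default threshold T, and one-to-one matching are all assumptions where the patent only states that the shorter vector is extended.

```python
import numpy as np

def _descriptor_distance(d1, d2):
    """Pad the shorter vector with zeros, then compare by Euclidean distance."""
    n = max(len(d1), len(d2))
    a = np.pad(np.asarray(d1, float), (0, n - len(d1)))
    b = np.pad(np.asarray(d2, float), (0, n - len(d2)))
    return float(np.linalg.norm(a - b))

def match_keypoints(D1, D2, T=0.5):
    """Match keypoints of I1 and I2 (step S7): p1i and p2i match if some
    scale pair (j1, j2) satisfies ||D_j1(p1i) - D_j2(p2i)|| <= T.

    D1, D2: per-keypoint lists of per-scale descriptor vectors.
    """
    matches = []
    for i, scales1 in enumerate(D1):
        for k, scales2 in enumerate(D2):
            if any(_descriptor_distance(d1, d2) <= T
                   for d1 in scales1 for d2 in scales2):
                matches.append((i, k))
                break  # keep the first match for keypoint i
    return matches
```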
In the embodiment of the present invention, the method for constructing the class detection model used in step S4 specifically includes:
s31, constructing a class detection training set and a class detection verification set;
in the embodiment of the present invention, the method for constructing the class detection training set and the class detection verification set specifically includes the following steps:
S311, downloading images from the target databases;
the target databases involved in the present invention include: COCO, Pascal VOC, Indoor Scene Recognition, Cifar-100, Downsampled ImageNet, and the Tiny Images database. These are standard databases for deep learning on which many researchers have reported strong accuracy, so results obtained on them can be compared with those of other researchers.
S312, merging the same categories of images across the different target databases, namely fusing the target categories according to the classification labels of the images in all the databases.
S313, selecting patch images of set sizes within the target region of each image according to the segmentation classification labels, wherein the number of selected patch images is one quarter of the product of the target region's row and column dimensions, and the frequency of the patch center coordinates follows a two-dimensional Gaussian distribution centered at the region center;
and S314, placing the classification-labeled patch images into the class detection training set and the class detection verification set respectively according to a set proportion.
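A sketch of the Gaussian patch-center sampling in S313 follows; the patent fixes the count (rows x cols / 4) and the distribution's center, while the standard deviation is an assumption.

```python
import numpy as np

def sample_patch_centers(region_rows, region_cols, sigma_frac=0.25, seed=None):
    """Sample (rows * cols) // 4 patch centers from a 2-D Gaussian centered
    on the target region's center (step S313)."""
    rng = np.random.default_rng(seed)
    n = (region_rows * region_cols) // 4
    centers = rng.normal(
        loc=(region_rows / 2.0, region_cols / 2.0),
        scale=(region_rows * sigma_frac, region_cols * sigma_frac),
        size=(n, 2))
    # clamp so every sampled center stays inside the region
    centers[:, 0] = np.clip(centers[:, 0], 0, region_rows - 1)
    centers[:, 1] = np.clip(centers[:, 1], 0, region_cols - 1)
    return centers.astype(int)
```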
S32, constructing a category classifier;
s33, training a class classifier through a class training set;
and S34, when the number of training iterations reaches a set threshold (for deep training, typically on the order of 450 thousand), verifying the trained class classifier on the class verification set; if the error of the trained class classifier on the class verification set exceeds the allowable range, returning to step S33; if the error is within the allowable range or the number of iterations reaches the upper-limit threshold, stopping training, thereby forming the class detection model.
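A training-loop sketch for S33 and S34 follows, assuming a single-input classifier and standard PyTorch data loaders; the iteration threshold, upper limit, and error tolerance are illustrative values.

```python
import torch

def validation_error(model, val_loader):
    """Fraction of misclassified patches on the verification set."""
    model.eval()
    wrong = total = 0
    with torch.no_grad():
        for patches, labels in val_loader:
            wrong += (model(patches).argmax(dim=1) != labels).sum().item()
            total += labels.numel()
    model.train()
    return wrong / total

def train_class_classifier(model, train_loader, val_loader, loss_fn, optimizer,
                           check_every=450_000, max_iters=900_000, tol=0.05):
    """Steps S33-S34: train, validate every `check_every` iterations, and stop
    when the verification error is within tolerance or `max_iters` is hit."""
    iters = 0
    while iters < max_iters:
        for patches, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(patches), labels)
            loss.backward()
            optimizer.step()
            iters += 1
            if (iters % check_every == 0
                    and validation_error(model, val_loader) <= tol):
                return model  # error within the allowable range
            if iters >= max_iters:
                break
    return model
```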
In the embodiment of the present invention, the method for constructing the geometric detection model used in step S5 specifically includes the following steps:
S41, constructing an image primitive training set and an image primitive verification set;
in the embodiment of the present invention, the method for constructing the image primitive training set and the image primitive verification set specifically includes the following steps:
S411, randomly composing combined images of a set size (for example, 32×32) from straight lines and circles as primitives, each combined image being generated under the control of 11 random parameters (k_l, p_start, p_end, k_c, p_c, r_c), wherein k_l and k_c respectively control the number of lines and circles in the image, p_start, p_end, p_c ∈ R² respectively control the start point and end point of a line and the center position of a circle, and r_c ∈ R controls the radius of a circle, with p_start = (x_start, y_start), p_end = (x_end, y_end), p_c = (x_c, y_c);
S412, adding random noise to the combined images to form primitive samples, the noise being divided into 6 types: Gaussian noise, Rayleigh noise, gamma noise, exponentially distributed noise, uniformly distributed noise, and salt-and-pepper noise, the parameters of each noise type being generated randomly;
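A sketch of the S412 noise corruption follows; the six noise families match the text, while all parameter ranges are illustrative assumptions.

```python
import numpy as np

def add_random_noise(img, rng=None):
    """Corrupt a combined image with one of the 6 noise types of step S412,
    drawing the noise parameters at random."""
    rng = rng or np.random.default_rng()
    f = img.astype(np.float64)
    kind = int(rng.integers(0, 6))
    if kind == 0:    # Gaussian
        f += rng.normal(0.0, rng.uniform(5, 25), img.shape)
    elif kind == 1:  # Rayleigh
        f += rng.rayleigh(rng.uniform(5, 20), img.shape)
    elif kind == 2:  # gamma (Erlang)
        f += rng.gamma(rng.uniform(1, 4), rng.uniform(2, 10), img.shape)
    elif kind == 3:  # exponential
        f += rng.exponential(rng.uniform(5, 20), img.shape)
    elif kind == 4:  # uniform
        f += rng.uniform(-30, 30, img.shape)
    else:            # salt and pepper
        mask = rng.random(img.shape)
        p = rng.uniform(0.01, 0.05)
        f[mask < p] = 0
        f[mask > 1 - p] = 255
    return np.clip(f, 0, 255).astype(np.uint8)
```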
and S413, respectively dividing the primitive samples into an image primitive training set and an image primitive verification set according to a set proportion (such as a proportion of 3: 2).
S42, constructing the multi-dimensional primitive classifier v = (n1, n2, n3, n4, θ) for the image primitive training set;
S43, training the multi-dimensional primitive classifier through an image primitive training set;
S44, when the number of training iterations reaches the set threshold, verifying the trained multi-dimensional primitive classifier on the image primitive verification set; if the error of the trained multi-dimensional primitive classifier on the image primitive verification set exceeds the allowable range, returning to step S43; if the error is within the allowable range or the number of iterations reaches the upper-limit threshold, stopping training, thereby forming the geometric detection model.
The invention provides a dual-mode deep learning descriptor construction method based on graphic primitives, which explores a GPU-computed descriptor classification method to address the large CPU computation burden of classical image registration methods. The method mainly comprises establishing a descriptor training set, constructing a multi-mode convolutional network, and training the category and geometric modes on a GPU, thereby achieving classification and registration of local patch images. The method addresses both the classification description of descriptors and their realization on the GPU.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A dual-mode deep learning descriptor construction method based on graphic primitives, which is characterized by comprising the following steps:
S1, extracting keypoints p1i and p2i from image I1 and image I2 to form a keypoint set P1 and a keypoint set P2 respectively;
S2, intercepting patch images of all keypoints in the keypoint set P1 and the keypoint set P2, and forming the scale space of each keypoint's local region based on the patch images, wherein the patch images refer to N images of different sizes intercepted with a given keypoint as the center;
S3, scaling the patch images corresponding to the keypoints into the set size respectively to obtain normalized-scale patch images, and executing steps S4 and S5 simultaneously;
S4, inputting the normalized-scale patch images of the keypoints into a class detection model respectively, and outputting the classes of the normalized-scale patch images;
S5, marginalizing the normalized-scale patch images to obtain patch edge images, and inputting the patch edge images into a geometric detection model to obtain the geometric feature vectors of the patch edge images;
and S6, combining the class and the geometric feature vector of the same keypoint on patch images of different sizes to form the descriptor vector of each keypoint on the patch images of different sizes.
2. A dual-mode graphics primitive-based deep learning descriptor construction method as claimed in claim 1, further comprising after step S6:
S7, registering keypoints between image I1 and image I2, wherein for any pair of keypoints p1i and p2i in image I1 and image I2 the registration is specifically:

if there exist j1 and j2 such that

||D_j1(p1i) - D_j2(p2i)|| ≤ T,

then the keypoint p1i of image I1 and the keypoint p2i of image I2 match;

wherein T is a set distance threshold, D_j1(p1i) is the descriptor vector of the ith keypoint of image I1 at the j1-th scale, and D_j2(p2i) is the descriptor vector of the ith keypoint of image I2 at the j2-th scale.
3. The dual-mode deep learning descriptor construction method based on graphic primitives as claimed in claim 1, wherein the class detection model construction method in step S4 is specifically as follows:
S31, constructing a class detection training set and a class detection verification set, both formed on the basis of classified labeled data;
S32, constructing a class classifier;
S33, training the class classifier on the class training set;
and S34, when the number of training iterations reaches a set threshold, verifying the trained class classifier on the class verification set, and when the error of the trained class classifier on the class verification set is within the allowable range or the number of iterations reaches the upper-limit threshold, stopping training, thereby forming the class detection model.
4. The method for constructing dual-mode deep learning descriptor based on graphic primitives as claimed in claim 1, wherein the geometric detection model construction method in step S5 is specifically as follows:
S41, constructing an image primitive training set and an image primitive verification set, both formed on the basis of randomly generated combined images containing straight lines and circles;
S42, constructing a multi-dimensional primitive classifier for the image primitive training set;
S43, training the multi-dimensional primitive classifier on the image primitive training set;
and S44, when the number of training iterations reaches the set threshold, verifying the trained multi-dimensional primitive classifier on the image primitive verification set, and when the error of the trained multi-dimensional primitive classifier on the image primitive verification set is within the allowable range or the number of iterations reaches the upper-limit threshold, stopping training, thereby forming the geometric detection model.
5. The method for constructing dual-mode deep learning descriptor based on graphic primitives as claimed in claim 3, wherein the method for constructing said class detection training set and said class detection verification set is as follows:
S311, downloading images from the target databases;
S312, selecting patch images of set sizes within the target region of each image according to the segmentation classification labels, wherein the number of selected patch images is one quarter of the product of the target region's row and column dimensions, and the frequency of the patch center coordinates obeys a two-dimensional Gaussian distribution centered at the region center;
and S314, placing the classification-labeled patch images into the class detection training set and the class detection verification set respectively according to a set proportion.
6. The method as claimed in claim 4, wherein the method for constructing the training set and the verification set of image primitives comprises the following steps:
S411, randomly composing combined images of a set size from straight lines and circles as primitives, and recording the number of straight lines n1, the number of circles n2, the number of intersections of lines and circles n3, the number of intersections between lines n4, and the acute angle θ at intersections between lines;
S412, randomly adding noise to the combined image to form a primitive sample;
and S413, respectively dividing the primitive samples into an image primitive training set and an image primitive verification set based on the set proportion.
7. A dual-mode graphics primitive-based deep learning descriptor construction method as claimed in claim 1 wherein patch images of different sizes are enlarged to a set size by nearest neighbor interpolation.
8. The dual-mode deep learning descriptor construction method based on graphic primitives as claimed in claim 1, wherein the normalized-scale patch image is marginalized by a Sobel edge detection method to obtain the patch edge image.
9. The dual-mode deep learning descriptor construction method based on graphic primitives as claimed in claim 1, wherein a parallel network is constructed from dual ResNets, the normalized-scale patch image is input into the class detection model, the patch edge image is input into the geometric detection model, and the class detection model and the geometric detection model are multiplied at the fully connected layer to construct the multiplicative detection model output.
CN201811317282.2A 2018-11-07 2018-11-07 Dual-mode deep learning descriptor construction method based on graphic primitives Active CN109409388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811317282.2A CN109409388B (en) 2018-11-07 2018-11-07 Dual-mode deep learning descriptor construction method based on graphic primitives


Publications (2)

Publication Number Publication Date
CN109409388A CN109409388A (en) 2019-03-01
CN109409388B true CN109409388B (en) 2021-08-27

Family

ID=65471796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811317282.2A Active CN109409388B (en) 2018-11-07 2018-11-07 Dual-mode deep learning descriptor construction method based on graphic primitives

Country Status (1)

Country Link
CN (1) CN109409388B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084150B (en) * 2019-04-09 2021-05-11 山东师范大学 Automatic white blood cell classification method and system based on deep learning
CN110441329B (en) * 2019-08-12 2022-02-15 广东工业大学 Laser welding defect identification method, device and equipment based on deep learning
CN111049125B (en) * 2019-09-24 2021-07-30 安徽师范大学 Electric vehicle intelligent access control method based on machine learning
CN113537371B (en) * 2021-07-22 2023-03-17 苏州大学 Epithelial cell classification method and system integrating two stages of edge features


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6018349A (en) * 1997-08-01 2000-01-25 Microsoft Corporation Patch-based alignment method and apparatus for construction of image mosaics
CN101216932A (en) * 2008-01-03 2008-07-09 威盛电子股份有限公司 Methods of graphic processing arrangement, unit and execution triangle arrangement and attribute arrangement
CN102254303A (en) * 2011-06-13 2011-11-23 河海大学 Methods for segmenting and searching remote sensing image
WO2015060897A1 (en) * 2013-10-22 2015-04-30 Eyenuk, Inc. Systems and methods for automated analysis of retinal images
CN103839074A (en) * 2014-02-24 2014-06-04 西安电子科技大学 Image classification method based on matching of sketch line segment information and space pyramid
CN108022270A (en) * 2016-11-03 2018-05-11 奥多比公司 The image patch sampled using the probability based on prophesy is matched
CN108230268A (en) * 2016-12-21 2018-06-29 达索系统公司 Completion is carried out to image

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
惠国保, "Research on Image Feature Matching Technology in CNC Vision Systems and Its Applications," Master's thesis repository, Dec. 31, 2014, pp. 1-154.
Jinbao Wang et al., "Single Image Dehazing Based on the Physical Model and MSRCR Algorithm," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 9, Sep. 30, 2018, pp. 2190-2199.
靳峰 et al., "A Fast and Accurate Image Registration Algorithm Using Spatial Sequence Descriptors," Journal of Xi'an Jiaotong University, vol. 48, no. 6, Jun. 30, 2014, pp. 19-24.

Also Published As

Publication number Publication date
CN109409388A (en) 2019-03-01

Similar Documents

Publication Publication Date Title
CN109409388B (en) Dual-mode deep learning descriptor construction method based on graphic primitives
Ouyang et al. Copy-move forgery detection based on deep learning
CN103729885B (en) Various visual angles projection registers united Freehandhand-drawing scene three-dimensional modeling method with three-dimensional
JP2019514123A (en) Remote determination of the quantity stored in containers in geographical areas
CN109960742B (en) Local information searching method and device
US9619733B2 (en) Method for generating a hierarchical structured pattern based descriptor and method and device for recognizing object using the same
CN109558902A (en) A kind of fast target detection method
US10902053B2 (en) Shape-based graphics search
CN106778852A (en) A kind of picture material recognition methods for correcting erroneous judgement
CN103745201B (en) A kind of program identification method and device
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN104217459A (en) Spherical feature extraction method
Cai et al. A novel saliency detection algorithm based on adversarial learning model
Montserrat et al. Logo detection and recognition with synthetic images
CN112085835A (en) Three-dimensional cartoon face generation method and device, electronic equipment and storage medium
Álvarez et al. Junction assisted 3d pose retrieval of untextured 3d models in monocular images
Bui et al. A texture-based local soft voting method for vanishing point detection from a single road image
US20220092448A1 (en) Method and system for providing annotation information for target data through hint-based machine learning model
CN115358981A (en) Glue defect determining method, device, equipment and storage medium
CN103617616A (en) Affine invariant image matching method
CN114140551A (en) Expressway bifurcation merging point conjecture method and system based on track image
He et al. A computational fresco sketch generation framework
CN113011506A (en) Texture image classification method based on depth re-fractal spectrum network
Wang et al. SO-PERM: Pose Estimation and Robust Measurement for Small Objects
Minster et al. Geolocation for printed maps using line segment-based SIFT-like feature matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant