WO2018036146A1

WO2018036146A1 - Convolutional neural network-based target matching method, device and storage medium

Info

Publication number: WO2018036146A1
Application number: PCT/CN2017/077579
Authority: WO
Inventors: 任鹏远; 石园; 许健; 万定锐
Original assignee: 东方网力科技股份有限公司
Priority date: 2016-08-26
Filing date: 2017-03-21
Publication date: 2018-03-01
Also published as: CN106407891A; CN106407891B

Abstract

Disclosed in the embodiments of the present invention are a convolutional neural network-based target matching method, device and storage medium. The method comprises: acquiring a first image and a second image; calculating a pooling feature of a target area in the first image; traversing and matching the second image on the basis of the pooling feature to obtain a corresponding matching score graph; and determining a target area in the second image according to the matching score graph.

Description

Target matching method, device and storage medium based on convolutional neural network

Technical field

The present invention relates to the field of machine vision technology, and in particular, to a target matching method, device and storage medium based on a convolutional neural network.

Background technique

With the continuous deepening of smart city construction, the video surveillance market continues to maintain a rapid growth trend. Currently, video surveillance mainly captures video images by setting a camera that captures environmental information, and transmits the captured video images to a control platform for analysis processing, such as tracking of targets in the video images. For target tracking, the general process includes: after the target enters the video surveillance area, because the target is moving, the system captures the image of the target in the current frame as a template, and finds the target after the target frame is matched by the target frame in the next frame of the video image. position. It can be seen that how to accurately perform target matching is the key to video image tracking. In addition, target matching is also the core of technologies such as image recognition, image retrieval, and image annotation.

The target matching means that the front and rear video frames or a plurality of pre-selected image frames are associated, and the matching target matching the target in the previous image frame is found from the latter image frame. The associated methods are primarily related by features.

In the prior art, a target matching method such as point feature template matching, line feature template matching, and surface feature template matching is generally adopted. However, the point feature matching method has poor matching accuracy when the target contrast is low, or there is no obvious focus feature; the line feature matching method is not obvious when the target edge is not obvious, or the target has large deformation, the matching accuracy is also better. Difference; although the surface feature matching method improves the accuracy of matching, it has a large amount of computation and low efficiency.

Summary of the invention

In view of this, the embodiments of the present invention are directed to a target matching method, apparatus, and storage medium based on a convolutional neural network, which adopts pooling features for ergodic matching, and the accuracy and efficiency of matching are high.

In a first aspect, an embodiment of the present invention provides a target matching method based on a convolutional neural network, where the method includes:

Obtaining the first image and the second image;

Calculating a pooling feature of the target area in the first image;

Performing traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map;

Determining a target area in the second image according to the matching score map.

In an embodiment, the calculating the pooling feature of the target area in the first image comprises:

Extracting a first basic feature layer of the first image based on a pre-acquired Convolutional Neural Networks (CNN);

Calculating a position of the first window in the first base feature layer relative to the target area according to a location of the target area in the first image and a reduction ratio of the CNN;

Determining a location of the second window of the first base feature layer based on the preset pooling parameter and the location of the first window;

The first basic feature layer of the second window is input to the pooling layer corresponding to the pooling parameter for feature extraction, and a pooling feature is obtained.

In an embodiment, determining the location of the second window of the first basic feature layer based on the preset pooling parameter and the location of the first window comprises:

Calculating a first output size of the pooling layer according to a preset minimum window size of the pooled layer and a position of the first window;

Calculating the pool according to a preset maximum output size of the pooled layer and the first output size The second output size of the layer;

Calculating a window size of the pooling layer according to the second output size and a position of the first window;

Calculating a position of the second window of the first base feature layer according to the second output size and the window size.

In an embodiment, the traversing matching is performed on the second image based on the pooling feature to obtain a corresponding matching score map, including:

Extracting a second basic feature layer of the second image based on the pre-acquired CNN;

Configuring a matching convolution layer and a modulus convolution layer for the second basic feature layer; wherein the matching convolution layer and the convolution kernel used by the modulus convolution layer are all taken from the first a normalized pooling feature of the image, wherein the normalized pooling feature is obtained by normalizing the pooled feature;

And obtaining, according to a ratio relationship between an output of the matching convolution layer and an output of the modulo convolution layer, a matching score of each of the to-be-matched regions of the second image with respect to a target region of the first image Figure.

In an embodiment, the extracting the second basic feature layer of the second image based on the pre-acquired CNN includes:

Performing a scaling process on the second image according to the first image to obtain a second image after the scaling process;

And extracting, according to the pre-acquired CNN, the second basic feature layer of the second image after the scaling process.

In an embodiment, configuring the matching convolution layer for the second basic feature layer comprises:

And configuring, according to the window size of the pooling layer and the window traversal granularity, a pooling layer to be matched for the second basic feature layer, according to the output of the second basic feature layer according to the pooled layer to be matched according to the pool The window size of the layer is pooled;

And configuring a matching convolution layer according to the normalization pooling feature to perform convolution according to the normalized pooling feature according to the output of the matching convolution layer to be matched with the pooling layer deal with;

Configuring a modulus convolution layer for the second base feature layer, including:

And configuring a modulus calculation layer on the to-be-matched pooling layer to perform normalization processing on the output of the to-be-matched pooling layer according to the modulus calculation layer;

And configuring a modulus convolution layer for the modulus calculation layer according to the normalization pooling feature, to perform convolution processing according to the normalized pooling feature according to the output of the modulus value convolution layer .

In an embodiment, the configuring a matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature includes:

Performing a hole-polishing process on the normalized pooling feature according to a difference between a window size of the pooling layer and a window traversing granularity, to obtain a normalized pooling feature after the hole processing;

And matching the convolution layer to the pooled layer to be matched according to the normalized pooling feature after the hole processing.

In an embodiment, the configuring a modulus convolution layer for the modulus calculation layer according to the normalization pooling feature includes:

And configuring a modulus convolution layer for the modulus calculation layer according to the normalized pooling feature after the hole processing.

In an embodiment, the determining, according to the matching score map, the target area in the second image comprises:

The area to be matched corresponding to the highest score in the matching score map is selected as the target area in the second image.

In a second aspect, an embodiment of the present invention further provides a target matching based on a convolutional neural network. Apparatus, the apparatus comprising:

Obtaining a module, configured to acquire the first image and the second image;

a calculation module configured to calculate a pooling feature of the target area in the first image;

a generating module, configured to perform traversal matching on the second image based on the pooling feature, to obtain a corresponding matching score map;

a determining module configured to determine a target area in the second image based on the matching score map.

In an embodiment, the calculation module comprises:

a first extraction submodule configured to extract a first basic feature layer of the first image based on the pre-acquired CNN;

a calculating submodule configured to calculate a position of the first window in the first base feature layer relative to the target area according to the position of the target area in the first image and the dimensionality reduction ratio of the CNN;

Determining a submodule configured to determine a location of the second window of the first basic feature layer based on the preset pooling parameter and the location of the first window;

The first generation sub-module is configured to input the first basic feature layer of the second window to the pooling layer corresponding to the pooling parameter for feature extraction, to obtain a pooling feature.

In an embodiment, the determining submodule comprises:

a first calculating unit, configured to calculate a first output size of the pooling layer according to a preset minimum window size of the pooling layer and a position of the first window;

a second calculating unit, configured to calculate a second output size of the pooling layer according to a preset maximum output size of the pooled layer and a first output size;

a third calculating unit, configured to calculate a window size of the pooling layer according to the second output size and the position of the first window;

And a fourth calculating unit configured to calculate a position of the second window of the first basic feature layer according to the second output size and the window size.

In an embodiment, the generating module includes:

a second extraction submodule configured to extract a second basic feature layer of the second image based on the pre-acquired CNN;

a configuration submodule configured to respectively configure a matching convolution layer and a modulus convolution layer for the second basic feature layer; wherein the convolution kernel used by the matching convolution layer and the modulus convolution layer are all taken from the first image The normalized pooling feature is obtained by normalizing the pooling features;

a second generating unit submodule configured to obtain, according to a ratio relationship between an output of the matching convolution layer and an output of the modulo convolution layer, obtaining a target region of the second image relative to the target region of the first image Match the score map.

In an embodiment, the second extraction submodule comprises:

a scaling unit configured to perform a scaling process on the second image according to the first image to obtain a second image after the scaling process;

And an extracting unit configured to extract a second basic feature layer of the second image after the scaling process based on the pre-acquired CNN.

In an embodiment, the configuration submodule includes:

The first configuration unit is configured to configure a pooling layer to be matched according to the window size of the pooling layer and the window traversal granularity, so as to be pooled according to the output of the pooled layer to be matched to the second basic layer The window size of the layer is pooled;

a second configuration unit, configured to configure a matching convolution layer for the pooled layer to be matched according to the normalized pooling feature, to perform convolution processing according to the normalized pooling feature according to the output of the matching convolution layer to be matched with the pooling layer ;

a third configuration unit, configured to configure a modulus calculation layer to be matched with the pooling layer based on the modulus value operation, to perform normalization processing on the output of the pooled layer to be matched according to the modulus calculation layer;

a fourth configuration unit configured to configure a modulus convolution layer for the modulus calculation layer according to the normalization pooling feature, to perform volume according to the normalized pooling feature according to the modulus convolution layer Product processing.

In an embodiment, the second configuration unit comprises:

The first hole-adding sub-unit is configured to perform a hole-polishing process on the normalized pooling feature according to the difference between the window size of the pooling layer and the window traversing granularity, and obtain a normalized pooling after the hole processing feature;

The first configuration subunit is configured to configure a matching convolution layer for the pooled layer to be matched according to the normalized pooling feature after the hole processing.

In an embodiment, the fourth configuration unit comprises:

The second hole-adding sub-unit is configured to perform a hole-polishing process on the normalized pooling feature according to the difference between the window size of the pooling layer and the window traversing granularity, and obtain a normalized pooling after the hole processing feature;

The second configuration subunit is configured to configure the modulus convolution layer for the modulus calculation layer according to the normalized pooling feature after the hole processing.

In an embodiment, the determining module is further configured to select a to-be-matched region corresponding to the highest score in the matching score map as the target region in the second image.

In a third aspect, the embodiment of the present invention further provides a storage medium, where the computer-executable instructions are stored in the storage medium, and the computer-executable instructions are used to execute the convolutional neural network based on the embodiment of the present invention. Target matching method.

The object matching method, device and storage medium based on convolutional neural network provided by the embodiments of the present invention are less accurate than the point feature matching method and the line feature matching method in the prior art, and the efficiency of the surface feature matching method is better. In contrast, it first acquires the first image and the second image, and secondly calculates the pooled feature of the target region in the first image, and performs traversal matching on the second image based on the calculated pooled feature again, and finally according to The matching score map obtained by traversing the matching determines the target region in the second image, and adopts the pooling feature of the first image to perform traversal matching on the second image, and the matching accuracy is better and the efficiency is higher.

The above described objects, features and advantages of the present invention will become more apparent from the aspects of the appended claims.

DRAWINGS

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments will be briefly described below. It should be understood that the following drawings show only certain embodiments of the present invention, and therefore It should be seen as a limitation on the scope, and those skilled in the art can obtain other related drawings according to these drawings without any creative work.

FIG. 1 is a flowchart of a target matching method based on a convolutional neural network according to an embodiment of the present invention;

FIG. 2 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention;

FIG. 3 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention;

FIG. 4 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention;

FIG. 5 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention;

FIG. 6 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention;

FIG. 7 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention;

FIG. 8 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention;

9a and 9b are schematic diagrams showing matching of a convolution kernel in a target matching method based on a convolutional neural network according to an embodiment of the present invention;

FIG. 10 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a target matching apparatus based on a convolutional neural network according to an embodiment of the present invention.

The main component symbol description:

11. Obtaining a module; 22, calculating a module; 33, generating a module; 44, determining a module.

detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is a partial embodiment of the invention, and not all of the embodiments. The components of the embodiments of the invention, which are generally described and illustrated in the figures herein, may be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of the invention in the claims All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

Considering the point feature matching method in the prior art, the matching accuracy is poor when the target contrast is low, or there is no obvious focus feature; the line feature matching method is matched when the target edge is not obvious, or the target has large deformation, the matching The accuracy is also poor; although the surface feature matching method improves the accuracy of matching, it has a large amount of computation and low efficiency. Based on this, an embodiment of the present invention provides a target matching method and apparatus based on a convolutional neural network, which has high accuracy and efficiency of target matching by traversal matching of pooled features.

FIG. 1 is a flowchart of a method for matching a target based on a convolutional neural network according to an embodiment of the present invention, where the method specifically includes the following steps:

S101. Acquire a first image and a second image.

Specifically, the target matching based on the convolutional neural network provided by the embodiment of the present invention is considered. In the physical application scenario of the method, the target matching method based on the convolutional neural network provided by the embodiment of the present invention needs to acquire the first image and the second image. In addition, the object matching method based on the convolutional neural network provided by the embodiment of the present invention can be applied not only to image retrieval but also to image tracking. For the image retrieval system, the first image is a query image input by the user, and the second image is all images stored in advance; for the target tracking system, the first image is an initial frame or a current frame image, and the second image is a lower image. One frame of image.

S102. Calculate a pooling feature of the target area in the first image.

Specifically, first, the obtained first image is selected as a target area, and then the selected target area is calculated by using the pooled feature. The frame selection of the target area may be performed by a manual method or by a related computer program, and the target area is preferably selected as a rectangle in the embodiment of the present invention. The target area mainly includes areas where people, faces, objects, and the like are more interested. The calculation of the above-mentioned pooling feature is mainly performed by using a depth neural network to perform corresponding window determination on each computing layer, so as to use the pooled feature of the determined window as the image pooling feature of the target region in the first image. At the same time for the target area, in the subsequent matching process, the traversal matching of the second image will be performed with the image pooling feature as a convolution kernel.

S103. Perform traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map.

S104. Determine a target area in the second image according to the matching score map.

Specifically, for the pooled feature calculated by the first image, it will be used as a convolution kernel of the second image, and convolved on the feature layer outputted by the pooled layer of the second image to obtain each to-be-matched region. The matching score of the target area of the first image is determined, and finally the target area in the second image is determined from the to-be-matched area according to the corresponding matching score map.

The target matching method based on the convolutional neural network provided by the embodiment of the present invention is less accurate than the point feature matching method and the line feature matching method in the prior art, and the surface feature matching method is less efficient than the First acquiring the first image and the second image, and secondly for the first image The target area is calculated by the pooling feature, and the second image is traversed and matched based on the calculated pooling feature. Finally, the target region in the second image is determined according to the matching score map obtained by the traversal matching, and the first image is adopted. The pooling feature performs traversal matching on the second image, and the matching accuracy is better and the efficiency is higher.

In order to better calculate the pooling feature of the target area in the first image, the calculation process of the above S102 is specifically implemented by the following steps. Referring to the flowchart shown in FIG. 2, the method further includes:

S201. Extract a first basic feature layer of the first image based on the pre-acquired CNN.

S202. Calculate, according to a location of the target area in the first image and a dimensionality reduction ratio of the CNN, a position of the first window in the first basic feature layer relative to the target area.

Specifically, the target matching method based on the convolutional neural network provided by the embodiment of the present invention inputs the first image as an input layer into the pre-trained CNN, and uses the CNN output as the basic feature layer. In the embodiment of the present invention, the position of the first window in the first basic feature layer relative to the target area is calculated according to the position of the target area in the first image and the dimensionality reduction ratio of the CNN. Specifically, assuming that the size of the first image is [W1_0, H1_0], the dimensionality reduction ratio of the convolutional neural network is R, and the coordinates of the upper left corner of the rectangular target area selected in the first image are (X0_lt, Y0_lt), The coordinates of the lower right corner point are (X0_rb, Y0_rb), and the size of the base feature layer of the first image is [W1, H1] = [Floor(W1_0/R), Floor(H1_0/R)] (where Floor represents downward Rounding), the position of the first window of the corresponding first basic feature layer is:

The coordinates of the upper left point are (X1_lt, Y1_lt) = (Floor (X0_lt / R), Floor (Y0_lt / R));

The coordinates of the lower right point are (X1_rb, Y1_rb) = (Floor (X0_rb/R), Floor (Y0_rb/R)).

In addition, with the advent of the era of big data, only a relatively complex model, or a model with strong expressive ability, can fully exploit the rich information contained in the massive data. Therefore, the pre-trained CNN in the embodiment of the present invention can target the target. The neural network for deep learning of features in the region, since the feature detection layer of CNN learns through training data, when using CNN, explicit feature extraction is avoided, and learning is implicitly learned from the training data; Same The weights of the neurons on the feature map are the same, so the network can learn in parallel, which is also a big advantage of the convolutional network relative to the network connected to the neurons.

S203. Determine a location of the second window of the first basic feature layer based on the preset pooling parameter and the location of the first window.

In order to determine the position of the second window of the first basic feature layer according to the position of the first window of the first basic feature layer, referring to FIG. 3, the determining process of the position of the second window is specifically implemented by the following steps:

S2031: Calculate a first output size of the pooling layer according to a preset minimum window size of the pooled layer and a position of the first window.

S2032: Calculate a second output size of the pooling layer according to a preset maximum output size of the pooled layer and a first output size.

S2033. Calculate a window size of the pooling layer according to the second output size and the position of the first window.

S2034. Calculate a position of the second window of the first basic feature layer according to the second output size and the window size.

Specifically, the determining of the second window of the first basic feature layer by the target matching method based on the convolutional neural network provided by the embodiment of the present invention is based on the preset pooling parameter and the position of the first window. A specific implementation of an embodiment of the present invention is as follows:

First, the first output size of the pooled layer is calculated according to a preset minimum window size of the pooled layer and a position of the first window. Assume that the minimum window size of the pooled layer is [MinPoolX, MinPoolY], and the coordinates of the upper left point (X1_lt, Y1_lt) and the coordinates of the lower right point (X1_rb, Y1_rb) of the first window of the first basic feature layer calculated above are known. The first output size [PoolOutX_1, PoolOutY_1] of the pooled layer is:

[Floor((X1_rb-X1_lt)/MinPoolX), Floor((Y1_rb-Y1_lt)/MinPoolY)].

Secondly, the second output size of the pooled layer is calculated according to a preset maximum output size of the pooled layer and a first output size. Assume that the maximum output size of the pooled layer is [MaxPoolOutX, MaxPoolOutY], from the first output size [PoolOutX_1, PoolOutY_1], the second output size [PoolOutX_2, PoolOutY_2] of the pooled layer is:

[Max(PoolOutX_1, MaxPoolOutX), Max(PoolOutY_1, MaxPoolOutU)].

Again, the window size of the pooled layer is calculated based on the second output size and the position of the first window. From the second output size [PoolOutX_2, PoolOutY_2] and the upper left point coordinates (X1_lt, Y1_lt) and the lower right point coordinates (X1_rb, Y1_rb) of the first window of the first basic feature layer, it can be known that the window size of the pooled layer [ PoolSizeX, PoolSizeY] is:

[Floor((X1_rb-X1_lt)/PoolOutX_2), Floor((Y1_rb-Y1_lt)/PoolOutY_2)].

Finally, the position of the second window of the first base feature layer is calculated based on the second output size and the window size. From the second output size [PoolOutX_2, PoolOutY_2] and the window size of the pooled layer [PoolSizeX, PoolSizeY], the position of the second window of the first basic feature layer is:

The coordinates of the upper left point are: (X2_lt, Y2_lt) = (X1_lt, Y1_lt);

The coordinates of the lower right point are: (X1_rb, Y1_rb) = (X1_lt + PoolOutX_2 × PoolSizeX, Y1_lt + PoolOutY_2 × PoolSizeY).

The target matching method based on the convolutional neural network provided by the embodiment of the present invention further sets the pooling step size of the pooling layer to the same value as the pooling window size.

S204. Input the first basic feature layer of the second window to the pooling layer corresponding to the pooling parameter for feature extraction, to obtain a pooling feature.

Specifically, the pooling layer is configured according to each of the pooling parameters, and the first basic feature layer in the second window is used as an input to generate a pooling feature. Let the base feature layer contain C channels, then the dimension of the local pooling feature is [PoolOutX, PoolOutY, C].

The target matching method based on the convolutional neural network provided by the embodiment of the present invention adopts a traversal matching manner to achieve target matching of the second image to the first image, and the traversal matching in the embodiment of the present invention obtains a matching score. The value map is traversed and matched to the to-be-matched area in the second image to obtain correlation information of each to-be-matched area and the target area in the first image. Referring to FIG. 4, the process of generating the matching score map is specifically implemented by the following steps, where the method further includes:

S301. Extract a second basic feature layer of the second image based on the pre-acquired CNN.

In order to better perform the matching between the second image and the first image, the target matching method based on the convolutional neural network provided by the embodiment of the present invention performs scaling processing on the second image before performing feature extraction on the second image. Therefore, referring to FIG. 5, the feature extraction of the second image is specifically implemented by the following steps:

S3011: Perform a scaling process on the second image according to the first image to obtain a second image after the scaling process.

S3012: Extract a second basic feature layer of the second image after the scaling process based on the pre-acquired CNN.

Specifically, the second image is first scaled to a size corresponding to the first image. For image retrieval, the size of the second image should be similar to the first image, and for image tracking, the second image and The first image has the same size; then the second base feature layer of the scaled second image is extracted using the same CNN as the first image.

S302. Configure a matching convolution layer and a modulus convolution layer for the second basic feature layer. The convolution kernel used by the matching convolution layer and the modulus convolution layer are all normalized cells taken from the first image. The characterization feature, the normalization pooling feature is obtained by normalizing the pooling features.

S303. Obtain a matching score map of each of the to-be-matched regions of the second image with respect to the target region of the first image according to a ratio relationship between the output of the matching convolutional layer and the output of the modulus convolutional layer.

Specifically, the object matching method based on the convolutional neural network provided by the embodiment of the present invention is that the second basic feature layer is configured to match the volume base layer and the modulus value convolution layer respectively to be configured in the configured pooling layer to be matched and the configured modulus value. Based on the calculation layer, wherein the matching basic convolution layer is configured for the second basic feature layer, as shown in FIG. 6, the following steps are specifically implemented:

S401. The window size of the pooling layer and the window traversal granularity are used to configure the pooling layer to be matched for the second basic feature layer, so that the output of the second basic feature layer according to the pooled layer to be matched is performed according to the window size of the pooling layer. Pool processing.

S402. Configure a matching convolution layer for the pooled layer to be matched according to the normalized pooling feature, so as to perform convolution processing according to the normalized pooling feature according to the output of the matching convolution layer to be matched with the pooling layer.

Specifically, the object matching method based on the convolutional neural network provided by the embodiment of the present invention first configures a pooling layer to be matched on the second basic feature layer. The window size of the pooled layer to be matched is the same as the pooled window size of the first image pooling layer. In addition, the pooling step size of the pooled layer to be matched [PoolStepX2, PoolStepY2] represents the granularity of the window traversal, so the step size can be a preset value or an integer that increases as the size of the pooled window increases. The step size ranges from 1 to the pooled window size. The embodiments of the present invention are not specifically limited to meet the different needs of different users.

In addition, in the embodiment of the present invention, the matching convolution layer is further disposed in the above-mentioned pooling layer to be matched. The matching convolution layer uses the normalized pooled feature extracted by the first image as the convolution kernel of the matching convolution layer of the second image, and the dimension is [PoolOutX, PoolOutY, C]. Let the dimension of the output of the pooled layer of the second image to be matched be [W2, H2, C], then the dimension of the output of the matching convolution layer is [W2, H2, 1], and each spatial position represents a part with the first image. The matching value of the feature.

The normalization pooling feature is a result of normalizing the pooled features, and the embodiment of the present invention normalizes by first calculating the pooling feature in the spatial dimension [PoolOutX, PoolOutY]. The modulus of the C-dimensional vector for each position, and the modules for each position are accumulated. The pooled feature is then divided by the accumulated modulus to obtain a normalized pooling feature.

In addition, the foregoing is configured to configure a modulus convolution layer for the second basic feature layer. Referring to FIG. 7, the following steps are specifically implemented:

S501. Configure a modulus calculation layer according to the modulo operation to match the pooling layer, and perform normalization processing on the output of the pooled layer to be matched according to the modulus calculation layer.

S502. Configure a modulus convolution layer for the modulus calculation layer according to the normalization pooling feature, so as to perform convolution processing on the output of the modulus calculation layer according to the normalized pooling feature according to the modulus convolution layer.

Specifically, the modulus value of the C-dimensional feature of each position is first calculated by the modulus calculation layer, and the modulus value of the dimension [PoolOutX, PoolOutY, 1] is output. Then, a modulus convolution layer is disposed on the modulus calculation layer, and the convolution kernel size, the convolution step size and the like of the convolution layer are the same as the matching convolution layer, and the number of input and output channels is 1, and the convolution kernel value is All are 1 and the offset is 0. Let the dimension of the second image base feature layer be [W2, H2, C], and the dimension of the modulo convolution layer output is [W2, H2, 1].

The target matching method based on the convolutional neural network provided by the embodiment of the present invention divides two scalar image points of the output of the matched volume base layer and the modulus convolution layer according to the above configuration, and obtains a target area in the first image. The matching score map of the pooled features in each of the to-be-matched regions in the second image.

In order to ensure that the scope of each pixel of the convolution kernel used for convolution processing of the second image is the same as the scope of the target region of the first image, referring to FIG. 8, the convolutional neural network based on the embodiment of the present invention is provided. The matching convolution layer configuration process in the target matching method is specifically implemented by the following steps:

S601: Perform a hole-polishing process on the normalized pooling feature according to the difference between the window size of the pooling layer and the window traversal granularity, and obtain a normalized pooling feature after the hole processing.

S602. Configure a matching convolution layer for the pooled layer to be matched according to the normalized pooling feature after the hole processing.

Specifically, the object matching method based on the convolutional neural network provided by the embodiment of the present invention is to use the normalized pooling feature of the first image as the convolution kernel of the matching convolution layer of the second image, and the convolution kernel The core is added with holes, and the dimension of the hole is the pooling window size of the pre-matching pooling layer minus the pooling step size of the pre-matching pooling layer (ie, window traversal granularity), that is, [PoolSizeX-PoolStepX2, PoolSizeY-PoolStepY2]. Then, according to the normalized pooling feature after the hole processing, the matching matching layer is configured to match the pooling layer, and the offset of the matching convolution layer is 0, and the convolution step is 1.

The so-called hole addition can be equivalent to filling the original convolution kernel with every other pixel by 0, and the equivalent convolution kernel size after filling is [PoolOutX+PoolSizeX-PoolStepX2, PoolOutY+PoolSizeY-PoolStepY2], and In the actual convolution operation, the program can skip the calculation of the zero position and thus does not increase the amount of calculation.

Referring to FIG. 9a and FIG. 9b, an embodiment of the present invention provides a matching diagram of a convolution core after hole-adding, wherein the size of the convolution kernel is [2, 2], and the size of the hole is [1, 1]. The dot matrix represents the underlying feature layer. The pooling window size, pooling step size, and pooling output size of the pooled layer in the first image (Fig. 9a) are both [2, 2]. The pooled window size of the pooled layer to be matched in the second image (Fig. 9b) is [2, 2], and the pooling step size is [1, 1]. The matching convolutional layer convolution kernel size of the second image is [2, 2]. When no holes are added, the scope of each pixel of the [2, 2] convolution kernel (as shown by the thin line frame) overlaps, which is different from the local feature of the first image; when the hole of [1, 1] is added, the volume The nucleation scope (as indicated by the thick line box) is the same as the first image local feature.

In addition, referring to FIG. 10, the configuration process of the modulus convolution layer in the target matching method based on the convolutional neural network provided by the embodiment of the present invention is specifically implemented by the following steps:

S701. Perform a hole-polishing process on the normalized pooling feature according to a difference between a window size of the pooling layer and a window traversing granularity, to obtain a normalized pooling feature after the hole processing.

S702. Configure a modulus convolution layer for the modulus calculation layer according to the normalized pooling feature after the hole processing.

Specifically, the convolution kernel size, the convolution step size, and the hole-adding layer of the modulus convolution layer in the embodiment of the present invention are the same as the matching convolution layer. Similarly, the process of the above-mentioned hole-punching process is also I will not repeat them here. Then, the modulus convolution layer is configured on the modulus calculation layer according to the normalized pooling feature after the hole processing after the hole processing.

For determining the matching score map obtained by the traversal matching, in order to better determine the target area of the second image relative to the first image, the determining process of the foregoing S104 is specifically implemented by the following steps, the method further comprising:

Select the area to be matched corresponding to the highest score in the matching score map as the target in the second image. Standard area.

Specifically, for the generated matching score map, the matching score map refers to matching the degree of matching of each to-be-matched region of the second image with respect to the target region of the first image, and the matching score of the corresponding pixel The higher the value, the more similar the target area to the first image is. The embodiment of the present invention selects the to-be-matched area corresponding to the highest score of the matching score map as the target area in the second image.

The target matching method based on the convolutional neural network provided by the embodiment of the present invention is less accurate than the point feature matching method and the line feature matching method in the prior art, and the surface feature matching method is less efficient than the Firstly, the first image and the second image are acquired, and then the target region in the first image is calculated by the pooling feature, and the second image is traversed and matched based on the calculated pooling feature again, and finally the matching is obtained according to the traversal matching. The score map determines a target area in the second image, and uses the pooled feature of the first image to perform traversal matching on the second image, and the matching accuracy is better and the efficiency is higher.

The embodiment of the present invention further provides a target matching device based on a convolutional neural network, where the device is used to perform the above-described convolutional neural network-based target matching method. Referring to FIG. 11, the device includes:

The obtaining module 11 is configured to acquire the first image and the second image;

The calculating module 22 is configured to calculate a pooling feature of the target area in the first image;

The generating module 33 is configured to perform traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map.

The determining module 44 is configured to determine a target area in the second image based on the matching score map.

In order to better calculate the pooling feature of the target area in the first image, the calculation module 22 includes:

In order to determine the position of the second window of the first basic feature layer according to the position of the first window of the first basic feature layer, the determining submodule comprises:

The target matching device based on the convolutional neural network provided by the embodiment of the present invention adopts a traversal matching manner to achieve target matching of the second image to the first image, and the traversal matching in the embodiment of the present invention obtains a matching score. The value map is traversed and matched to the to-be-matched area in the second image to obtain correlation information of each to-be-matched area and the target area in the first image. The target matching device based on the convolutional neural network provided by the embodiment of the present invention further includes a generating module 33, and the generating module 33 includes:

The configuration submodule is configured to respectively configure a matching convolution layer and a modulus volume for the second basic feature layer. The stacking kernel used in the matching convolution layer and the modulus convolution layer is a normalized pooling feature taken from the first image, and the normalized pooling feature is to normalize the pooling feature. Processed

In order to better perform the matching between the second image and the first image, the target matching device based on the convolutional neural network provided by the embodiment of the present invention performs scaling processing on the second image before performing feature extraction on the second image. Therefore, the second extraction submodule includes:

The target matching device method based on the convolutional neural network provided by the embodiment of the present invention is configured to configure a matching volume base layer and a modulus value convolution layer respectively in the configured pooling layer to be matched and the configured modulus value. Based on the computing layer, the configuration sub-module includes:

In order to ensure that the scope of each pixel of the convolution kernel used in the convolution process of the second image is the same as the scope of the target region of the first image, the object matching device based on the convolutional neural network provided by the embodiment of the present invention is provided. The second configuration unit in the middle includes:

In addition, the fourth configuration unit in the target matching device based on the convolutional neural network provided by the embodiment of the present invention includes:

For the matching score map obtained by the traversal matching, in order to better determine the target area of the second image relative to the first image, the determining module 44 is further configured to select the highest score corresponding to the matching score map to be matched. The area serves as a target area in the second image.

In the embodiment of the present invention, the acquisition module 11, the calculation module 22, the generation module 33, and the determination module 44 in the target matching device based on the convolutional neural network, and the submodules included in the above modules are in practical applications. The central processing unit (CPU), the digital signal processor (DSP), the micro control unit (MCU), or the programmable gate array (FPGA, Field-Programmable) can be used in the device. Gate Array) implementation.

The target matching device based on the convolutional neural network provided by the embodiment of the present invention has poor accuracy with the point feature matching method and the line feature matching method in the prior art, and the surface feature matching method is less efficient than the Firstly, the first image and the second image are acquired, and then the target region in the first image is calculated by the pooling feature, and the second image is traversed and matched based on the calculated pooling feature again, and finally the matching is obtained according to the traversal matching. The score map determines a target area in the second image, and uses the pooled feature of the first image to perform traversal matching on the second image, and the matching accuracy is better and the efficiency is higher.

In addition, the object matching method and apparatus based on the convolutional neural network provided by the embodiments of the present invention can also be applied to image retrieval and image tracking, wherein the application of the image retrieval can bring the following technical effects:

1. Using deep learning technology to improve the robustness of the sliding frame method;

A sliding window traversal method with high computational efficiency and convenient parallelization is proposed.

Applied to image tracking, it can also bring the following technical effects:

1. Based on deep learning technology, the success rate and stability of tracking are improved;

2. The neural network is not required to be trained during the initial stage of tracking and tracking, which greatly reduces the time consumption of single target tracking;

3. In multi-target tracking, each tracking shares the basic feature layer. Compared with the calculation amount of the basic feature layer, the individual operation amount of each tracking is very small, so it is suitable for real-time video multi-target tracking.

A computer program product for performing a method for target matching based on a convolutional neural network according to an embodiment of the present invention, comprising a computer readable storage medium storing program code, the program code comprising instructions operable to execute the foregoing method embodiment For the specific implementation of the method, refer to the method embodiment, and details are not described herein again.

The device for matching the target based on the convolutional neural network provided by the embodiment of the present invention may be specific hardware on the device or software or firmware installed on the device. The implementation principle and the technical effects produced by the device provided by the embodiments of the present invention are the same as those of the foregoing method embodiments, and are briefly described. Where the device embodiment is not mentioned, reference may be made to the corresponding content in the foregoing method embodiments. A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working processes of the foregoing system, the device and the unit can refer to the corresponding processes in the foregoing method embodiments, and details are not described herein again.

In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The device embodiments described above are merely illustrative. For example, the division of the unit is only a logical function division. In actual implementation, there may be another division manner. For example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, device or unit, and may be electrical, mechanical or otherwise.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in the embodiment provided by the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

The functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product. Based on such understanding, the technical solution of the present invention, which is essential or contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium, including The instructions are used to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention. The foregoing storage medium includes: a U disk, a mobile hard disk, a read only memory (ROM, Read-Only) Memory, random access memory (RAM), disk or optical disk, and other media that can store program code.

It should be noted that similar reference numerals and letters indicate similar items in the following figures. Therefore, once an item is defined in a drawing, it is not necessary to further define and explain it in the subsequent drawings. Moreover, the terms "first", "second", "third", and the like are used merely to distinguish a description, and are not to be construed as indicating or implying a relative importance.

Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present invention, and are used to explain the technical solutions of the present invention, and are not limited thereto, and the scope of protection of the present invention is not limited thereto, although reference is made to the foregoing. The present invention has been described in detail, and those skilled in the art should understand that any one skilled in the art can still modify the technical solutions described in the foregoing embodiments within the technical scope disclosed by the present invention. The changes may be easily conceived, or equivalents may be substituted for some of the technical features. The modifications, variations, and substitutions of the present invention do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention. All should be covered by the scope of the present invention. Therefore, the scope of the invention should be determined by the scope of the claims.

Industrial applicability

The technical solution of the embodiment of the present invention first acquires the first image and the second image, and then performs the calculation of the pooled feature on the target region in the first image, and performs traversal matching on the second image again based on the calculated pooled feature. Finally, the target region in the second image is determined according to the matching score map obtained by the traversal matching, and the second image is traversed and matched by using the pooling feature of the first image, and the matching accuracy is better and the efficiency is higher.

Claims

A target matching method based on convolutional neural network, comprising:

Obtaining the first image and the second image;

Calculating a pooling feature of the target area in the first image;

Performing traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map;

Determining a target area in the second image according to the matching score map.
The method of claim 1, wherein the calculating the pooling characteristics of the target area in the first image comprises:

Extracting a first basic feature layer of the first image based on a pre-acquired convolutional neural network CNN;

Calculating a position of the first window in the first base feature layer relative to the target area according to a location of the target area in the first image and a reduction ratio of the CNN;

Determining a location of the second window of the first base feature layer based on the preset pooling parameter and the location of the first window;

The first basic feature layer of the second window is input to the pooling layer corresponding to the pooling parameter for feature extraction, and a pooling feature is obtained.
The method of claim 2, wherein the determining the location of the second window of the first base feature layer based on the preset pooling parameter and the location of the first window comprises:

Calculating a first output size of the pooling layer according to a preset minimum window size of the pooled layer and a position of the first window;

Calculating a second output size of the pooling layer according to a preset maximum output size of the pooled layer and the first output size;

Calculating a window size of the pooling layer according to the second output size and a position of the first window;

Calculating a position of the second window of the first base feature layer according to the second output size and the window size.
The method according to claim 3, wherein the traversing matching of the second image based on the pooling feature to obtain a corresponding matching score map comprises:

Extracting a second basic feature layer of the second image based on the pre-acquired CNN;

Configuring a matching convolution layer and a modulus convolution layer for the second basic feature layer; wherein the matching convolution layer and the convolution kernel used by the modulus convolution layer are all taken from the first a normalized pooling feature of the image, wherein the normalized pooling feature is obtained by normalizing the pooled feature;

And obtaining, according to a ratio relationship between an output of the matching convolution layer and an output of the modulo convolution layer, a matching score of each of the to-be-matched regions of the second image with respect to a target region of the first image Figure.
The method according to claim 4, wherein the extracting the second basic feature layer of the second image based on the pre-acquired CNN comprises:

Performing a scaling process on the second image according to the first image to obtain a second image after the scaling process;

And extracting, according to the pre-acquired CNN, the second basic feature layer of the second image after the scaling process.
The method of claim 4, wherein configuring the matching convolution layer for the second base feature layer comprises:

And configuring, according to the window size of the pooling layer and the window traversal granularity, a pooling layer to be matched for the second basic feature layer, according to the output of the second basic feature layer according to the pooled layer to be matched according to the pool The window size of the layer is pooled;

And configuring, according to the normalized pooling feature, a matching convolution layer for the pooled layer to be matched, to perform volume according to the normalized pooling feature according to the output of the matching convolution layer to be matched with the pooling layer Product processing

The configuring the modulus convolution layer for the second basic feature layer includes:

And configuring a modulus calculation layer on the to-be-matched pooling layer to perform normalization processing on the output of the to-be-matched pooling layer according to the modulus calculation layer;

And configuring a modulus convolution layer for the modulus calculation layer according to the normalization pooling feature, to perform convolution processing according to the normalized pooling feature according to the output of the modulus value convolution layer .
The method of claim 6, wherein the configuring the matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature comprises:

Performing a hole-polishing process on the normalized pooling feature according to a difference between a window size of the pooling layer and a window traversing granularity, to obtain a normalized pooling feature after the hole processing;

And matching the convolution layer to the pooled layer to be matched according to the normalized pooling feature after the hole processing.
The method of claim 6, wherein the configuring the modulus convolution layer for the modulus calculation layer according to the normalization pooling feature comprises:

Performing a hole-polishing process on the normalized pooling feature according to a difference between a window size of the pooling layer and a window traversing granularity, to obtain a normalized pooling feature after the hole processing;

And configuring a modulus convolution layer for the modulus calculation layer according to the normalized pooling feature after the hole processing.
The method of claim 4, wherein the determining the target area in the second image based on the matching score map comprises:

The area to be matched corresponding to the highest score in the matching score map is selected as the target area in the second image.
A target matching device based on a convolutional neural network, comprising:

Obtaining a module, configured to acquire the first image and the second image;

a calculation module configured to calculate a pooling feature of the target area in the first image;

a generating module, configured to perform traversal matching on the second image based on the pooling feature, to obtain a corresponding matching score map;

a determining module configured to determine a target area in the second image based on the matching score map.
The apparatus of claim 10 wherein said computing module comprises:

a first extraction submodule configured to extract a first basic feature layer of the first image based on the pre-acquired CNN;

a calculating submodule configured to calculate a position of the first window in the first base feature layer relative to the target area according to the position of the target area in the first image and the dimensionality reduction ratio of the CNN;

Determining a submodule configured to determine a location of the second window of the first basic feature layer based on the preset pooling parameter and the location of the first window;

The first generation sub-module is configured to input the first basic feature layer of the second window to the pooling layer corresponding to the pooling parameter for feature extraction, to obtain a pooling feature.
The apparatus of claim 11 wherein said determining sub-module comprises:

a first calculating unit, configured to calculate a first output size of the pooling layer according to a preset minimum window size of the pooling layer and a position of the first window;

a second calculating unit, configured to calculate a second output size of the pooling layer according to a preset maximum output size of the pooled layer and a first output size;

a third calculating unit, configured to calculate a window size of the pooling layer according to the second output size and the position of the first window;

And a fourth calculating unit configured to calculate a position of the second window of the first basic feature layer according to the second output size and the window size.
The apparatus of claim 12, wherein the generating module comprises:

a second extraction submodule configured to extract a second basic feature layer of the second image based on the pre-acquired CNN;

a configuration submodule configured to respectively configure a matching convolution layer and a modulus convolution layer for the second basic feature layer; wherein the convolution kernel used by the matching convolution layer and the modulus convolution layer are all taken from the first image The normalized pooling feature is obtained by normalizing the pooling features;

a second generating unit submodule configured to obtain, according to a ratio relationship between an output of the matching convolution layer and an output of the modulo convolution layer, obtaining a target region of the second image relative to the target region of the first image Match the score map.
The apparatus of claim 13 wherein said second extraction sub-module comprises:

a scaling unit configured to perform a scaling process on the second image according to the first image to obtain a second image after the scaling process;

And an extracting unit configured to extract a second basic feature layer of the second image after the scaling process based on the pre-acquired CNN.
The apparatus of claim 13 wherein said configuration sub-module comprises:

The first configuration unit is configured to configure a pooling layer to be matched according to the window size of the pooling layer and the window traversal granularity, so as to be pooled according to the output of the pooled layer to be matched to the second basic layer The window size of the layer is pooled;

a second configuration unit, configured to configure a matching convolution layer for the pooled layer to be matched according to the normalized pooling feature, to perform convolution processing according to the normalized pooling feature according to the output of the matching convolution layer to be matched with the pooling layer ;

a third configuration unit, configured to configure a modulus calculation layer to be matched with the pooling layer based on the modulus value operation, to perform normalization processing on the output of the pooled layer to be matched according to the modulus calculation layer;

a fourth configuration unit configured to configure a modulus convolution layer for the modulus calculation layer according to the normalization pooling feature, to convolute the output of the modulus calculation layer according to the normalized pooling feature according to the modulus convolution layer deal with.
The apparatus of claim 15, wherein the second configuration unit comprises:

a first hole subunit configured to traverse the granularity according to a window size and a window of the pooling layer The result of the difference calculation is performed on the normalized pooling feature to obtain the normalized pooling feature after the hole processing;

The first configuration subunit is configured to configure a matching convolution layer for the pooled layer to be matched according to the normalized pooling feature after the hole processing.
The apparatus of claim 15, wherein the fourth configuration unit comprises:

The second hole-adding sub-unit is configured to perform a hole-polishing process on the normalized pooling feature according to the difference between the window size of the pooling layer and the window traversing granularity, and obtain a normalized pooling after the hole processing feature;

The second configuration subunit is configured to configure the modulus convolution layer for the modulus calculation layer according to the normalized pooling feature after the hole processing.
The apparatus according to claim 13, wherein the determining module is further configured to select a to-be-matched region corresponding to the highest score in the matching score map as the target region in the second image.
A storage medium storing computer-executable instructions for performing the convolutional neural network-based target matching method according to any one of claims 1 to 9.