WO2018036146A1 - Convolutional neural network-based target matching method, device and storage medium - Google Patents

Convolutional neural network-based target matching method, device and storage medium

Info

Publication number
WO2018036146A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
pooling
feature
image
matching
Prior art date
Application number
PCT/CN2017/077579
Other languages
French (fr)
Chinese (zh)
Inventor
任鹏远
石园
许健
万定锐
Original Assignee
东方网力科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 东方网力科技股份有限公司
Publication of WO2018036146A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences

Definitions

  • the present invention relates to the field of machine vision technology, and in particular, to a target matching method, device and storage medium based on a convolutional neural network.
  • video surveillance mainly captures video images with cameras that record environmental information, and transmits the captured video images to a control platform for analysis and processing, such as tracking of targets in the video images.
  • the general process of target tracking includes: after a target enters the video surveillance area, since the target is moving, the system captures the image of the target in the current frame as a template, and finds the target's new position in the next frame of the video by target matching. It can be seen that accurate target matching is the key to video image tracking.
  • target matching is also the core of technologies such as image recognition, image retrieval, and image annotation.
  • target matching means associating consecutive video frames, or a plurality of pre-selected image frames, and finding in the latter image frame the matching target that corresponds to the target in the former image frame.
  • the association is mainly performed through features.
  • in the prior art, target matching methods such as point feature template matching, line feature template matching, and surface feature template matching are generally adopted.
  • the point feature matching method has poor matching accuracy when the target contrast is low or there is no distinct point feature; the line feature matching method also has poor accuracy when the target edge is not distinct or the target deforms substantially; the surface feature matching method, although it improves the matching accuracy, has a large amount of computation and low efficiency.
  • the embodiments of the present invention are directed to a target matching method, apparatus, and storage medium based on a convolutional neural network, which use pooling features for traversal matching, so that both the accuracy and the efficiency of matching are high.
  • an embodiment of the present invention provides a target matching method based on a convolutional neural network, where the method includes:
  • the calculating the pooling feature of the target area in the first image comprises:
  • extracting a first basic feature layer of the first image based on a pre-acquired convolutional neural network (CNN); calculating the position of a first window in the first basic feature layer relative to the target area; determining the position of a second window of the first basic feature layer based on a preset pooling parameter and the position of the first window;
  • the first basic feature layer within the second window is input to the pooling layer corresponding to the pooling parameter for feature extraction, and a pooling feature is obtained.
  • determining the location of the second window of the first basic feature layer based on the preset pooling parameter and the location of the first window comprises:
  • the traversing matching is performed on the second image based on the pooling feature to obtain a corresponding matching score map, including:
  • the extracting the second basic feature layer of the second image based on the pre-acquired CNN includes:
  • configuring the matching convolution layer for the second basic feature layer comprises:
  • configuring a modulus convolution layer for the second basic feature layer includes:
  • the configuring a matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature includes:
  • the configuring a modulus convolution layer for the modulus calculation layer according to the normalization pooling feature includes:
  • the determining, according to the matching score map, the target area in the second image comprises:
  • the area to be matched corresponding to the highest score in the matching score map is selected as the target area in the second image.
  • an embodiment of the present invention further provides a target matching apparatus based on a convolutional neural network, the apparatus comprising:
  • a calculation module configured to calculate a pooling feature of the target area in the first image
  • a generating module configured to perform traversal matching on the second image based on the pooling feature, to obtain a corresponding matching score map
  • a determining module configured to determine a target area in the second image based on the matching score map.
  • the calculation module comprises:
  • a first extraction submodule configured to extract a first basic feature layer of the first image based on the pre-acquired CNN
  • a calculating submodule configured to calculate a position of the first window in the first base feature layer relative to the target area according to the position of the target area in the first image and the dimensionality reduction ratio of the CNN;
  • a determining submodule configured to determine a location of the second window of the first basic feature layer based on the preset pooling parameter and the location of the first window;
  • the first generation sub-module is configured to input the first basic feature layer of the second window to the pooling layer corresponding to the pooling parameter for feature extraction, to obtain a pooling feature.
  • the determining submodule comprises:
  • a first calculating unit configured to calculate a first output size of the pooling layer according to a preset minimum window size of the pooling layer and a position of the first window;
  • a second calculating unit configured to calculate a second output size of the pooling layer according to a preset maximum output size of the pooled layer and a first output size
  • a third calculating unit configured to calculate a window size of the pooling layer according to the second output size and the position of the first window
  • a fourth calculating unit configured to calculate a position of the second window of the first basic feature layer according to the second output size and the window size.
  • the generating module includes:
  • a second extraction submodule configured to extract a second basic feature layer of the second image based on the pre-acquired CNN
  • a configuration submodule configured to respectively configure a matching convolution layer and a modulus convolution layer for the second basic feature layer; wherein the convolution kernels used by the matching convolution layer and the modulus convolution layer are both normalized pooling features taken from the first image, the normalized pooling feature being obtained by normalizing the pooling feature;
  • a second generation submodule configured to obtain, according to the ratio between the output of the matching convolution layer and the output of the modulus convolution layer, a matching score map of each to-be-matched region of the second image relative to the target area of the first image.
  • the second extraction submodule comprises:
  • a scaling unit configured to perform a scaling process on the second image according to the first image to obtain a second image after the scaling process
  • an extracting unit configured to extract a second basic feature layer of the second image after the scaling process based on the pre-acquired CNN.
  • the configuration submodule includes:
  • a first configuration unit configured to configure a to-be-matched pooling layer for the second basic feature layer based on the window size of the pooling layer and the window traversal granularity, so that the output of the second basic feature layer is pooled by the to-be-matched pooling layer according to the window size of the pooling layer;
  • a second configuration unit configured to configure a matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature, so that the output of the to-be-matched pooling layer is convolved by the matching convolution layer with the normalized pooling feature;
  • a third configuration unit configured to configure a modulus calculation layer for the to-be-matched pooling layer based on the modulus operation, so that the output of the to-be-matched pooling layer is normalized by the modulus calculation layer;
  • a fourth configuration unit configured to configure a modulus convolution layer for the modulus calculation layer according to the normalized pooling feature, so that the output of the modulus calculation layer is convolved by the modulus convolution layer with the normalized pooling feature.
  • the second configuration unit comprises:
  • the first hole-adding subunit is configured to add holes to the normalized pooling feature according to the difference between the window size of the pooling layer and the window traversal granularity, to obtain a hole-processed normalized pooling feature;
  • the first configuration subunit is configured to configure a matching convolution layer for the to-be-matched pooling layer according to the hole-processed normalized pooling feature.
  • the fourth configuration unit comprises:
  • the second hole-adding subunit is configured to add holes to the normalized pooling feature according to the difference between the window size of the pooling layer and the window traversal granularity, to obtain a hole-processed normalized pooling feature;
  • the second configuration subunit is configured to configure the modulus convolution layer for the modulus calculation layer according to the hole-processed normalized pooling feature.
  • the determining module is further configured to select a to-be-matched region corresponding to the highest score in the matching score map as the target region in the second image.
  • the embodiment of the present invention further provides a storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the convolutional neural network-based target matching method provided by the embodiments of the present invention.
  • compared with the point feature matching method and the line feature matching method in the prior art, which have poor accuracy, and the surface feature matching method, which has low efficiency, the target matching method, device and storage medium based on a convolutional neural network provided by the embodiments of the present invention first acquire the first image and the second image, then calculate the pooling feature of the target area in the first image, next perform traversal matching on the second image based on the calculated pooling feature, and finally determine the target area in the second image according to the matching score map obtained by the traversal matching; by using the pooling feature of the first image to traverse and match the second image, both the matching accuracy and the efficiency are higher.
  • FIG. 1 is a flowchart of a target matching method based on a convolutional neural network according to an embodiment of the present invention
  • FIG. 2 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention
  • FIG. 3 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention
  • FIG. 4 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention
  • FIG. 5 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention
  • FIG. 6 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention.
  • FIG. 8 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention.
  • FIG. 9a and FIG. 9b are schematic diagrams of convolution kernel matching in a target matching method based on a convolutional neural network according to an embodiment of the present invention.
  • FIG. 10 is a flowchart of another method for matching a target based on a convolutional neural network according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a target matching apparatus based on a convolutional neural network according to an embodiment of the present invention.
  • an embodiment of the present invention provides a target matching method and apparatus based on a convolutional neural network, which achieve high accuracy and efficiency of target matching through traversal matching with pooling features.
  • FIG. 1 is a flowchart of a method for matching a target based on a convolutional neural network according to an embodiment of the present invention, where the method specifically includes the following steps:
  • before the target matching based on the convolutional neural network provided by the embodiment of the present invention is performed, the first image and the second image need to be acquired.
  • the object matching method based on the convolutional neural network provided by the embodiment of the present invention can be applied not only to image retrieval but also to image tracking.
  • for an image retrieval system, the first image is a query image input by the user, and the second images are all images stored in advance; for a target tracking system, the first image is the initial frame or the current frame image, and the second image is the next frame image.
  • a target area is selected by frame in the acquired first image, and the pooling feature is then calculated for the selected target area.
  • the frame selection of the target area may be performed manually or by a related computer program; in the embodiment of the present invention, the target area is preferably a rectangle.
  • the target area mainly covers regions of greater interest, such as people, faces, and objects.
  • the calculation of the above pooling feature mainly uses a deep neural network to determine the corresponding window at each computing layer, so that the pooling feature of the determined window serves as the image pooling feature of the target area in the first image.
  • the traversal matching of the second image is performed with this image pooling feature as a convolution kernel.
  • the pooling feature calculated from the first image is used as a convolution kernel for the second image, and convolution is performed on the feature layer output by the pooling layer of the second image to obtain the matching score of each to-be-matched region relative to the target area of the first image; finally, the target area in the second image is determined from the to-be-matched regions according to the corresponding matching score map.
  • compared with the point feature matching method and the line feature matching method in the prior art, which have poor accuracy, and the surface feature matching method, which has low efficiency, the target matching method based on the convolutional neural network provided by the embodiment of the present invention first acquires the first image and the second image, then calculates the pooling feature of the target area in the first image, performs traversal matching on the second image based on the calculated pooling feature, and determines the target area in the second image according to the matching score map obtained by the traversal matching; by using the pooling feature of the first image to traverse and match the second image, both the matching accuracy and the efficiency are higher.
  • the calculation process of the above S102 is specifically implemented by the following steps. Referring to the flowchart shown in FIG. 2, the method further includes:
  • the target matching method based on the convolutional neural network inputs the first image as an input layer into the pre-trained CNN, and uses the CNN output as the basic feature layer.
  • the position of the first window in the first basic feature layer relative to the target area is calculated according to the position of the target area in the first image and the dimensionality reduction ratio of the CNN.
  • assume that the size of the first image is [W1_0, H1_0], the dimensionality reduction ratio of the convolutional neural network is R, the coordinates of the upper-left corner of the rectangular target area selected in the first image are (X0_lt, Y0_lt), and the coordinates of the lower-right corner are (X0_rb, Y0_rb); the position of the first window of the corresponding first basic feature layer, with upper-left point (X1_lt, Y1_lt) and lower-right point (X1_rb, Y1_rb), is then obtained by scaling these coordinates by 1/R.
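For illustration, a minimal sketch of this coordinate mapping follows. The original formula is not reproduced in this text, so rounding the upper-left corner down and the lower-right corner up when dividing by R is an assumption.

```python
# Hedged sketch of S202: map the target rectangle from first-image pixel
# coordinates to first-basic-feature-layer coordinates via the CNN's
# dimensionality reduction ratio R. The floor/ceil rounding is an assumption.

def first_window_position(rect, R):
    """rect = (X0_lt, Y0_lt, X0_rb, Y0_rb) in first-image pixels."""
    X0_lt, Y0_lt, X0_rb, Y0_rb = rect
    X1_lt, Y1_lt = X0_lt // R, Y0_lt // R          # upper-left corner: round down
    X1_rb, Y1_rb = -(-X0_rb // R), -(-Y0_rb // R)  # lower-right corner: round up
    return X1_lt, Y1_lt, X1_rb, Y1_rb

print(first_window_position((35, 50, 180, 210), R=16))  # -> (2, 3, 12, 14)
```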
  • the pre-trained CNN in the embodiment of the present invention may be a neural network for deep learning of features in the target area. Since the feature detection layer of the CNN learns from training data, explicit feature extraction is avoided when the CNN is used, and features are learned implicitly from the training data. Moreover, since the weights of the neurons on the same feature map are identical, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which neurons are fully connected.
  • S203 Determine a location of the second window of the first basic feature layer based on the preset pooling parameter and the location of the first window.
  • the determining process of the position of the second window is specifically implemented by the following steps:
  • S2031 Calculate a first output size of the pooling layer according to a preset minimum window size of the pooled layer and a position of the first window.
  • S2032 Calculate a second output size of the pooling layer according to a preset maximum output size of the pooled layer and a first output size.
  • in the target matching method based on the convolutional neural network provided by the embodiment of the present invention, the second window of the first basic feature layer is determined based on the preset pooling parameters and the position of the first window. A specific implementation of an embodiment of the present invention is as follows (a sketch of the size computations appears after this list of steps):
  • first, the first output size of the pooling layer is calculated according to the preset minimum window size of the pooling layer and the position of the first window. Assume that the minimum window size of the pooling layer is [MinPoolX, MinPoolY]; with the upper-left point coordinates (X1_lt, Y1_lt) and the lower-right point coordinates (X1_rb, Y1_rb) of the first window of the first basic feature layer calculated above, the first output size [PoolOutX_1, PoolOutY_1] of the pooling layer is obtained.
  • next, the second output size of the pooling layer is calculated according to the preset maximum output size of the pooling layer and the first output size. Assume that the maximum output size of the pooling layer is [MaxPoolOutX, MaxPoolOutY]; the second output size [PoolOutX_2, PoolOutY_2] is the first output size [PoolOutX_1, PoolOutY_1] capped by this maximum.
  • then, the window size of the pooling layer is calculated according to the second output size and the position of the first window. From the second output size [PoolOutX_2, PoolOutY_2] and the first-window coordinates (X1_lt, Y1_lt) and (X1_rb, Y1_rb), the window size [PoolSizeX, PoolSizeY] of the pooling layer is obtained.
  • finally, the position of the second window of the first basic feature layer is calculated according to the second output size [PoolOutX_2, PoolOutY_2] and the window size [PoolSizeX, PoolSizeY].
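The size formulas referenced above appear as images in the original publication; the following sketch is therefore a hedged reconstruction of S2031-S2034 under plausible floor and clamp choices, not the patent's exact formulas.

```python
# Hedged reconstruction of S2031-S2034; only the named quantities
# ([MinPoolX, MinPoolY], [MaxPoolOutX, MaxPoolOutY], etc.) come from the text.

def second_window(first_win, min_pool, max_out):
    X1_lt, Y1_lt, X1_rb, Y1_rb = first_win
    W, H = X1_rb - X1_lt + 1, Y1_rb - Y1_lt + 1
    # S2031: first output size from the minimum pooling-window size
    PoolOutX_1, PoolOutY_1 = W // min_pool[0], H // min_pool[1]
    # S2032: second output size, capped by the maximum output size
    PoolOutX_2 = min(PoolOutX_1, max_out[0])
    PoolOutY_2 = min(PoolOutY_1, max_out[1])
    # S2033: pooling-window size that tiles the first window into that output
    PoolSizeX, PoolSizeY = W // PoolOutX_2, H // PoolOutY_2
    # S2034: second window covering exactly PoolOut_2 * PoolSize feature cells
    X2_rb = X1_lt + PoolOutX_2 * PoolSizeX - 1
    Y2_rb = Y1_lt + PoolOutY_2 * PoolSizeY - 1
    return (X1_lt, Y1_lt, X2_rb, Y2_rb), (PoolSizeX, PoolSizeY), (PoolOutX_2, PoolOutY_2)

win, pool_size, pool_out = second_window((2, 3, 12, 14), min_pool=(2, 2), max_out=(4, 4))
print(win, pool_size, pool_out)  # (2, 3, 9, 14) (2, 3) (4, 4)
```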
  • the target matching method based on the convolutional neural network provided by the embodiment of the present invention further sets the pooling step size of the pooling layer equal to the pooling window size.
  • the pooling layer is configured according to the above pooling parameters, and the first basic feature layer within the second window is used as its input to generate the pooling feature.
  • let the basic feature layer contain C channels; the dimension of the local pooling feature is then [PoolOutX, PoolOutY, C].
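Given those parameters, extracting the local pooling feature can be sketched as follows; max pooling and the [row, column, channel] array layout are assumptions, since the text fixes only the window, stride, and output geometry.

```python
import numpy as np

# Sketch of S204: pool the first basic feature layer inside the second window
# with stride equal to the pooling window size, yielding a local pooling
# feature of dimension [PoolOutY, PoolOutX, C].

def local_pooling_feature(feat, window, pool_size, pool_out):
    """feat: basic feature layer [Hf, Wf, C]; window = (x_lt, y_lt, x_rb, y_rb)."""
    x_lt, y_lt, _, _ = window
    (sx, sy), (ox, oy) = pool_size, pool_out
    patch = feat[y_lt:y_lt + oy * sy, x_lt:x_lt + ox * sx, :]
    C = feat.shape[2]
    blocks = patch.reshape(oy, sy, ox, sx, C)   # split into pooling windows
    return blocks.max(axis=(1, 3))              # max pooling is an assumption

feat = np.random.rand(20, 20, 8)
pooled = local_pooling_feature(feat, (2, 3, 9, 14), pool_size=(2, 3), pool_out=(4, 4))
print(pooled.shape)  # (4, 4, 8)
```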
  • the target matching method based on the convolutional neural network provided by the embodiment of the present invention adopts traversal matching to match the second image against the first image; the traversal matching in the embodiment of the present invention obtains a matching score map by traversing the to-be-matched regions in the second image, yielding the correlation of each to-be-matched region with the target area in the first image.
  • the process of generating the matching score map is specifically implemented by the following steps, where the method further includes:
  • the target matching method based on the convolutional neural network provided by the embodiment of the present invention performs scaling processing on the second image before performing feature extraction on the second image. Therefore, referring to FIG. 5, the feature extraction of the second image is specifically implemented by the following steps:
  • S3011 Perform a scaling process on the second image according to the first image to obtain a second image after the scaling process.
  • S3012 Extract a second basic feature layer of the second image after the scaling process based on the pre-acquired CNN.
  • the second image is first scaled to a size corresponding to the first image.
  • for image retrieval, the size of the second image should be similar to that of the first image; for image tracking, the second image and the first image have the same size. The second basic feature layer of the scaled second image is then extracted using the same CNN as for the first image.
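A minimal sketch of this scaling step (S3011), assuming Pillow for the resampling; the choice of bilinear interpolation is an assumption.

```python
from PIL import Image

# Sketch of S3011: scale the second image to the first image's size before
# feature extraction (the image-tracking case noted above).

def scale_to_first(second_img: Image.Image, first_size):
    W1_0, H1_0 = first_size  # size of the first image, as defined earlier
    return second_img.resize((W1_0, H1_0), Image.BILINEAR)
```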
  • the convolution kernels used by the matching convolution layer and the modulus convolution layer are both normalized pooling features taken from the first image; the normalized pooling feature is obtained by normalizing the pooling feature.
  • in the target matching method based on the convolutional neural network, the matching convolution layer and the modulus convolution layer for the second basic feature layer are configured on the configured to-be-matched pooling layer and the configured modulus calculation layer, respectively.
  • the matching convolution layer is configured for the second basic feature layer as shown in FIG. 6, specifically by the following steps:
  • S401. Configure a to-be-matched pooling layer for the second basic feature layer based on the window size of the pooling layer and the window traversal granularity, so that the output of the second basic feature layer is pooled by the to-be-matched pooling layer according to the window size of the pooling layer.
  • S402. Configure a matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature, so that the output of the to-be-matched pooling layer is convolved by the matching convolution layer with the normalized pooling feature.
  • the object matching method based on the convolutional neural network provided by the embodiment of the present invention first configures a pooling layer to be matched on the second basic feature layer.
  • the window size of the pooled layer to be matched is the same as the pooled window size of the first image pooling layer.
  • the pooling step size [PoolStepX2, PoolStepY2] of the to-be-matched pooling layer represents the granularity of the window traversal; the step size may be a preset value, or an integer that grows with the pooling window size, and ranges from 1 to the pooling window size. This is not specifically limited in the embodiments of the present invention, so as to meet the needs of different users.
  • a matching convolution layer is further disposed on the above to-be-matched pooling layer.
  • the matching convolution layer uses the normalized pooling feature extracted from the first image as the convolution kernel of the matching convolution layer of the second image, with dimension [PoolOutX, PoolOutY, C]. Let the dimension of the output of the to-be-matched pooling layer of the second image be [W2, H2, C]; then the dimension of the output of the matching convolution layer is [W2, H2, 1], and each spatial position represents the matching value with the local feature of the first image.
  • the normalized pooling feature is the result of normalizing the pooling feature. The embodiment of the present invention normalizes by first calculating, over the spatial dimension [PoolOutX, PoolOutY], the modulus of the C-dimensional vector at each position, and accumulating the moduli of all positions; the pooling feature is then divided by the accumulated modulus to obtain the normalized pooling feature.
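A minimal sketch of this normalization, assuming the pooling feature is stored as a [PoolOutY, PoolOutX, C] array.

```python
import numpy as np

def normalize_pooling_feature(pooled):
    moduli = np.linalg.norm(pooled, axis=2)  # modulus of the C-dim vector at each position
    total = moduli.sum()                     # accumulate the moduli over all positions
    return pooled / total                    # normalized pooling feature

pooled = np.random.rand(4, 4, 8)
print(normalize_pooling_feature(pooled).shape)  # (4, 4, 8)
```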
  • the modulus convolution layer is configured for the second basic feature layer as follows. Referring to FIG. 7, this is specifically implemented by the following steps:
  • the modulus of the C-dimensional feature at each position is first calculated by the modulus calculation layer, which outputs a single-channel modulus map. A modulus convolution layer is then disposed on the modulus calculation layer; its convolution kernel size, convolution step size and the like are the same as those of the matching convolution layer, the number of input and output channels is 1, all convolution kernel values are 1, and the bias is 0.
  • let the dimension of the second image's basic feature layer be [W2, H2, C]; the dimension of the modulus convolution layer output is then [W2, H2, 1].
  • with the above configuration, the target matching method based on the convolutional neural network provided by the embodiment of the present invention divides the outputs of the matching convolution layer and the modulus convolution layer pointwise, and obtains the matching score map of the pooling feature of the target area in the first image against each to-be-matched region in the second image.
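The two-branch construction can be sketched as follows, assuming PyTorch tensors in [N, C, H, W] layout and ignoring for the moment the hole-adding described next.

```python
import torch
import torch.nn.functional as F

def matching_score_map(pooled2, kernel, eps=1e-12):
    # matching convolution layer: normalized pooling feature as kernel, bias 0
    match = F.conv2d(pooled2, kernel)
    # modulus calculation layer: per-position modulus of the C-dim feature
    modulus = pooled2.norm(dim=1, keepdim=True)
    # modulus convolution layer: same kernel size, all weights 1, bias 0
    ones = torch.ones(1, 1, kernel.shape[2], kernel.shape[3])
    norm = F.conv2d(modulus, ones)
    # matching score map: pointwise ratio of the two branch outputs
    return match / (norm + eps)

C, H2, W2 = 8, 16, 16
pooled2 = torch.rand(1, C, H2, W2)          # output of the to-be-matched pooling layer
kernel = torch.rand(1, C, 4, 4)
kernel = kernel / kernel.norm(dim=1).sum()  # sum-of-moduli normalization, as in the text
print(matching_score_map(pooled2, kernel).shape)  # torch.Size([1, 1, 13, 13])
```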
  • the matching convolution layer configuration process in the target matching method based on the convolutional neural network provided by the embodiment of the present invention is specifically implemented by the following steps:
  • S601. Add holes to the normalized pooling feature according to the difference between the window size of the pooling layer and the window traversal granularity, to obtain a hole-processed normalized pooling feature.
  • the target matching method based on the convolutional neural network uses the normalized pooling feature of the first image as the convolution kernel of the matching convolution layer of the second image, with holes added to the kernel. The dimension of the holes is the pooling window size of the to-be-matched pooling layer minus its pooling step size (i.e., the window traversal granularity), that is, [PoolSizeX-PoolStepX2, PoolSizeY-PoolStepY2].
  • the matching convolution layer is then configured for the to-be-matched pooling layer; the bias of the matching convolution layer is 0 and the convolution step size is 1.
  • the so-called hole adding is equivalent to padding the original convolution kernel with zeros between its elements; the equivalent convolution kernel size after padding is [PoolOutX+PoolSizeX-PoolStepX2, PoolOutY+PoolSizeY-PoolStepY2], and in the actual convolution operation the program can skip the zero positions, so the amount of computation does not increase.
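This zero-insertion equivalence can be checked with a dilated convolution; the sketch below uses the FIG. 9 example sizes (a [2, 2] kernel with holes of [1, 1]) rather than values from the claims.

```python
import torch
import torch.nn.functional as F

k = torch.arange(1.0, 5.0).reshape(1, 1, 2, 2)  # [2, 2] convolution kernel
x = torch.rand(1, 1, 8, 8)

# dilation=2 leaves a gap of [1, 1] between kernel elements
out_dilated = F.conv2d(x, k, dilation=2)

# equivalent explicit kernel of size [3, 3], zero-padded between elements
k_holes = torch.zeros(1, 1, 3, 3)
k_holes[0, 0, ::2, ::2] = k[0, 0]
out_padded = F.conv2d(x, k_holes)

assert torch.allclose(out_dilated, out_padded)  # identical outputs, fewer multiplies
```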
  • an embodiment of the present invention provides a matching diagram of a convolution kernel after hole adding, where the size of the convolution kernel is [2, 2] and the size of the holes is [1, 1].
  • the dot matrix represents the underlying feature layer.
  • the pooling window size, pooling step size, and pooling output size of the pooling layer in the first image (FIG. 9a) are all [2, 2].
  • the pooled window size of the pooled layer to be matched in the second image (Fig. 9b) is [2, 2], and the pooling step size is [1, 1].
  • the convolution kernel size of the second image's matching convolution layer is [2, 2].
  • without holes, the receptive fields of the pixels covered by the [2, 2] convolution kernel overlap, which differs from the local feature of the first image; after adding holes of [1, 1], the receptive field of the kernel (indicated by the thick-line box) is the same as that of the local feature of the first image.
  • the configuration process of the modulus convolution layer in the target matching method based on the convolutional neural network provided by the embodiment of the present invention is specifically implemented by the following steps:
  • the convolution kernel size, convolution step size, and hole adding of the modulus convolution layer in the embodiment of the present invention are the same as those of the matching convolution layer; the hole-adding process is likewise the same and is not repeated here. The modulus convolution layer is then configured on the modulus calculation layer according to the hole-processed normalized pooling feature.
  • the determining process of the foregoing S104 is specifically implemented by the following steps, the method further comprising:
  • the matching score map reflects the degree of matching of each to-be-matched region of the second image relative to the target area of the first image; the higher the matching score of the corresponding pixel, the more similar that region is to the target area of the first image.
  • the embodiment of the present invention selects the to-be-matched region corresponding to the highest score in the matching score map as the target area in the second image.
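A minimal sketch of this selection step.

```python
import numpy as np

def select_target(score_map):
    # index of the highest matching score; the to-be-matched region it
    # addresses is taken as the target area in the second image
    y, x = np.unravel_index(np.argmax(score_map), score_map.shape)
    return (x, y), score_map[y, x]

score_map = np.random.rand(13, 13)
print(select_target(score_map))
```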
  • compared with the point feature matching method and the line feature matching method in the prior art, which have poor accuracy, and the surface feature matching method, which has low efficiency, the target matching method based on the convolutional neural network provided by the embodiment of the present invention first acquires the first image and the second image, then calculates the pooling feature of the target area in the first image, performs traversal matching on the second image based on the calculated pooling feature, and finally determines the target area in the second image according to the matching score map obtained by the traversal matching; by using the pooling feature of the first image to traverse and match the second image, both the matching accuracy and the efficiency are higher.
  • the embodiment of the present invention further provides a target matching device based on a convolutional neural network, where the device is used to perform the above-described convolutional neural network-based target matching method.
  • the device includes:
  • the obtaining module 11 is configured to acquire the first image and the second image
  • the calculating module 22 is configured to calculate a pooling feature of the target area in the first image
  • the generating module 33 is configured to perform traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map.
  • the determining module 44 is configured to determine a target area in the second image based on the matching score map.
  • the calculation module 22 includes:
  • a first extraction submodule configured to extract a first basic feature layer of the first image based on the pre-acquired CNN
  • a calculating submodule configured to calculate a position of the first window in the first base feature layer relative to the target area according to the position of the target area in the first image and the dimensionality reduction ratio of the CNN;
  • a determining submodule configured to determine a location of the second window of the first basic feature layer based on the preset pooling parameter and the location of the first window;
  • the first generation sub-module is configured to input the first basic feature layer of the second window to the pooling layer corresponding to the pooling parameter for feature extraction, to obtain a pooling feature.
  • the determining submodule comprises:
  • a first calculating unit configured to calculate a first output size of the pooling layer according to a preset minimum window size of the pooling layer and a position of the first window;
  • a second calculating unit configured to calculate a second output size of the pooling layer according to a preset maximum output size of the pooled layer and a first output size
  • a third calculating unit configured to calculate a window size of the pooling layer according to the second output size and the position of the first window
  • a fourth calculating unit configured to calculate a position of the second window of the first basic feature layer according to the second output size and the window size.
  • the target matching device based on the convolutional neural network provided by the embodiment of the present invention adopts traversal matching to match the second image against the first image; the traversal matching in the embodiment of the present invention obtains a matching score map by traversing the to-be-matched regions in the second image, yielding the correlation of each to-be-matched region with the target area in the first image.
  • the target matching device based on the convolutional neural network provided by the embodiment of the present invention further includes a generating module 33, and the generating module 33 includes:
  • a second extraction submodule configured to extract a second basic feature layer of the second image based on the pre-acquired CNN
  • the configuration submodule is configured to respectively configure a matching convolution layer and a modulus convolution layer for the second basic feature layer;
  • the convolution kernels used by the matching convolution layer and the modulus convolution layer are normalized pooling features taken from the first image, the normalized pooling feature being obtained by normalizing the pooling feature;
  • a second generation submodule configured to obtain, according to the ratio between the output of the matching convolution layer and the output of the modulus convolution layer, a matching score map of each to-be-matched region of the second image relative to the target area of the first image.
  • the target matching device based on the convolutional neural network provided by the embodiment of the present invention performs scaling processing on the second image before performing feature extraction on the second image. Therefore, the second extraction submodule includes:
  • a scaling unit configured to perform a scaling process on the second image according to the first image to obtain a second image after the scaling process
  • an extracting unit configured to extract a second basic feature layer of the second image after the scaling process based on the pre-acquired CNN.
  • in the target matching device based on the convolutional neural network provided by the embodiment of the present invention, the matching convolution layer and the modulus convolution layer are configured on the configured to-be-matched pooling layer and the configured modulus calculation layer, respectively.
  • the configuration sub-module includes:
  • a first configuration unit configured to configure a to-be-matched pooling layer for the second basic feature layer based on the window size of the pooling layer and the window traversal granularity, so that the output of the second basic feature layer is pooled by the to-be-matched pooling layer according to the window size of the pooling layer;
  • a second configuration unit configured to configure a matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature, so that the output of the to-be-matched pooling layer is convolved by the matching convolution layer with the normalized pooling feature;
  • a third configuration unit configured to configure a modulus calculation layer for the to-be-matched pooling layer based on the modulus operation, so that the output of the to-be-matched pooling layer is normalized by the modulus calculation layer;
  • a fourth configuration unit configured to configure a modulus convolution layer for the modulus calculation layer according to the normalized pooling feature, so that the output of the modulus calculation layer is convolved by the modulus convolution layer with the normalized pooling feature.
  • the second configuration unit includes:
  • the first hole-adding subunit is configured to add holes to the normalized pooling feature according to the difference between the window size of the pooling layer and the window traversal granularity, to obtain a hole-processed normalized pooling feature;
  • the first configuration subunit is configured to configure a matching convolution layer for the to-be-matched pooling layer according to the hole-processed normalized pooling feature.
  • the fourth configuration unit in the target matching device based on the convolutional neural network provided by the embodiment of the present invention includes:
  • the second hole-adding subunit is configured to add holes to the normalized pooling feature according to the difference between the window size of the pooling layer and the window traversal granularity, to obtain a hole-processed normalized pooling feature;
  • the second configuration subunit is configured to configure the modulus convolution layer for the modulus calculation layer according to the hole-processed normalized pooling feature.
  • the determining module 44 is further configured to select the to-be-matched region corresponding to the highest score in the matching score map as the target area in the second image.
  • in practical applications, the acquisition module 11, the calculation module 22, the generation module 33, and the determination module 44 in the target matching device based on the convolutional neural network, as well as the submodules included in the above modules, may be implemented by a central processing unit (CPU), a digital signal processor (DSP), a micro control unit (MCU), or a field-programmable gate array (FPGA) in the device.
  • compared with the point feature matching method and the line feature matching method in the prior art, which have poor accuracy, and the surface feature matching method, which has low efficiency, the target matching device based on the convolutional neural network provided by the embodiment of the present invention first acquires the first image and the second image, then calculates the pooling feature of the target area in the first image, performs traversal matching on the second image based on the calculated pooling feature, and finally determines the target area in the second image according to the matching score map obtained by the traversal matching; by using the pooling feature of the first image to traverse and match the second image, both the matching accuracy and the efficiency are higher.
  • the target matching method and apparatus based on the convolutional neural network provided by the embodiments of the present invention can also be applied to image retrieval and image tracking, wherein the application to image tracking can bring the following technical effects:
  • the neural network does not need to be trained at the initialization stage of tracking, which greatly reduces the time consumption of single-target tracking;
  • all tracked targets share the basic feature layer; compared with the computation of the basic feature layer, the additional computation per tracked target is very small, so the method is suitable for real-time multi-target video tracking.
  • a computer program product for performing the target matching method based on a convolutional neural network comprises a computer-readable storage medium storing program code, the program code comprising instructions operable to execute the foregoing method embodiments.
  • the device for matching the target based on the convolutional neural network provided by the embodiment of the present invention may be specific hardware on the device or software or firmware installed on the device.
  • the implementation principle and the technical effects of the device provided by the embodiments of the present invention are the same as those of the foregoing method embodiments; for anything not mentioned in the device embodiment, reference may be made to the corresponding content in the foregoing method embodiments.
  • a person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working processes of the foregoing system, the device and the unit can refer to the corresponding processes in the foregoing method embodiments, and details are not described herein again.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a logical function division; in actual implementation, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, device or unit, and may be electrical, mechanical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in the embodiment provided by the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and other media that can store program code.
  • the technical solution of the embodiment of the present invention first acquires the first image and the second image, then calculates the pooling feature of the target area in the first image, next performs traversal matching on the second image based on the calculated pooling feature, and finally determines the target area in the second image according to the matching score map obtained by the traversal matching; by using the pooling feature of the first image to traverse and match the second image, both the matching accuracy and the efficiency are higher.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in the embodiments of the present invention are a convolutional neural network-based target matching method, device and storage medium. The method comprises: acquiring a first image and a second image; calculating a pooling feature of a target area in the first image; performing traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map; and determining a target area in the second image according to the matching score map.

Description

Target matching method, device and storage medium based on convolutional neural network

Technical Field

The present invention relates to the field of machine vision technology, and in particular to a target matching method, device and storage medium based on a convolutional neural network.

Background Art

With the continuous deepening of smart city construction, the video surveillance market continues to grow rapidly. Currently, video surveillance mainly captures video images with cameras that record environmental information, and transmits the captured video images to a control platform for analysis and processing, such as tracking of targets in the video images. For target tracking, the general process is: after a target enters the video surveillance area, since the target is moving, the system captures the image of the target in the current frame as a template, and finds the target's new position in the next frame of the video by target matching. It can be seen that accurate target matching is the key to video image tracking. In addition, target matching is also at the core of technologies such as image recognition, image retrieval and image annotation.

Target matching means associating consecutive video frames, or a plurality of pre-selected image frames, and finding in the latter image frame the matching target that corresponds to the target in the former image frame. The association is mainly performed through features.

In the prior art, target matching methods such as point feature template matching, line feature template matching and surface feature template matching are generally adopted. However, the point feature matching method has poor matching accuracy when the target contrast is low or there is no distinct point feature; the line feature matching method also has poor accuracy when the target edge is not distinct or the target deforms substantially; and the surface feature matching method, although more accurate, is computationally heavy and inefficient.
Summary of the Invention

In view of this, the embodiments of the present invention are directed to a target matching method, apparatus and storage medium based on a convolutional neural network, which use pooling features for traversal matching, so that both the accuracy and the efficiency of matching are high.

In a first aspect, an embodiment of the present invention provides a target matching method based on a convolutional neural network, the method comprising:

acquiring a first image and a second image;

calculating a pooling feature of a target area in the first image;

performing traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map;

determining a target area in the second image according to the matching score map.
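For illustration only, the following self-contained sketch wires the four claimed steps together. It operates directly on pixels rather than on CNN feature layers, and a per-window normalization stands in for the modulus-ratio construction of the later claims; all names are illustrative, not the patent's.

```python
import numpy as np

def avg_pool(img, pool=2):
    # crop to a multiple of the pooling window, then average each block
    h, w = (img.shape[0] // pool) * pool, (img.shape[1] // pool) * pool
    blocks = img[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.mean(axis=(1, 3))

def match_target(img1, img2, rect, pool=2):
    x0, y0, x1, y1 = rect
    templ = avg_pool(img1[y0:y1, x0:x1], pool)       # pooling feature (step 2)
    templ = templ / np.linalg.norm(templ)            # normalized pooling feature
    search = avg_pool(img2, pool)
    th, tw = templ.shape
    best, best_xy = -np.inf, None
    for y in range(search.shape[0] - th + 1):        # traversal matching (step 3)
        for x in range(search.shape[1] - tw + 1):
            win = search[y:y + th, x:x + tw]
            score = (win * templ).sum() / (np.linalg.norm(win) + 1e-12)
            if score > best:
                best, best_xy = score, (x * pool, y * pool)
    return best_xy, best                             # highest score wins (step 4)

img1 = np.random.rand(64, 64)
img2 = np.roll(img1, (6, 10), axis=(0, 1))           # target shifted by (dx=10, dy=6)
print(match_target(img1, img2, (16, 16, 32, 32)))    # ~((26, 22), 1.0)
```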
在一实施例中,所述计算所述第一图像中目标区域的池化特征,包括:In an embodiment, the calculating the pooling feature of the target area in the first image comprises:
基于预先获取的卷积神经网络(CNN,Convolutional Neural Networks)提取所述第一图像的第一基础特征层;Extracting a first basic feature layer of the first image based on a pre-acquired Convolutional Neural Networks (CNN);
根据所述第一图像中目标区域的位置和所述CNN的降维比率,计算所述第一基础特征层中相对于目标区域的第一窗口的位置;Calculating a position of the first window in the first base feature layer relative to the target area according to a location of the target area in the first image and a reduction ratio of the CNN;
基于预设的池化参数和所述第一窗口的位置,确定所述第一基础特征层的第二窗口的位置;Determining a location of the second window of the first base feature layer based on the preset pooling parameter and the location of the first window;
将所述第二窗口的第一基础特征层输入至所述池化参数对应的池化层进行特征提取,得到池化特征。The first basic feature layer of the second window is input to the pooling layer corresponding to the pooling parameter for feature extraction, and a pooling feature is obtained.
在一实施例中,所述基于预设的池化参数和所述第一窗口的位置,确定所述第一基础特征层的第二窗口的位置,包括:In an embodiment, determining the location of the second window of the first basic feature layer based on the preset pooling parameter and the location of the first window comprises:
根据预设的池化层的最小窗口尺寸和所述第一窗口的位置,计算池化层的第一输出尺寸;Calculating a first output size of the pooling layer according to a preset minimum window size of the pooled layer and a position of the first window;
根据预设的池化层的最大输出尺寸和所述第一输出尺寸,计算所述池 化层的第二输出尺寸;Calculating the pool according to a preset maximum output size of the pooled layer and the first output size The second output size of the layer;
根据所述第二输出尺寸和所述第一窗口的位置,计算所述池化层的窗口尺寸;Calculating a window size of the pooling layer according to the second output size and a position of the first window;
根据所述第二输出尺寸和所述窗口尺寸,计算所述第一基础特征层的第二窗口的位置。Calculating a position of the second window of the first base feature layer according to the second output size and the window size.
在一实施例中,所述基于所述池化特征对所述第二图像进行遍历匹配,得到对应的匹配分值图,包括:In an embodiment, the traversing matching is performed on the second image based on the pooling feature to obtain a corresponding matching score map, including:
基于预先获取的CNN提取所述第二图像的第二基础特征层;Extracting a second basic feature layer of the second image based on the pre-acquired CNN;
为所述第二基础特征层分别配置匹配卷积层和模值卷积层;其中,所述匹配卷积层和所述模值卷积层使用的卷积核均为取自所述第一图像的归一化池化特征,所述归一化池化特征是对所述池化特征进行归一化处理得到的;Configuring a matching convolution layer and a modulus convolution layer for the second basic feature layer; wherein the matching convolution layer and the convolution kernel used by the modulus convolution layer are all taken from the first a normalized pooling feature of the image, wherein the normalized pooling feature is obtained by normalizing the pooled feature;
根据所述匹配卷积层的输出和所述模值卷积层的输出之间的比值关系,得出所述第二图像的每个待匹配区域相对于第一图像的目标区域的匹配分值图。And obtaining, according to a ratio relationship between an output of the matching convolution layer and an output of the modulo convolution layer, a matching score of each of the to-be-matched regions of the second image with respect to a target region of the first image Figure.
在一实施例中,所述基于预先获取的CNN提取所述第二图像的第二基础特征层,包括:In an embodiment, the extracting the second basic feature layer of the second image based on the pre-acquired CNN includes:
按照所述第一图像对所述第二图像进行缩放处理,得到缩放处理后的第二图像;Performing a scaling process on the second image according to the first image to obtain a second image after the scaling process;
基于预先获取的CNN提取所述缩放处理后的第二图像的第二基础特征层。And extracting, according to the pre-acquired CNN, the second basic feature layer of the second image after the scaling process.
在一实施例中,为所述第二基础特征层配置匹配卷积层,包括:In an embodiment, configuring the matching convolution layer for the second basic feature layer comprises:
基于所述池化层的窗口尺寸和窗口遍历颗粒度为所述第二基础特征层配置待匹配池化层,以根据所述待匹配池化层对第二基础特征层的输出按照所述池化层的窗口尺寸进行池化处理; And configuring, according to the window size of the pooling layer and the window traversal granularity, a pooling layer to be matched for the second basic feature layer, according to the output of the second basic feature layer according to the pooled layer to be matched according to the pool The window size of the layer is pooled;
根据所述归一化池化特征为所述待匹配池化层配置匹配卷积层,以根据所述匹配卷积层对待匹配池化层的输出按照所述归一化池化特征进行卷积处理;And configuring a matching convolution layer according to the normalization pooling feature to perform convolution according to the normalized pooling feature according to the output of the matching convolution layer to be matched with the pooling layer deal with;
为所述第二基础特征层配置模值卷积层,包括:Configuring a modulus convolution layer for the second base feature layer, including:
基于模值运算对所述待匹配池化层配置模值计算层,以根据所述模值计算层对所述待匹配池化层的输出进行归一化处理;And configuring a modulus calculation layer on the to-be-matched pooling layer to perform normalization processing on the output of the to-be-matched pooling layer according to the modulus calculation layer;
根据所述归一化池化特征为所述模值计算层配置模值卷积层,以根据所述模值卷积层对模值计算层的输出按照归一化池化特征进行卷积处理。And configuring a modulus convolution layer for the modulus calculation layer according to the normalization pooling feature, to perform convolution processing according to the normalized pooling feature according to the output of the modulus value convolution layer .
在一实施例中,所述根据所述归一化池化特征为所述待匹配池化层配置匹配卷积层,包括:In an embodiment, the configuring a matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature includes:
根据所述池化层的窗口尺寸和窗口遍历颗粒度之间的差值运算结果对所述归一化池化特征进行加孔处理,得到加孔处理后的归一化池化特征;Performing a hole-polishing process on the normalized pooling feature according to a difference between a window size of the pooling layer and a window traversing granularity, to obtain a normalized pooling feature after the hole processing;
根据所述加孔处理后的归一化池化特征为所述待匹配池化层配置匹配卷积层。And matching the convolution layer to the pooled layer to be matched according to the normalized pooling feature after the hole processing.
In an embodiment, configuring the modulus convolution layer for the modulus calculation layer according to the normalized pooling feature includes:
adding holes to the normalized pooling feature according to the difference between the window size and the window traversal granularity of the pooling layer, to obtain a hole-added normalized pooling feature;
configuring the modulus convolution layer for the modulus calculation layer according to the hole-added normalized pooling feature.
In an embodiment, determining the target region in the second image according to the matching score map includes:
selecting the to-be-matched region corresponding to the highest score in the matching score map as the target region in the second image.
According to a second aspect, an embodiment of the present invention further provides a convolutional neural network-based target matching apparatus, the apparatus including:
an acquisition module configured to acquire a first image and a second image;
a calculation module configured to calculate a pooling feature of a target region in the first image;
a generation module configured to perform traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map;
a determination module configured to determine a target region in the second image according to the matching score map.
In an embodiment, the calculation module includes:
a first extraction submodule configured to extract a first base feature layer of the first image based on a pre-acquired CNN;
a calculation submodule configured to calculate, according to the position of the target region in the first image and the dimensionality reduction ratio of the CNN, the position of a first window in the first base feature layer corresponding to the target region;
a determination submodule configured to determine the position of a second window of the first base feature layer based on preset pooling parameters and the position of the first window;
a first generation submodule configured to input the first base feature layer within the second window into the pooling layer corresponding to the pooling parameters for feature extraction, to obtain the pooling feature.
In an embodiment, the determination submodule includes:
a first calculation unit configured to calculate a first output size of the pooling layer according to a preset minimum window size of the pooling layer and the position of the first window;
a second calculation unit configured to calculate a second output size of the pooling layer according to a preset maximum output size of the pooling layer and the first output size;
a third calculation unit configured to calculate a window size of the pooling layer according to the second output size and the position of the first window;
a fourth calculation unit configured to calculate the position of the second window of the first base feature layer according to the second output size and the window size.
In an embodiment, the generation module includes:
a second extraction submodule configured to extract a second base feature layer of the second image based on the pre-acquired CNN;
a configuration submodule configured to configure a matching convolution layer and a modulus convolution layer for the second base feature layer, where the convolution kernels used by the matching convolution layer and the modulus convolution layer are both taken from the normalized pooling feature of the first image, the normalized pooling feature being obtained by normalizing the pooling feature;
a second generation submodule configured to obtain, according to the ratio between the output of the matching convolution layer and the output of the modulus convolution layer, a matching score map of each to-be-matched region of the second image relative to the target region of the first image.
In an embodiment, the second extraction submodule includes:
a scaling unit configured to scale the second image according to the first image to obtain a scaled second image;
an extraction unit configured to extract the second base feature layer of the scaled second image based on the pre-acquired CNN.
In an embodiment, the configuration submodule includes:
a first configuration unit configured to configure a to-be-matched pooling layer for the second base feature layer based on the window size and the window traversal granularity of the pooling layer, so that the output of the second base feature layer is pooled by the window size of the pooling layer according to the to-be-matched pooling layer;
a second configuration unit configured to configure the matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature, so that the output of the to-be-matched pooling layer is convolved with the normalized pooling feature according to the matching convolution layer;
a third configuration unit configured to configure a modulus calculation layer for the to-be-matched pooling layer based on a modulus operation, so that the output of the to-be-matched pooling layer is normalized according to the modulus calculation layer;
a fourth configuration unit configured to configure the modulus convolution layer for the modulus calculation layer according to the normalized pooling feature, so that the output of the modulus calculation layer is convolved with the normalized pooling feature according to the modulus convolution layer.
In an embodiment, the second configuration unit includes:
a first hole-adding subunit configured to add holes to the normalized pooling feature according to the difference between the window size and the window traversal granularity of the pooling layer, to obtain a hole-added normalized pooling feature;
a first configuration subunit configured to configure the matching convolution layer for the to-be-matched pooling layer according to the hole-added normalized pooling feature.
In an embodiment, the fourth configuration unit includes:
a second hole-adding subunit configured to add holes to the normalized pooling feature according to the difference between the window size and the window traversal granularity of the pooling layer, to obtain a hole-added normalized pooling feature;
a second configuration subunit configured to configure the modulus convolution layer for the modulus calculation layer according to the hole-added normalized pooling feature.
In an embodiment, the determination module is further configured to select the to-be-matched region corresponding to the highest score in the matching score map as the target region in the second image.
According to a third aspect, an embodiment of the present invention further provides a storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the convolutional neural network-based target matching method described in the embodiments of the present invention.
Compared with the prior art, in which the point feature matching method and the line feature matching method have poor accuracy and the surface feature matching method has low efficiency, the convolutional neural network-based target matching method, apparatus and storage medium provided by the embodiments of the present invention first acquire a first image and a second image, then calculate a pooling feature of the target region in the first image, then perform traversal matching on the second image based on the calculated pooling feature, and finally determine the target region in the second image according to the matching score map obtained by the traversal matching. Because the pooling feature of the first image is used for the traversal matching of the second image, both the accuracy and the efficiency of the matching are high.
To make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings required in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and therefore should not be regarded as limiting the scope; a person of ordinary skill in the art may derive other related drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a convolutional neural network-based target matching method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another convolutional neural network-based target matching method according to an embodiment of the present invention;
FIG. 3 is a flowchart of another convolutional neural network-based target matching method according to an embodiment of the present invention;
FIG. 4 is a flowchart of another convolutional neural network-based target matching method according to an embodiment of the present invention;
FIG. 5 is a flowchart of another convolutional neural network-based target matching method according to an embodiment of the present invention;
FIG. 6 is a flowchart of another convolutional neural network-based target matching method according to an embodiment of the present invention;
FIG. 7 is a flowchart of another convolutional neural network-based target matching method according to an embodiment of the present invention;
FIG. 8 is a flowchart of another convolutional neural network-based target matching method according to an embodiment of the present invention;
FIGS. 9a and 9b are schematic diagrams of matching after hole-adding of a convolution kernel in a convolutional neural network-based target matching method according to an embodiment of the present invention;
FIG. 10 is a flowchart of another convolutional neural network-based target matching method according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a convolutional neural network-based target matching apparatus according to an embodiment of the present invention.
Description of the main element symbols:
11, acquisition module; 22, calculation module; 33, generation module; 44, determination module.
DETAILED DESCRIPTION
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the prior art, the point feature matching method has poor matching accuracy when the target contrast is low or there is no obvious focus feature; the line feature matching method also has poor matching accuracy when the target edge is not obvious or the target undergoes large deformation; and the surface feature matching method, although improving matching accuracy, involves a large amount of computation and has low efficiency. On this basis, the embodiments of the present invention provide a convolutional neural network-based target matching method and apparatus that perform traversal matching with pooling features, so that both the accuracy and the efficiency of target matching are high.
Referring to FIG. 1, a flowchart of the convolutional neural network-based target matching method provided by an embodiment of the present invention, the method specifically includes the following steps:
S101: Acquire a first image and a second image.
Specifically, in view of the specific application scenarios of the convolutional neural network-based target matching method provided by the embodiments of the present invention, the method needs to acquire a first image and a second image. The method can be applied not only to image retrieval but also to image tracking. For an image retrieval system, the first image is a query image input by a user and the second image is each of the pre-stored images; for a target tracking system, the first image is an initial frame or a current frame image and the second image is the next frame image.
S102: Calculate a pooling feature of the target region in the first image.
Specifically, a target region is first framed in the acquired first image, and the pooling feature of the framed target region is then calculated. The target region may be determined manually or by a related computer program, and is preferably selected as a rectangle in the embodiments of the present invention. The target region mainly covers regions of interest to users, such as persons, faces and objects. The pooling feature is calculated mainly by determining the corresponding window of each computing layer through a deep neural network, and taking the pooling feature of the determined window as the image pooling feature of the target region in the first image. In the subsequent matching process, this image pooling feature serves as the convolution kernel for the traversal matching of the second image.
S103: Perform traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map.
S104: Determine the target region in the second image according to the matching score map.
Specifically, the pooling feature calculated from the first image serves as the convolution kernel for the second image and is convolved on the feature layer output by the pooling layer of the second image, yielding the matching score of each to-be-matched region relative to the target region of the first image; finally, the target region in the second image is determined from the to-be-matched regions according to the corresponding matching score map.
Compared with the prior art, in which the point feature matching method and the line feature matching method have poor accuracy and the surface feature matching method has low efficiency, the convolutional neural network-based target matching method provided by the embodiments of the present invention first acquires the first image and the second image, then calculates the pooling feature of the target region in the first image, then performs traversal matching on the second image based on the calculated pooling feature, and finally determines the target region in the second image according to the matching score map obtained by the traversal matching. Because the pooling feature of the first image is used for the traversal matching of the second image, both the accuracy and the efficiency of the matching are high.
To better calculate the pooling feature of the target region in the first image, the calculation of S102 is specifically implemented by the following steps; referring to the flowchart shown in FIG. 2, the method further includes:
S201: Extract a first base feature layer of the first image based on a pre-acquired CNN.
S202: Calculate, according to the position of the target region in the first image and the dimensionality reduction ratio of the CNN, the position of a first window in the first base feature layer corresponding to the target region.
Specifically, the convolutional neural network-based target matching method provided by the embodiments of the present invention inputs the first image as an input layer into the pre-trained CNN and takes the CNN output as the base feature layer. The position of the first window in the first base feature layer corresponding to the target region is calculated according to the position of the target region in the first image and the dimensionality reduction ratio of the CNN. Specifically, assume the size of the first image is [W1_0, H1_0], the dimensionality reduction ratio of the convolutional neural network is R, and the rectangular target region framed in the first image has upper-left corner coordinates (X0_lt, Y0_lt) and lower-right corner coordinates (X0_rb, Y0_rb). Then the size of the base feature layer of the first image is [W1, H1] = [Floor(W1_0/R), Floor(H1_0/R)] (where Floor denotes rounding down), and the position of the corresponding first window in the first base feature layer is:
upper-left point coordinates (X1_lt, Y1_lt) = (Floor(X0_lt/R), Floor(Y0_lt/R));
lower-right point coordinates (X1_rb, Y1_rb) = (Floor(X0_rb/R), Floor(Y0_rb/R)).
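For illustration only, this mapping reduces to floor division by the ratio R. The following minimal Python sketch (the function name map_window and the numeric example are illustrative assumptions, not part of the claimed embodiments) reproduces the two formulas above:

    import math

    def map_window(x0_lt, y0_lt, x0_rb, y0_rb, r):
        # Map the target box from image coordinates onto the base feature
        # layer via the network's dimensionality reduction ratio R.
        return (math.floor(x0_lt / r), math.floor(y0_lt / r),
                math.floor(x0_rb / r), math.floor(y0_rb / r))

    # Assumed example: a box (64, 32)-(192, 128) under a ratio R = 16
    print(map_window(64, 32, 192, 128, 16))   # -> (4, 2, 12, 8)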
In addition, with the advent of the big data era, only sufficiently complex models, that is, models with strong expressive power, can fully exploit the rich information contained in massive data. Therefore, the pre-trained CNN in the embodiments of the present invention is a neural network capable of deep feature learning on the target region. Since the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided when using a CNN, and learning is performed implicitly from the training data. Moreover, since the neurons on the same feature map share the same weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which neurons are fully interconnected.
S203: Determine the position of a second window of the first base feature layer based on preset pooling parameters and the position of the first window.
To better determine the position of the second window of the first base feature layer from the position of the first window, referring to FIG. 3, the determination of the position of the second window is specifically implemented by the following steps:
S2031: Calculate a first output size of the pooling layer according to a preset minimum window size of the pooling layer and the position of the first window.
S2032: Calculate a second output size of the pooling layer according to a preset maximum output size of the pooling layer and the first output size.
S2033: Calculate a window size of the pooling layer according to the second output size and the position of the first window.
S2034: Calculate the position of the second window of the first base feature layer according to the second output size and the window size.
Specifically, in the convolutional neural network-based target matching method provided by the embodiments of the present invention, the second window of the first base feature layer is determined based on the preset pooling parameters and the position of the first window. A specific example of this embodiment is as follows:
First, the first output size of the pooling layer is calculated according to the preset minimum window size of the pooling layer and the position of the first window. Assume the minimum window size of the pooling layer is [MinPoolX, MinPoolY]. From the upper-left point coordinates (X1_lt, Y1_lt) and lower-right point coordinates (X1_rb, Y1_rb) of the first window of the first base feature layer calculated above, the first output size [PoolOutX_1, PoolOutY_1] of the pooling layer is:
[Floor((X1_rb-X1_lt)/MinPoolX), Floor((Y1_rb-Y1_lt)/MinPoolY)].
Next, the second output size of the pooling layer is calculated according to the preset maximum output size of the pooling layer and the first output size. Assume the maximum output size of the pooling layer is [MaxPoolOutX, MaxPoolOutY]. From the first output size [PoolOutX_1, PoolOutY_1], the second output size [PoolOutX_2, PoolOutY_2] of the pooling layer is:
[Max(PoolOutX_1, MaxPoolOutX), Max(PoolOutY_1, MaxPoolOutY)].
Then, the window size of the pooling layer is calculated according to the second output size and the position of the first window. From the second output size [PoolOutX_2, PoolOutY_2] and the upper-left point coordinates (X1_lt, Y1_lt) and lower-right point coordinates (X1_rb, Y1_rb) of the first window of the first base feature layer, the window size [PoolSizeX, PoolSizeY] of the pooling layer is:
[Floor((X1_rb-X1_lt)/PoolOutX_2), Floor((Y1_rb-Y1_lt)/PoolOutY_2)].
Finally, the position of the second window of the first base feature layer is calculated according to the second output size and the window size. From the second output size [PoolOutX_2, PoolOutY_2] and the window size [PoolSizeX, PoolSizeY] of the pooling layer, the position of the second window of the first base feature layer is:
upper-left point coordinates: (X2_lt, Y2_lt) = (X1_lt, Y1_lt);
lower-right point coordinates: (X2_rb, Y2_rb) = (X1_lt + PoolOutX_2 × PoolSizeX, Y1_lt + PoolOutY_2 × PoolSizeY).
Further, the convolutional neural network-based target matching method provided by the embodiments of the present invention sets the pooling step size of the pooling layer to the same value as the pooling window size.
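The four calculations of S2031 to S2034 can be written out directly. The sketch below follows the formulas exactly as stated above (including the Max in the second step); the function name pool_params and the default values for the minimum pooling window and maximum output size are assumptions for the example:

    import math

    def pool_params(x1_lt, y1_lt, x1_rb, y1_rb, min_pool=(2, 2), max_out=(8, 8)):
        # S2031: first output size from the minimum pooling window
        out1 = (math.floor((x1_rb - x1_lt) / min_pool[0]),
                math.floor((y1_rb - y1_lt) / min_pool[1]))
        # S2032: second output size from the maximum output size, as stated above
        out2 = (max(out1[0], max_out[0]), max(out1[1], max_out[1]))
        # S2033: pooling window size from the second output size
        size = (math.floor((x1_rb - x1_lt) / out2[0]),
                math.floor((y1_rb - y1_lt) / out2[1]))
        # S2034: second window position; the pooling step equals the window size
        x2_rb = x1_lt + out2[0] * size[0]
        y2_rb = y1_lt + out2[1] * size[1]
        return out2, size, (x1_lt, y1_lt, x2_rb, y2_rb)

    # Assumed example window (4, 2)-(36, 26):
    print(pool_params(4, 2, 36, 26))   # -> ((16, 12), (2, 2), (4, 2, 36, 26))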
S204: Input the first base feature layer within the second window into the pooling layer corresponding to the pooling parameters for feature extraction, to obtain the pooling feature.
Specifically, the pooling layer is configured according to the above pooling parameters, and takes the first base feature layer within the second window as input to produce the pooling feature. If the base feature layer contains C channels, the dimension of the local pooling feature is [PoolOutX, PoolOutY, C].
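A sketch of this extraction step follows. The use of max pooling and the concrete sizes are assumptions chosen for the example (the patent leaves the pooling operator to the configured pooling layer), and the function name extract_pooled is hypothetical:

    import numpy as np

    def extract_pooled(feature, window, size, out):
        # Crop the second window from the base feature layer and pool it
        # with step == window size, giving a [PoolOutX, PoolOutY, C] feature.
        x_lt, y_lt = window
        (sx, sy), (ox, oy) = size, out
        crop = feature[y_lt:y_lt + oy * sy, x_lt:x_lt + ox * sx]
        c = crop.shape[2]
        return crop.reshape(oy, sy, ox, sx, c).max(axis=(1, 3))

    feat1 = np.random.rand(24, 24, 8)                  # stand-in base features
    pooled = extract_pooled(feat1, (2, 2), (3, 3), (4, 4))
    print(pooled.shape)                                # (4, 4, 8)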
Since the convolutional neural network-based target matching method provided by the embodiments of the present invention achieves the target matching of the second image against the first image by traversal matching, what is obtained after the traversal matching is a matching score map; that is, the to-be-matched regions in the second image are traversed and matched to obtain the correlation information between each to-be-matched region and the target region in the first image. Referring to FIG. 4, the generation of the matching score map is specifically implemented by the following steps, and the method further includes:
S301: Extract a second base feature layer of the second image based on the pre-acquired CNN.
To better match the second image against the first image, the convolutional neural network-based target matching method provided by the embodiments of the present invention scales the second image before feature extraction. Therefore, referring to FIG. 5, the feature extraction of the second image is specifically implemented by the following steps:
S3011: Scale the second image according to the first image to obtain a scaled second image.
S3012: Extract the second base feature layer of the scaled second image based on the pre-acquired CNN.
Specifically, the second image is first scaled to a size corresponding to the first image: for image retrieval, the size of the second image should be close to that of the first image, while for image tracking, the second image has the same size as the first image. The second base feature layer of the scaled second image is then extracted with the same CNN as used for the first image.
S302: Configure a matching convolution layer and a modulus convolution layer for the second base feature layer, where the convolution kernels used by the matching convolution layer and the modulus convolution layer are both taken from the normalized pooling feature of the first image, the normalized pooling feature being obtained by normalizing the pooling feature.
S303: Obtain, according to the ratio between the output of the matching convolution layer and the output of the modulus convolution layer, a matching score map of each to-be-matched region of the second image relative to the target region of the first image.
Specifically, in the convolutional neural network-based target matching method provided by the embodiments of the present invention, the matching convolution layer and the modulus convolution layer configured for the second base feature layer are respectively built on top of a configured to-be-matched pooling layer and a configured modulus calculation layer. Configuring the matching convolution layer for the second base feature layer, referring to FIG. 6, is specifically implemented by the following steps:
S401: Configure a to-be-matched pooling layer for the second base feature layer based on the window size and the window traversal granularity of the pooling layer, so that the output of the second base feature layer is pooled by the window size of the pooling layer according to the to-be-matched pooling layer.
S402: Configure the matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature, so that the output of the to-be-matched pooling layer is convolved with the normalized pooling feature according to the matching convolution layer.
Specifically, the convolutional neural network-based target matching method provided by the embodiments of the present invention first configures a to-be-matched pooling layer on top of the second base feature layer. The window size of the to-be-matched pooling layer is the same as the pooling window size of the pooling layer of the first image. The pooling step size [PoolStepX2, PoolStepY2] of the to-be-matched pooling layer represents the granularity of the window traversal; the step size may therefore be a preset value, or an integer that grows as the pooling window size grows. The step size ranges from 1 to the pooling window size; the embodiments of the present invention impose no specific limitation, so as to meet the different needs of different users.
In addition, in the embodiments of the present invention, a matching convolution layer is further configured on the to-be-matched pooling layer. The matching convolution layer takes the normalized pooling feature extracted from the first image as the convolution kernel of the matching convolution layer of the second image, with dimension [PoolOutX, PoolOutY, C]. If the dimension of the output of the to-be-matched pooling layer of the second image is [W2, H2, C], the dimension of the output of the matching convolution layer is [W2, H2, 1], where each spatial position represents a matching value against the local feature of the first image.
Here, the normalized pooling feature is the result of normalizing the pooling feature, which the embodiments of the present invention carry out as follows: first, calculate the modulus of the C-dimensional vector at each position of the pooling feature over the spatial dimensions [PoolOutX, PoolOutY] and accumulate the moduli of all positions; then, divide the pooling feature by the accumulated modulus to obtain the normalized pooling feature.
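Written out, this normalization is only a few lines. A minimal NumPy sketch (the function name normalize_pooled is hypothetical):

    import numpy as np

    def normalize_pooled(pooled):
        # Sum the moduli of the C-dimensional vectors over [PoolOutX, PoolOutY],
        # then divide the pooled feature by the accumulated modulus.
        c = pooled.shape[2]
        total = np.linalg.norm(pooled.reshape(-1, c), axis=1).sum()
        return pooled / max(total, 1e-12)

    kernel = normalize_pooled(np.random.rand(4, 4, 8))   # stand-in pooled feature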
In addition, configuring the modulus convolution layer for the second base feature layer, referring to FIG. 7, is specifically implemented by the following steps:
S501: Configure a modulus calculation layer for the to-be-matched pooling layer based on a modulus operation, so that the output of the to-be-matched pooling layer is normalized according to the modulus calculation layer.
S502: Configure the modulus convolution layer for the modulus calculation layer according to the normalized pooling feature, so that the output of the modulus calculation layer is convolved with the normalized pooling feature according to the modulus convolution layer.
Specifically, the modulus calculation layer first calculates the modulus of the C-dimensional feature at each position, outputting a modulus map of dimension [W2, H2, 1]. A modulus convolution layer is then configured on the modulus calculation layer; the convolution kernel size, convolution step size and other parameters of this convolution layer are the same as those of the matching convolution layer, the numbers of input and output channels are both 1, the kernel values are all 1, and the bias is 0. If the dimension of the base feature layer of the second image is [W2, H2, C], the dimension of the output of the modulus convolution layer is [W2, H2, 1].
In the convolutional neural network-based target matching method provided by the embodiments of the present invention, the two scalar images output by the configured matching convolution layer and modulus convolution layer are divided point by point, yielding the matching score map of the pooling feature of the target region of the first image over the to-be-matched regions of the second image.
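A direct, un-optimized sketch of the two convolutions and their point-wise ratio follows; explicit loops replace the convolution layers for clarity, and the function name score_map is hypothetical:

    import numpy as np

    def score_map(pooled2, kernel):
        # pooled2: [H, W, C] output of the to-be-matched pooling layer;
        # kernel:  [kh, kw, C] normalized pooling feature of the first image.
        kh, kw, c = kernel.shape
        h, w, _ = pooled2.shape
        moduli = np.linalg.norm(pooled2, axis=2)          # per-position modulus
        scores = np.zeros((h - kh + 1, w - kw + 1))
        for y in range(scores.shape[0]):
            for x in range(scores.shape[1]):
                patch = pooled2[y:y + kh, x:x + kw]
                match = (patch * kernel).sum()            # matching convolution
                norm = moduli[y:y + kh, x:x + kw].sum()   # modulus convolution
                scores[y, x] = match / max(norm, 1e-12)   # point-wise division
        return scores

    pooled2 = np.random.rand(16, 16, 8)                   # stand-in pooled features
    kernel = np.random.rand(4, 4, 8)
    kernel /= np.linalg.norm(kernel.reshape(-1, 8), axis=1).sum()
    print(score_map(pooled2, kernel).shape)               # (13, 13)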
To ensure that the receptive field of each pixel of the convolution kernel used for the convolution of the second image is the same as that of the target region of the first image, referring to FIG. 8, the configuration of the matching convolution layer in the convolutional neural network-based target matching method provided by the embodiments of the present invention is specifically implemented by the following steps:
S601: Add holes to the normalized pooling feature according to the difference between the window size and the window traversal granularity of the pooling layer, to obtain a hole-added normalized pooling feature.
S602: Configure the matching convolution layer for the to-be-matched pooling layer according to the hole-added normalized pooling feature.
Specifically, the convolutional neural network-based target matching method provided by the embodiments of the present invention takes the normalized pooling feature of the first image as the convolution kernel of the matching convolution layer of the second image and adds holes to this kernel; the hole dimension is the pooling window size of the to-be-matched pooling layer minus its pooling step size (i.e., the window traversal granularity), namely [PoolSizeX - PoolStepX2, PoolSizeY - PoolStepY2]. The matching convolution layer is then configured for the to-be-matched pooling layer according to the hole-added normalized pooling feature, with a bias of 0 and a convolution step size of 1.
Here, adding holes is equivalent to filling a number of zeros between adjacent pixels of the original convolution kernel; the equivalent kernel size after filling is [PoolOutX + PoolSizeX - PoolStepX2, PoolOutY + PoolSizeY - PoolStepY2], and in the actual convolution the program can skip the computation at the zero-filled positions, so the amount of computation is not increased.
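A sketch of this zero-filling equivalence follows (the function name add_holes is hypothetical); with the [2, 2] kernel and [1, 1] hole of FIGS. 9a and 9b it yields the expected 3x3 equivalent kernel:

    import numpy as np

    def add_holes(kernel, hole):
        # Insert hole zeros between adjacent kernel elements along each
        # spatial axis; the zero positions can be skipped at convolution time.
        kh, kw, c = kernel.shape
        hy, hx = hole
        out = np.zeros((kh + (kh - 1) * hy, kw + (kw - 1) * hx, c))
        out[::hy + 1, ::hx + 1] = kernel
        return out

    k = np.ones((2, 2, 1))                 # [2, 2] kernel as in FIG. 9a
    print(add_holes(k, (1, 1)).shape)      # (3, 3, 1): 2 + 2 - 1 per axis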
Referring to FIGS. 9a and 9b, an embodiment of the present invention provides a schematic diagram of matching after hole-adding of a convolution kernel, in which the kernel size is [2, 2] and the hole size is [1, 1]. The dot matrix represents the base feature layer. In the first image (FIG. 9a), the pooling window size, pooling step size and pooling output size of the pooling layer are all [2, 2]. In the second image (FIG. 9b), the pooling window size of the to-be-matched pooling layer is [2, 2] and the pooling step size is [1, 1]. The convolution kernel size of the matching convolution layer of the second image is [2, 2]. Without holes, the receptive fields of the pixels of the [2, 2] kernel (shown by the thin-line boxes) overlap, which differs from the local feature of the first image; with a [1, 1] hole, the receptive field of the kernel (shown by the thick-line boxes) is the same as the local feature of the first image.
In addition, referring to FIG. 10, the configuration of the modulus convolution layer in the convolutional neural network-based target matching method provided by the embodiments of the present invention is specifically implemented by the following steps:
S701: Add holes to the normalized pooling feature according to the difference between the window size and the window traversal granularity of the pooling layer, to obtain a hole-added normalized pooling feature.
S702: Configure the modulus convolution layer for the modulus calculation layer according to the hole-added normalized pooling feature.
Specifically, parameters of the modulus convolution layer in the embodiments of the present invention, such as the convolution kernel size, the convolution step size and the hole-adding, are the same as those of the matching convolution layer; likewise, the hole-adding process is the same as described above and is not repeated here. The modulus convolution layer is then configured for the modulus calculation layer according to the hole-added normalized pooling feature.
For the matching score map obtained by the traversal matching, in order to better determine the target region of the second image relative to the first image, the determination of S104 is specifically implemented by the following step, and the method further includes:
selecting the to-be-matched region corresponding to the highest score in the matching score map as the target region in the second image.
Specifically, the generated matching score map records, for each traversed to-be-matched region of the second image, the matching correlation relative to the target region of the first image; the higher the matching score of the corresponding pixel, the more similar that to-be-matched region is to the target region of the first image. The embodiments of the present invention therefore select the to-be-matched region corresponding to the highest score in the matching score map as the target region in the second image.
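A sketch of this selection step follows; the mapping of the best score position back to pixel coordinates via the pooling step and the ratio R is an assumption for the example, since the patent leaves that bookkeeping to the implementation:

    import numpy as np

    def locate_target(scores, pool_step, ratio):
        # Take the position of the highest matching score and map it back
        # to pixel coordinates of the second image (assumed mapping).
        y, x = np.unravel_index(scores.argmax(), scores.shape)
        return (x * pool_step * ratio, y * pool_step * ratio)

    scores = np.random.rand(13, 13)        # stand-in matching score map
    print(locate_target(scores, pool_step=1, ratio=16))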
Compared with the prior art, in which the point feature matching method and the line feature matching method have poor accuracy and the surface feature matching method has low efficiency, the convolutional neural network-based target matching method provided by the embodiments of the present invention first acquires the first image and the second image, then calculates the pooling feature of the target region in the first image, then performs traversal matching on the second image based on the calculated pooling feature, and finally determines the target region in the second image according to the matching score map obtained by the traversal matching. Because the pooling feature of the first image is used for the traversal matching of the second image, both the accuracy and the efficiency of the matching are high.
An embodiment of the present invention further provides a convolutional neural network-based target matching apparatus for executing the above convolutional neural network-based target matching method. Referring to FIG. 11, the apparatus includes:
an acquisition module 11 configured to acquire a first image and a second image;
a calculation module 22 configured to calculate a pooling feature of a target region in the first image;
a generation module 33 configured to perform traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map;
a determination module 44 configured to determine a target region in the second image according to the matching score map.
To better calculate the pooling feature of the target region in the first image, the calculation module 22 includes:
a first extraction submodule configured to extract a first base feature layer of the first image based on a pre-acquired CNN;
a calculation submodule configured to calculate, according to the position of the target region in the first image and the dimensionality reduction ratio of the CNN, the position of a first window in the first base feature layer corresponding to the target region;
a determination submodule configured to determine the position of a second window of the first base feature layer based on preset pooling parameters and the position of the first window;
a first generation submodule configured to input the first base feature layer within the second window into the pooling layer corresponding to the pooling parameters for feature extraction, to obtain the pooling feature.
To better determine the position of the second window of the first base feature layer from the position of the first window, the determination submodule includes:
a first calculation unit configured to calculate a first output size of the pooling layer according to a preset minimum window size of the pooling layer and the position of the first window;
a second calculation unit configured to calculate a second output size of the pooling layer according to a preset maximum output size of the pooling layer and the first output size;
a third calculation unit configured to calculate a window size of the pooling layer according to the second output size and the position of the first window;
a fourth calculation unit configured to calculate the position of the second window of the first base feature layer according to the second output size and the window size.
Since the convolutional neural network-based target matching apparatus provided by the embodiments of the present invention achieves the target matching of the second image against the first image by traversal matching, what is obtained after the traversal matching is a matching score map; that is, the to-be-matched regions in the second image are traversed and matched to obtain the correlation information between each to-be-matched region and the target region in the first image. In the apparatus, the generation module 33 includes:
a second extraction submodule configured to extract a second base feature layer of the second image based on the pre-acquired CNN;
a configuration submodule configured to configure a matching convolution layer and a modulus convolution layer for the second base feature layer, where the convolution kernels used by the matching convolution layer and the modulus convolution layer are both taken from the normalized pooling feature of the first image, the normalized pooling feature being obtained by normalizing the pooling feature;
a second generation submodule configured to obtain, according to the ratio between the output of the matching convolution layer and the output of the modulus convolution layer, a matching score map of each to-be-matched region of the second image relative to the target region of the first image.
To better match the second image against the first image, the convolutional neural network-based target matching apparatus provided by the embodiments of the present invention scales the second image before feature extraction; therefore, the second extraction submodule includes:
a scaling unit configured to scale the second image according to the first image to obtain a scaled second image;
an extraction unit configured to extract the second base feature layer of the scaled second image based on the pre-acquired CNN.
In the convolutional neural network-based target matching apparatus provided by the embodiments of the present invention, the matching convolution layer and the modulus convolution layer configured for the second base feature layer are respectively built on top of a configured to-be-matched pooling layer and a configured modulus calculation layer; the configuration submodule includes:
a first configuration unit configured to configure a to-be-matched pooling layer for the second base feature layer based on the window size and the window traversal granularity of the pooling layer, so that the output of the second base feature layer is pooled by the window size of the pooling layer according to the to-be-matched pooling layer;
a second configuration unit configured to configure the matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature, so that the output of the to-be-matched pooling layer is convolved with the normalized pooling feature according to the matching convolution layer;
a third configuration unit configured to configure a modulus calculation layer for the to-be-matched pooling layer based on a modulus operation, so that the output of the to-be-matched pooling layer is normalized according to the modulus calculation layer;
a fourth configuration unit configured to configure the modulus convolution layer for the modulus calculation layer according to the normalized pooling feature, so that the output of the modulus calculation layer is convolved with the normalized pooling feature according to the modulus convolution layer.
To ensure that the receptive field of each pixel of the convolution kernel used for the convolution of the second image is the same as that of the target region of the first image, the second configuration unit of the convolutional neural network-based target matching apparatus provided by the embodiments of the present invention includes:
a first hole-adding subunit configured to add holes to the normalized pooling feature according to the difference between the window size and the window traversal granularity of the pooling layer, to obtain a hole-added normalized pooling feature;
a first configuration subunit configured to configure the matching convolution layer for the to-be-matched pooling layer according to the hole-added normalized pooling feature.
In addition, the fourth configuration unit of the convolutional neural network-based target matching apparatus provided by the embodiments of the present invention includes:
a second hole-adding subunit configured to add holes to the normalized pooling feature according to the difference between the window size and the window traversal granularity of the pooling layer, to obtain a hole-added normalized pooling feature;
a second configuration subunit configured to configure the modulus convolution layer for the modulus calculation layer according to the hole-added normalized pooling feature.
For the matching score map obtained by the traversal matching, in order to better determine the target region of the second image relative to the first image, the determination module 44 is further configured to select the to-be-matched region corresponding to the highest score in the matching score map as the target region in the second image.
In the embodiments of the present invention, the acquisition module 11, the calculation module 22, the generation module 33 and the determination module 44 of the convolutional neural network-based target matching apparatus, as well as the submodules contained in the above modules, may in practical applications all be implemented by a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU) or a field-programmable gate array (FPGA) in the apparatus.
Compared with the prior art, in which the point feature matching method and the line feature matching method have poor accuracy and the surface feature matching method has low efficiency, the convolutional neural network-based target matching apparatus provided by the embodiments of the present invention first acquires the first image and the second image, then calculates the pooling feature of the target region in the first image, then performs traversal matching on the second image based on the calculated pooling feature, and finally determines the target region in the second image according to the matching score map obtained by the traversal matching. Because the pooling feature of the first image is used for the traversal matching of the second image, both the accuracy and the efficiency of the matching are high.
In addition, the convolutional neural network-based target matching method and device provided by the embodiments of the present invention can also be applied to image retrieval and image tracking. When applied to image retrieval, they bring the following technical effects:
1. Deep learning improves the robustness of locating targets with the sliding-window method;
2. A sliding-window traversal method is proposed that is computationally efficient and easy to parallelize.
When applied to image tracking, they also bring the following technical effects:
1. Based on deep learning, the success rate and stability of tracking are improved;
2. The neural network does not need to be trained at tracking initialization or during tracking, which greatly reduces the time consumed by single-target tracking;
3. In multi-target tracking, all trackers share the basic feature layer; compared with the computation of the basic feature layer itself, the extra computation per tracker is very small, so the method is suitable for real-time multi-target tracking in video, as the sketch below illustrates.
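To make effect 3 concrete, a hedged sketch (function and parameter names are hypothetical, not from the patent): the shared basic feature layer is computed once per frame by the CNN, and each tracker then runs only a small correlation against it:

```python
import numpy as np
from scipy.signal import correlate

def track_frame(base_features, tracker_kernels):
    """base_features: shared basic feature layer of the frame, shape (C, H, W).
    tracker_kernels: one normalized pooled-feature kernel per tracked target.
    The expensive CNN forward pass happens once, outside this function;
    each tracker costs only one small 'valid' correlation."""
    return [correlate(base_features, k, mode='valid')[0] for k in tracker_kernels]
```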
The computer program product for performing the convolutional neural network-based target matching method provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments. For specific implementations, refer to the method embodiments; details are not repeated here.
The convolutional neural network-based target matching device provided by the embodiments of the present invention may be specific hardware on a device, or software or firmware installed on a device. The implementation principle and technical effects of the device are the same as those of the foregoing method embodiments; for brevity, where the device embodiments are silent, reference may be made to the corresponding content of the foregoing method embodiments. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. In addition, the terms "first", "second", "third", etc. are used merely to distinguish the description and are not to be understood as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are merely specific implementations of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features. Such modifications, changes or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Industrial Applicability
The technical solution of the embodiments of the present invention first acquires a first image and a second image, then calculates the pooling feature of the target area in the first image, next performs traversal matching on the second image based on the calculated pooling feature, and finally determines the target area in the second image according to the matching score map obtained by the traversal matching. Because the pooling feature of the first image is used to perform traversal matching on the second image, both the accuracy and the efficiency of the matching are high.
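As a non-limiting, end-to-end sketch of this flow: assuming the two basic feature layers have already been extracted by a pre-acquired CNN, target matching reduces to pooling the target region, normalizing it, and sweeping a normalized correlation over the second feature layer. All helper names are hypothetical, and the pooling here is a simple adaptive max-pool stand-in:

```python
import numpy as np
from scipy.signal import correlate

def pool_roi(feat, box, out_hw=(4, 4)):
    """Adaptive max-pooling of the feature-map region box = (r0, r1, c0, c1)
    to a fixed out_hw grid; a stand-in for the patent's pooling layer,
    assuming the region spans at least out_hw cells in each direction."""
    r0, r1, c0, c1 = box
    region = feat[:, r0:r1, c0:c1]
    oh, ow = out_hw
    rs = np.linspace(0, region.shape[1], oh + 1).astype(int)
    cs = np.linspace(0, region.shape[2], ow + 1).astype(int)
    return np.array([[[region[ch, rs[i]:rs[i + 1], cs[j]:cs[j + 1]].max()
                       for j in range(ow)]
                      for i in range(oh)]
                     for ch in range(region.shape[0])])

def match_target(feat1, feat2, box):
    """Pool the target region of the first feature layer, normalize it, and
    traverse the second feature layer; return the best-matching position."""
    kernel = pool_roi(feat1, box)
    kernel = kernel / np.linalg.norm(kernel)       # normalized pooling feature
    num = correlate(feat2, kernel, mode='valid')   # matching convolution
    den = np.sqrt(correlate(feat2 ** 2, np.ones_like(kernel), mode='valid'))
    score_map = (num / np.maximum(den, 1e-12))[0]  # matching score map
    return np.unravel_index(np.argmax(score_map), score_map.shape)
```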

Claims (19)

  1. A convolutional neural network-based target matching method, comprising:
    acquiring a first image and a second image;
    calculating a pooling feature of a target area in the first image;
    performing traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map;
    determining a target area in the second image according to the matching score map.
  2. The method according to claim 1, wherein calculating the pooling feature of the target area in the first image comprises:
    extracting a first basic feature layer of the first image based on a pre-acquired convolutional neural network (CNN);
    calculating a position of a first window in the first basic feature layer relative to the target area according to the position of the target area in the first image and the dimensionality-reduction ratio of the CNN;
    determining a position of a second window of the first basic feature layer based on preset pooling parameters and the position of the first window;
    inputting the first basic feature layer within the second window to the pooling layer corresponding to the pooling parameters for feature extraction, to obtain the pooling feature.
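A non-limiting sketch of the window mapping in this claim: the target box in image pixels is divided by the CNN's cumulative down-sampling (dimensionality-reduction) ratio to obtain the first window in feature-layer cells. The rounding policy below is an assumption, not stated in the patent:

```python
def image_box_to_feature_window(box, reduction_ratio):
    """Map a target box (x0, y0, x1, y1) in image pixels to feature-layer
    cells given the CNN's dimensionality-reduction ratio (e.g. 16 after
    four stride-2 stages); floor the start and ceil the end so the window
    fully covers the target."""
    x0, y0, x1, y1 = box
    r = reduction_ratio
    return (x0 // r, y0 // r, -(-x1 // r), -(-y1 // r))  # -(-a // r) is ceil division
```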
  3. The method according to claim 2, wherein determining the position of the second window of the first basic feature layer based on the preset pooling parameters and the position of the first window comprises:
    calculating a first output size of the pooling layer according to a preset minimum window size of the pooling layer and the position of the first window;
    calculating a second output size of the pooling layer according to a preset maximum output size of the pooling layer and the first output size;
    calculating a window size of the pooling layer according to the second output size and the position of the first window;
    calculating the position of the second window of the first basic feature layer according to the second output size and the window size.
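The claim fixes which quantities each size is computed from but not the exact formulas; one consistent reading, with floor/ceil choices that are purely assumptions, is:

```python
import math

def pooling_geometry(win_h, win_w, min_window=2, max_output=6):
    """Hypothetical reading of the claim-3 arithmetic: cap the pooled output
    size, derive the pooling window size that tiles the first window, and
    take the second window as the extent those windows exactly cover."""
    out1 = (max(1, win_h // min_window), max(1, win_w // min_window))    # first output size
    out2 = (min(max_output, out1[0]), min(max_output, out1[1]))          # second output size
    k = (math.ceil(win_h / out2[0]), math.ceil(win_w / out2[1]))         # pooling window size
    second_window = (out2[0] * k[0], out2[1] * k[1])                     # second-window extent
    return out2, k, second_window
```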
  4. The method according to claim 3, wherein performing traversal matching on the second image based on the pooling feature to obtain the corresponding matching score map comprises:
    extracting a second basic feature layer of the second image based on the pre-acquired CNN;
    configuring a matching convolution layer and a modulus convolution layer respectively for the second basic feature layer, wherein the convolution kernels used by both the matching convolution layer and the modulus convolution layer are taken from the normalized pooling feature of the first image, the normalized pooling feature being obtained by normalizing the pooling feature;
    obtaining, according to the ratio between the output of the matching convolution layer and the output of the modulus convolution layer, a matching score map of each to-be-matched area of the second image relative to the target area of the first image.
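As a non-limiting illustration of the ratio relationship in this claim: with a normalized kernel, dividing the matching-convolution output by the modulus-convolution output yields a cosine-similarity score per to-be-matched area. Shapes and values below are assumed for demonstration only:

```python
import numpy as np
from scipy.signal import correlate

rng = np.random.default_rng(0)
feat2 = rng.standard_normal((8, 32, 32))   # second basic feature layer (C, H, W), assumed
w = rng.standard_normal((8, 4, 4))         # pooling feature of the target, assumed
w /= np.linalg.norm(w)                     # normalized pooling feature

match_out = correlate(feat2, w, mode='valid')                    # matching convolution layer
modulus_out = np.sqrt(correlate(feat2 ** 2, np.ones(w.shape),
                                mode='valid'))                   # modulus convolution layer
score_map = (match_out / np.maximum(modulus_out, 1e-12))[0]      # ratio gives the score map
```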
  5. The method according to claim 4, wherein extracting the second basic feature layer of the second image based on the pre-acquired CNN comprises:
    scaling the second image according to the first image to obtain a scaled second image;
    extracting the second basic feature layer of the scaled second image based on the pre-acquired CNN.
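A minimal sketch of the scaling step, assuming "according to the first image" means resizing the second image to the first image's resolution (OpenCV is used here purely for illustration):

```python
import cv2

def rescale_to_first(second_image, first_image):
    """Resize the second image to the first image's (height, width) so both
    basic feature layers are extracted at a comparable scale (assumed reading)."""
    h, w = first_image.shape[:2]
    return cv2.resize(second_image, (w, h))  # cv2.resize takes (width, height)
```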
  6. The method according to claim 4, wherein configuring the matching convolution layer for the second basic feature layer comprises:
    configuring a to-be-matched pooling layer for the second basic feature layer based on the window size and the window traversal granularity of the pooling layer, so that the output of the second basic feature layer is pooled by the to-be-matched pooling layer according to the window size of the pooling layer;
    configuring a matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature, so that the output of the to-be-matched pooling layer is convolved by the matching convolution layer with the normalized pooling feature;
    and wherein configuring the modulus convolution layer for the second basic feature layer comprises:
    configuring a modulus calculation layer for the to-be-matched pooling layer based on a modulus operation, so that the output of the to-be-matched pooling layer is normalized by the modulus calculation layer;
    configuring a modulus convolution layer for the modulus calculation layer according to the normalized pooling feature, so that the output of the modulus calculation layer is convolved by the modulus convolution layer with the normalized pooling feature.
  7. The method according to claim 6, wherein configuring the matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature comprises:
    performing hole-adding processing on the normalized pooling feature according to the result of the difference operation between the window size of the pooling layer and the window traversal granularity, to obtain the normalized pooling feature after hole-adding processing;
    configuring the matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature after hole-adding processing.
  8. The method according to claim 6, wherein configuring the modulus convolution layer for the modulus calculation layer according to the normalized pooling feature comprises:
    performing hole-adding processing on the normalized pooling feature according to the result of the difference operation between the window size of the pooling layer and the window traversal granularity, to obtain the normalized pooling feature after hole-adding processing;
    configuring the modulus convolution layer for the modulus calculation layer according to the normalized pooling feature after hole-adding processing.
  9. The method according to claim 4, wherein determining the target area in the second image according to the matching score map comprises:
    selecting the to-be-matched area corresponding to the highest score in the matching score map as the target area in the second image.
  10. A convolutional neural network-based target matching device, comprising:
    an acquisition module configured to acquire a first image and a second image;
    a calculation module configured to calculate a pooling feature of a target area in the first image;
    a generation module configured to perform traversal matching on the second image based on the pooling feature to obtain a corresponding matching score map;
    a determination module configured to determine a target area in the second image according to the matching score map.
  11. The device according to claim 10, wherein the calculation module comprises:
    a first extraction submodule configured to extract a first basic feature layer of the first image based on a pre-acquired CNN;
    a calculation submodule configured to calculate a position of a first window in the first basic feature layer relative to the target area according to the position of the target area in the first image and the dimensionality-reduction ratio of the CNN;
    a determination submodule configured to determine a position of a second window of the first basic feature layer based on preset pooling parameters and the position of the first window;
    a first generation submodule configured to input the first basic feature layer within the second window to the pooling layer corresponding to the pooling parameters for feature extraction, to obtain the pooling feature.
  12. The device according to claim 11, wherein the determination submodule comprises:
    a first calculation unit configured to calculate a first output size of the pooling layer according to a preset minimum window size of the pooling layer and the position of the first window;
    a second calculation unit configured to calculate a second output size of the pooling layer according to a preset maximum output size of the pooling layer and the first output size;
    a third calculation unit configured to calculate a window size of the pooling layer according to the second output size and the position of the first window;
    a fourth calculation unit configured to calculate the position of the second window of the first basic feature layer according to the second output size and the window size.
  13. The device according to claim 12, wherein the generation module comprises:
    a second extraction submodule configured to extract a second basic feature layer of the second image based on the pre-acquired CNN;
    a configuration submodule configured to configure a matching convolution layer and a modulus convolution layer respectively for the second basic feature layer, wherein the convolution kernels used by both the matching convolution layer and the modulus convolution layer are taken from the normalized pooling feature of the first image, the normalized pooling feature being obtained by normalizing the pooling feature;
    a second generation submodule configured to obtain, according to the ratio between the output of the matching convolution layer and the output of the modulus convolution layer, a matching score map of each to-be-matched area of the second image relative to the target area of the first image.
  14. The device according to claim 13, wherein the second extraction submodule comprises:
    a scaling unit configured to scale the second image according to the first image to obtain a scaled second image;
    an extraction unit configured to extract a second basic feature layer of the scaled second image based on the pre-acquired CNN.
  15. The device according to claim 13, wherein the configuration submodule comprises:
    a first configuration unit configured to configure a to-be-matched pooling layer for the second basic feature layer based on the window size and the window traversal granularity of the pooling layer, so that the output of the second basic feature layer is pooled by the to-be-matched pooling layer according to the window size of the pooling layer;
    a second configuration unit configured to configure a matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature, so that the output of the to-be-matched pooling layer is convolved by the matching convolution layer with the normalized pooling feature;
    a third configuration unit configured to configure a modulus calculation layer for the to-be-matched pooling layer based on a modulus operation, so that the output of the to-be-matched pooling layer is normalized by the modulus calculation layer;
    a fourth configuration unit configured to configure a modulus convolution layer for the modulus calculation layer according to the normalized pooling feature, so that the output of the modulus calculation layer is convolved by the modulus convolution layer with the normalized pooling feature.
  16. The device according to claim 15, wherein the second configuration unit comprises:
    a first hole-adding subunit configured to perform hole-adding processing on the normalized pooling feature according to the result of the difference operation between the window size of the pooling layer and the window traversal granularity, to obtain the normalized pooling feature after hole-adding processing;
    a first configuration subunit configured to configure the matching convolution layer for the to-be-matched pooling layer according to the normalized pooling feature after hole-adding processing.
  17. The device according to claim 15, wherein the fourth configuration unit comprises:
    a second hole-adding subunit configured to perform hole-adding processing on the normalized pooling feature according to the result of the difference operation between the window size of the pooling layer and the window traversal granularity, to obtain the normalized pooling feature after hole-adding processing;
    a second configuration subunit configured to configure the modulus convolution layer for the modulus calculation layer according to the normalized pooling feature after hole-adding processing.
  18. The device according to claim 13, wherein the determination module is further configured to select the to-be-matched area corresponding to the highest score in the matching score map as the target area in the second image.
  19. A storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the convolutional neural network-based target matching method according to any one of claims 1 to 9.
PCT/CN2017/077579 2016-08-26 2017-03-21 Convolutional neural network-based target matching method, device and storage medium WO2018036146A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610741539.1A CN106407891B (en) 2016-08-26 2016-08-26 Target matching method and device based on convolutional neural networks
CN201610741539.1 2016-08-26

Publications (1)

Publication Number Publication Date
WO2018036146A1 true WO2018036146A1 (en) 2018-03-01

Family

ID=58002442

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/077579 WO2018036146A1 (en) 2016-08-26 2017-03-21 Convolutional neural network-based target matching method, device and storage medium

Country Status (2)

Country Link
CN (1) CN106407891B (en)
WO (1) WO2018036146A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407891B (en) * 2016-08-26 2019-06-28 东方网力科技股份有限公司 Target matching method and device based on convolutional neural networks
CN108509961A (en) * 2017-02-27 2018-09-07 北京旷视科技有限公司 Image processing method and device
CN107452025A (en) * 2017-08-18 2017-12-08 成都通甲优博科技有限责任公司 Method for tracking target, device and electronic equipment
CN107657256A (en) * 2017-10-27 2018-02-02 中山大学 The more character locatings of image end to end and matching process based on deep neural network
CN108038502A (en) * 2017-12-08 2018-05-15 电子科技大学 Object collaborative detection method based on convolutional neural networks
CN110298214A (en) * 2018-03-23 2019-10-01 苏州启铭臻楠电子科技有限公司 A kind of stage multi-target tracking and classification method based on combined depth neural network
CN110322388B (en) * 2018-03-29 2023-09-12 上海熠知电子科技有限公司 Pooling method and apparatus, pooling system, and computer-readable storage medium
CN109255382B (en) * 2018-09-07 2020-07-17 阿里巴巴集团控股有限公司 Neural network system, method and device for picture matching positioning
CN109934342B (en) * 2018-12-28 2022-12-09 奥比中光科技集团股份有限公司 Neural network model training method, depth image restoration method and system
CN110197213B (en) * 2019-05-21 2021-06-04 北京航空航天大学 Image matching method, device and equipment based on neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104616032A (en) * 2015-01-30 2015-05-13 浙江工商大学 Multi-camera system target matching method based on deep-convolution neural network
CN104778464A (en) * 2015-05-04 2015-07-15 中国科学院重庆绿色智能技术研究院 Garment positioning and detecting method based on depth convolution nerve network
WO2016054779A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Spatial pyramid pooling networks for image processing
CN106407891A (en) * 2016-08-26 2017-02-15 东方网力科技股份有限公司 Target matching method based on convolutional neural network and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9684960B2 (en) * 2014-01-25 2017-06-20 Pangea Diagnostics Limited Automated histological diagnosis of bacterial infection using image analysis
CN105701507B (en) * 2016-01-13 2018-10-23 吉林大学 Image classification method based on dynamic random pond convolutional neural networks
CN105718960B (en) * 2016-01-27 2019-01-04 北京工业大学 Based on convolutional neural networks and the matched image order models of spatial pyramid

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016054779A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Spatial pyramid pooling networks for image processing
CN104616032A (en) * 2015-01-30 2015-05-13 浙江工商大学 Multi-camera system target matching method based on deep-convolution neural network
CN104778464A (en) * 2015-05-04 2015-07-15 中国科学院重庆绿色智能技术研究院 Garment positioning and detecting method based on depth convolution nerve network
CN106407891A (en) * 2016-08-26 2017-02-15 东方网力科技股份有限公司 Target matching method based on convolutional neural network and device

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147828A (en) * 2019-04-29 2019-08-20 广东工业大学 A kind of local feature matching process and system based on semantic information
CN110147828B (en) * 2019-04-29 2022-12-16 广东工业大学 Local feature matching method and system based on semantic information
CN110348411A (en) * 2019-07-16 2019-10-18 腾讯科技(深圳)有限公司 A kind of image processing method, device and equipment
CN110348411B (en) * 2019-07-16 2024-05-03 腾讯科技(深圳)有限公司 Image processing method, device and equipment
CN110532414B (en) * 2019-08-29 2022-06-21 深圳市商汤科技有限公司 Picture retrieval method and device
CN110532414A (en) * 2019-08-29 2019-12-03 深圳市商汤科技有限公司 A kind of picture retrieval method and device
CN111445420A (en) * 2020-04-09 2020-07-24 北京爱芯科技有限公司 Image operation method and device of convolutional neural network and electronic equipment
CN111445420B (en) * 2020-04-09 2023-06-06 北京爱芯科技有限公司 Image operation method and device of convolutional neural network and electronic equipment
CN112488126A (en) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 Feature map processing method, device, equipment and storage medium
CN112686269A (en) * 2021-01-18 2021-04-20 北京灵汐科技有限公司 Pooling method, apparatus, device and storage medium
CN115439673A (en) * 2022-11-10 2022-12-06 中山大学 Image feature matching method based on sector convolution neural network
CN115439673B (en) * 2022-11-10 2023-03-24 中山大学 Image feature matching method based on sector convolution neural network
CN117132590A (en) * 2023-10-24 2023-11-28 威海天拓合创电子工程有限公司 Image-based multi-board defect detection method and device
CN117132590B (en) * 2023-10-24 2024-03-01 威海天拓合创电子工程有限公司 Image-based multi-board defect detection method and device

Also Published As

Publication number Publication date
CN106407891A (en) 2017-02-15
CN106407891B (en) 2019-06-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17842572; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17842572; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1