CN108447082A

CN108447082A - A kind of objective matching process based on combination learning Keypoint detector

Info

Publication number: CN108447082A
Application number: CN201810215020.9A
Authority: CN
Inventors: 夏春秋
Original assignee: Shenzhen Vision Technology Co Ltd
Current assignee: Shenzhen Vision Technology Co Ltd
Priority date: 2018-03-15
Filing date: 2018-03-15
Publication date: 2018-08-24

Abstract

A kind of objective matching process based on combination learning Keypoint detector proposed in the present invention, main contents include：Faster region convolutional neural networks, training, combined optimization, its process is, given two depth images with some attitude disturbances, two groups of suggestions are generated to each image, then carry out projection three-dimensional suggestion using known images of gestures, to establish face and negative pair, by these to passing to comparison loss, attempt to make the characteristic distance between face to minimize and make it is negative to the distance between maximize, introduce new fractional loss, adjustment region suggests the parameter of network, in the suggestion of the Area generation balloon score of depth map, find correspondence, to complete to match.The present invention proposes a kind of new sample level, can be with the corresponding label of in-time generatin part patch, while improving the accuracy of target correspondence, and qualitative and quantitative improvement has been carried out to the prior art.

Description

A kind of objective matching process based on combination learning Keypoint detector

Technical field

The present invention relates to object matching fields, more particularly, to a kind of three-dimensional based on combination learning Keypoint detector Target matching method.

Background technology

With the development of human society, object matching technology has become modern society's field of information processing, especially An especially important technology in image processing field.In image registration, camera calibration, target identification and image retrieval etc. In computer vision application, the key point that same object or scene are correctly matched from two images is a basis and important Work.Object matching technology be widely used in daily life, production, military activity, aerospace detection, target identification with In a series of field such as tracking, robot vision.Such as in field of image search, by user's input picture and database Image carries out matching search, and user can be helped to search the relevant information of target image content.In robot vision field, machine People is compared with stored database by identifying target object, can obtain object information, to preferably complete to refer to Determine task.The overall situation indicates it is common method the problems such as solving matching, retrieval, Attitude estimation and registration, can be with directly side Formula is trained in a manner of end to end, and disadvantage is susceptible to that the blocking of a large amount of confusions, partial view or scene block.

The present invention proposes a kind of objective matching process based on combination learning Keypoint detector, gives two tools The depth image for having some attitude disturbances is generated two groups of suggestions to each image, is then projected using known images of gestures Three-dimensional is suggested, to establish face and negative pair, by these to passing to comparison loss, it is intended to make characteristic distance between face most Smallization and make it is negative to the distance between maximize, introduce new fractional loss, adjustment region suggests the parameter of network, in depth The suggestion of the Area generation balloon score of figure, finds correspondence, to complete to match.The present invention proposes a kind of new sampling Layer, can be with the corresponding label of in-time generatin part patch, while improving the accuracy of target correspondence, is carried out to the prior art Qualitative and quantitative improvement.

Invention content

For be easy to be blocked by a large amount of confusions, partial view or scene are blocked the problem of, it is an object of the invention to A kind of objective matching process based on combination learning Keypoint detector is provided, giving two has some attitude disturbances Depth image generates two groups of suggestions to each image, then carrys out projection three-dimensional suggestion using known images of gestures, to establish Face and negative pair, by these to passing to comparison loss, it is intended to so that the characteristic distance between face is minimized and make to bear between Distance maximize, introduce new fractional loss, adjustment region suggests the parameter of network, in the Area generation high score of depth map Several suggestions, finds correspondence, to complete to match.

To solve the above problems, the present invention provides a kind of objective match party based on combination learning Keypoint detector Method, main contents include：

(1) faster region convolutional neural networks；

(2) training；

(3) combined optimization.

Wherein, the combination learning Keypoint detector uses modified faster region convolutional neural networks (R- CNN) carry out study-leading process；Specifically, two depth images with some attitude disturbances are given, first to each image Generate two groups of suggestions；Then, carry out projection three-dimensional suggestion using known images of gestures, to establish face and negative pair；In three dimensions Suggestion with small distance is considered as correspondence, therefore is marked as just；Then by these to passing to comparison loss, with Attempt to make the characteristic distance between face to minimize and make it is negative to the distance between maximize；In addition, introducing a kind of new point Number loss, it has adjusted the parameter that network (RPN) is suggested in the faster regions R-CNN, in the Area generation balloon score of depth map Suggestion, correspondence can be found always.

Wherein, the faster region convolutional neural networks, using faster R-CNN as the company with shared weight A part for body Model；Liang Ge branches are all connected to a layer for being responsible for searching correspondence, referred to as sample level；Use comparison Loss indicates to train, and there is the fractional loss in trained critical point detection stage in each branch.

Wherein, the training needs pairs of depth image { I for training pattern⁰,I¹, each depth image tool There is its camera posture information { g⁰,g¹And intrinsic camera parameter C；These can be by from multiple viewpoint renders three-dimensional models or making With RGB-D video sequences be registrated frame；In order to by depth image { I⁰,I¹By network, their depth value is normalized first For RGB ranges, and single channel is copied into triple channel image；Input g⁰,g¹, the rest part and depth image of C, value with Rice is unit D⁰,D¹It is directly delivered to sample level.

Further, the depth image, for each depth image, region suggests that network (RPN) can generate one group Score and area-of-interest (RoIs), using their barycenter as key point position.

Further, the area-of-interest, each RoI also determine the space of the feature calculation for current key point Range, and after the layer of the ponds RoI, obtain the expression of each key point；Retain highest t key points according to their score, And establish key pointCorresponding a pair of of the depth images of wherein m={ 0,1 }, It is the two-dimensional coordinate on the plane of delineation,It is the score for indicating key point significance degree,It is corresponding spy Sign vector；Then sample level receives key point barycenter and its feature group { x from two images⁰,x¹,f⁰,f¹}；Finally determine two Correspondence between the key point of width image.

Further, the correspondence between the key point of the determination two images, barycenter project to three-dimensional first In space；For each key pointIt is found according to Euclidean distance immediate in three dimensionsAnd form n-th pair Feature If distance is less than a small threshold value, it is marked as faceOtherwise will It is considered as negative pair

Wherein, the combined optimization, combination learning viewpoint invariant representation and Keypoint detector；It introduces more following Business loss function：

Wherein, L_cIt is the comparison loss slightly changed, to key point to carrying out operation and being optimized to expression, It is to use key point score with the fractional loss component of optimized detector, l ' is set of the feature to F ', λ_cAnd λ_sIt is weight ginseng Number.

Further, the feature pair is indicated since these features define one group to F ' using mark n N-th of feature pairComparison loss is defined as：

Wherein, v indicates back gauge, N_posAnd N_negIt is positive and negative couple of quantity (N=N respectively_pos+N_neg)；Each classification is to loss Contribution be all to be normalized according to its group, to explain face and the negative imbalance between；Fractional loss defines For：

Wherein,It is to come from image I^mI-th of key point label, value depending on key point belong to face or Negative pair, and γ is regularization parameter；Due to these to be by from each image choose a key point formed, and Each key point can only belong to one it is right, then | l⁰|=| l¹|=N.

Further, the purpose of the fractional loss, fractional loss is the correspondence maximized between two views Quantity；Optimization generates more positive key points as much as possible, and maximizes their score；If the score generated is very low, Only consider face and they are punished；Loss is standardized by the quantity of positive number, however, it is possible to which γ is excellent to adjust Change keypoint quantity and optimizes the tradeoff between score；Frame allows to adjust by adjusting three-dimensional distance threshold value in sample level Tradeoff during training between matching times and positioning accuracy；During backpropagation, by being stored during forward direction is transmitted Simultaneously propose to realize function backward in layer in region, the gradual change of each key point is transmitted to suitable in gradual change mapping in their position Work as position；For fractional loss, gradient predicts the convolutional layer of score by being responsible for.

Description of the drawings

Fig. 1 is a kind of system flow of the objective matching process based on combination learning Keypoint detector of the present invention Figure.

Fig. 2 is a kind of faster region of the objective matching process based on combination learning Keypoint detector of the present invention Convolutional neural networks framework.

Fig. 3 is a kind of present invention network training phase of the objective matching process based on combination learning Keypoint detector Between gradient reverse propagate.

Specific implementation mode

It should be noted that in the absence of conflict, the features in the embodiments and the embodiments of the present application can phase It mutually combines, invention is further described in detail in the following with reference to the drawings and specific embodiments.

Fig. 1 is a kind of system flow of the objective matching process based on combination learning Keypoint detector of the present invention Figure.Include mainly faster region convolutional neural networks, training, combined optimization.

Combination learning Keypoint detector is guided using modified faster region convolutional neural networks (R-CNN) Learning process；Specifically, two depth images with some attitude disturbances are given, generating two to each image first sets up View；Then, carry out projection three-dimensional suggestion using known images of gestures, to establish face and negative pair；There is small distance in three dimensions Suggestion be considered as correspondence, therefore be marked as just；Then by these to passing to comparison loss, to attempt to make face Between characteristic distance minimize and make it is negative to the distance between maximize；In addition, introducing a kind of new fractional loss, it is adjusted The parameter of network (RPN) is suggested in the whole faster regions R-CNN, can be with in the suggestion of the Area generation balloon score of depth map Always correspondence is found.

Combined optimization, combination learning viewpoint invariant representation and Keypoint detector；Introduce following multitask loss function：

Since these features define one group to F ', n-th of feature pair is indicated using mark nComparison Loss is defined as：

Fig. 2 is a kind of faster region of the objective matching process based on combination learning Keypoint detector of the present invention Convolutional neural networks framework.Using faster R-CNN as a part for the connected with auxilliary crankshuft model with shared weight；Liang Ge branches are all It is connected to a layer for being responsible for searching correspondence, referred to as sample level；It is indicated to train using comparison loss, and each divided Zhi Douyou trains the fractional loss in critical point detection stage.

For training pattern, pairs of depth image { I is needed⁰,I¹, each depth image has its camera posture information {g⁰,g¹And intrinsic camera parameter C；These can by from multiple viewpoint renders three-dimensional models or using RGB-D video sequences with It is registrated frame；In order to by depth image { I⁰,I¹By network, their depth value is normalized to RGB ranges first, and will be single A channel is copied into triple channel image；Input g⁰,g¹, the rest part and depth image of C, value is using rice as unit D⁰,D¹Directly It is transmitted to sample level.

For each depth image, region suggests that network (RPN) can generate a number of components and area-of-interest (RoIs), makes Use their barycenter as key point position.

Each RoI also determines the spatial dimension of the feature calculation for current key point, and after the layer of the ponds RoI, Obtain the expression of each key point；Retain highest t key points according to their score, and establishes key point Corresponding a pair of of the depth images of wherein m={ 0,1 },It is that image is flat Two-dimensional coordinate on face,It is the score for indicating key point significance degree,It is corresponding feature vector；Then sample level from Key point barycenter and its feature group { x are received in two images⁰,x¹,f⁰,f¹}；Between the key point for finally determining two images Correspondence.

Barycenter projects in three dimensions first；For each key pointThree-dimensional space is found according to Euclidean distance Between in it is immediateAnd form n-th pair of featureIf distance is less than a small threshold value, it is marked as Face Otherwise it is regarded as negative pair

Fig. 3 is a kind of present invention network training phase of the objective matching process based on combination learning Keypoint detector Between gradient reverse propagate.The purpose of fractional loss is to maximize the quantity of the correspondence between two views；As much as possible Optimization generates more positive key points, and maximizes their score；If the score generated is very low, face is only considered and to it Punished；Loss standardized by the quantity of positive number, however, it is possible to γ come adjust optimize keypoint quantity with it is excellent Change the tradeoff between score；Frame allows by adjusting the three-dimensional distance threshold value in sample level come matching times during adjusting training Tradeoff between positioning accuracy；During backpropagation, by storing their position during forward direction is transmitted and in area Function backward is realized in domain proposal layer, and the gradual change of each key point is transmitted to the appropriate location in gradual change mapping；For score Loss, gradient predict the convolutional layer of score by being responsible for.

For those skilled in the art, the present invention is not limited to the details of above-described embodiment, in the essence without departing substantially from the present invention In the case of refreshing and range, the present invention can be realized in other specific forms.In addition, those skilled in the art can be to this hair Bright to carry out various modification and variations without departing from the spirit and scope of the present invention, these improvements and modifications also should be regarded as the present invention's Protection domain.Therefore, the following claims are intended to be interpreted as including preferred embodiment and falls into all changes of the scope of the invention More and change.

Claims

1. a kind of objective matching process based on combination learning Keypoint detector, which is characterized in that main includes faster Region convolutional neural networks (one)；Training (two)；Combined optimization (three).

2. based on the combination learning Keypoint detector described in claims 1, which is characterized in that using modified faster Region convolutional neural networks (R-CNN) carry out study-leading process；Specifically, two depth with some attitude disturbances are given Image generates two groups of suggestions to each image first；Then, carry out projection three-dimensional suggestion using known images of gestures, to establish Face and negative pair；The suggestion with small distance is considered as correspondence in three dimensions, therefore is marked as just；Then by these To passing to comparison loss, with attempt to make the characteristic distance between face to minimize and make it is negative to the distance between maximization；This Outside, a kind of new fractional loss is introduced, it has adjusted the parameter that network (RPN) is suggested in the faster regions R-CNN, in depth The suggestion of the Area generation balloon score of figure, can find correspondence always.

3. based on the faster region convolutional neural networks (one) described in claims 1, which is characterized in that use faster R- A parts of the CNN as the connected with auxilliary crankshuft model with shared weight；Liang Ge branches are all connected to one and are responsible for searching correspondence Layer, referred to as sample level；It is indicated using comparison loss to train, and there be point in trained critical point detection stage in each branch Number loss.

4. based on the training (two) described in claims 1, which is characterized in that for training pattern, need pairs of depth map As { I⁰,I¹, each depth image has its camera posture information { g⁰,g¹And intrinsic camera parameter C；These can by from Multiple viewpoint renders three-dimensional models use RGB-D video sequences and are registrated frame；In order to by depth image { I⁰,I¹By network, Their depth value is normalized to RGB ranges first, and single channel is copied into triple channel image；Input g⁰,g¹, C its Remaining part point and depth image, value is using rice as unit D⁰,D¹It is directly delivered to sample level.

5. based on the depth image described in claims 4, which is characterized in that for each depth image, network is suggested in region (RPN) number of components and area-of-interest (RoIs) can be generated, using their barycenter as key point position.

6. based on the area-of-interest described in claims 5, which is characterized in that each RoI, which is also determined, is used for current key point Feature calculation spatial dimension, and after the layer of the ponds RoI, obtain the expression of each key point；According to their score Retain highest t key points, and establishes key pointWherein m={ 0,1 } is right A pair of of depth image is answered,It is the two-dimensional coordinate on the plane of delineation,It is point for indicating key point significance degree Number,It is corresponding feature vector；Then sample level receives key point barycenter and its feature group { x from two images⁰,x¹, f⁰,f¹}；Finally determine the correspondence between the key point of two images.

7. based on the correspondence between the key point of the determination two images described in claims 6, which is characterized in that barycenter It projects in three dimensions first；For each key pointIt is found according to Euclidean distance immediate in three dimensionsAnd form n-th couple of feature F '_n=(f_i ⁰,f_j ¹)；If distance is less than a small threshold value, it is marked as face Otherwise it is regarded as negative pair

8. based on the combined optimization (three) described in claims 1, which is characterized in that combination learning viewpoint invariant representation and pass Key spot detector；Introduce following multitask loss function：

Wherein, L_cIt is the comparison loss slightly changed, to key point to carrying out operation and being optimized to expression,It is to make With key point score with the fractional loss component of optimized detector, l ' is set of the feature to F ', λ_cAnd λ_sIt is weight parameter.

9. based on the feature pair described in claims 8, which is characterized in that since these features define one group to F ', N-th of feature pair is indicated using mark nComparison loss is defined as：

Wherein, v indicates back gauge, N_posAnd N_negIt is positive and negative couple of quantity (N=N respectively_pos+N_neg)；Each tribute of the classification to loss Offering is normalized according to its group, to explain face and the negative imbalance between；Fractional loss is defined as：

Wherein,It is to come from image I^mI-th of key point label, value belongs to face or negative depending on key point It is right, and γ is regularization parameter；Due to these to be by from each image choose a key point is formed, and often A key point can only belong to one it is right, then | l⁰|=| l¹|=N.

10. based on the fractional loss described in claims 8, which is characterized in that the purpose of fractional loss is to maximize two to regard The quantity of correspondence between figure；Optimization generates more positive key points as much as possible, and maximizes their score；If The score of generation is very low, only considers face and punishes them；Loss is standardized by the quantity of positive number, but It is that adjust optimization keypoint quantity and the tradeoff between score can be optimized with γ；Frame allows by adjusting in sample level Three-dimensional distance threshold value is come the tradeoff during adjusting training between matching times and positioning accuracy；During backpropagation, pass through Their position is stored during positive transmit and proposes to realize function backward in layer in region, and the gradual change of each key point is passed It is delivered to the appropriate location in gradual change mapping；For fractional loss, gradient predicts the convolutional layer of score by being responsible for.