CN110110847A - Target positioning method for deep accelerated reinforcement learning based on attention - Google Patents
Target positioning method for deep accelerated reinforcement learning based on attention
- Publication number
- CN110110847A CN110110847A CN201910362771.8A CN201910362771A CN110110847A CN 110110847 A CN110110847 A CN 110110847A CN 201910362771 A CN201910362771 A CN 201910362771A CN 110110847 A CN110110847 A CN 110110847A
- Authority
- CN
- China
- Prior art keywords
- attention
- network
- learning
- region
- reinforcement learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a target positioning method for deep accelerated reinforcement learning based on attention, comprising the following steps. Step 1: an image is input into the model; the model is divided into two sub-networks, a deep reinforcement learning network and an attention network. Step 2: the model processes the image in four stages. The first stage is the training stage of deep reinforcement learning; under the reinforcement learning framework, the target localization task is mapped onto three elements. The method adds an attention network under the original deep reinforcement learning framework and trains it with the data generated during reinforcement learning training, thereby obtaining attention vectors. This converts the black-box learning problem of the deep reinforcement learning network (DQN) into a white-box problem of attention vectors, while the attention mechanism also optimizes the DQN's control over the localization procedure.
Description
Technical field
The present invention relates to the technical field of target localization tasks, and specifically to a target positioning method for deep accelerated reinforcement learning based on attention.
Background art
Target localization tasks are generally decomposed into two sub-problems, localization and classification. Current mainstream models work in a supervised learning setting; with the application of deep learning networks, the description of target features has achieved important breakthroughs in performance, but determining the target's position is still treated as a regression problem. Deep reinforcement learning instead treats locating the target as a behavior control problem: the observed region is manipulated until it overlaps with the target area, thereby determining the target position. Compared with other methods that follow fixed rules for position localization, target localization methods based on deep reinforcement learning are more flexible and efficient, and their human-like principle is more interpretable. When the sample distribution is complex, target localization models based on deep reinforcement learning also generalize better.
However, deep reinforcement learning itself has defects in the stability of target localization, and the required training time is long. It is therefore highly necessary to design a target positioning method for deep accelerated reinforcement learning based on attention.
Summary of the invention
The purpose of the present invention is to provide a target positioning method for deep accelerated reinforcement learning based on attention, so as to solve the problems raised in the background above.
In order to solve the above technical problem, the present invention provides the following technical solution: a target positioning method for deep accelerated reinforcement learning based on attention, comprising the following steps:
Step 1: an image is input into the model; the model is divided into two sub-networks, a deep reinforcement learning network and an attention network.
Step 2: the model processes the image in four stages:
1) The first stage is the training stage of deep reinforcement learning. Under the reinforcement learning framework, the target localization task is mapped onto three elements, namely state (State), action (Action) and reward (Reward); what deep reinforcement learning trains is the parameter π of the behavior control policy.
The state State is a vector o generated by encoding the observed region with deep convolutional neural networks (CNNs);
the action Action includes horizontal movement, vertical movement, scale change, aspect-ratio change, and position determination;
the reward Reward measures the relative relationship between the observed region b and the actual target area g:
IoU(b, g) = area(b ∩ g) / area(b ∪ g),
and the reward is expressed as Ra(s, s1) = sign(IoU(b1, g) - IoU(b, g)).
2) In the second stage, observed regions are back-propagated into the attention network to train the parameters of the attention vector layer.
3) In the third stage, the attention network, trained on samples whose Reward meets a threshold, crops the region of interest from the test image.
4) In the fourth stage, the region of interest is passed to the deep reinforcement learning network to quickly lock onto the target region and improve efficiency.
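The IoU-based reward above can be sketched directly in code. Representing a region by its corner coordinates (x1, y1, x2, y2) is an assumption made here for illustration, since the patent does not fix a box representation:

```python
def iou(b, g):
    """IoU(b, g) = area(b ∩ g) / area(b ∪ g) for boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(b[2], g[2]) - max(b[0], g[0]))  # intersection width
    ih = max(0.0, min(b[3], g[3]) - max(b[1], g[1]))  # intersection height
    inter = iw * ih
    union = ((b[2] - b[0]) * (b[3] - b[1])
             + (g[2] - g[0]) * (g[3] - g[1]) - inter)
    return inter / union

def reward(b, b1, g):
    """Ra(s, s1) = sign(IoU(b1, g) - IoU(b, g)): +1 if moving the observed
    region from b to b1 increased overlap with the target g, -1 if it
    decreased it, 0 if unchanged."""
    d = iou(b1, g) - iou(b, g)
    return (d > 0) - (d < 0)
```

Because only the sign of the IoU change is kept, every control step yields a bounded reward signal, which is what lets the agent learn a search policy instead of regressing coordinates.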
According to the above technical solution, in Step 1, the deep reinforcement learning network DQN refers to using deep convolutional neural networks, under the reinforcement learning framework, to encode and reduce the dimensionality of the high-dimensional image data and extract image features.
According to the above technical solution, in Step 2 1), in the target localization task the State represents the image features of the observed region, the Action represents the various control actions that deform the observed region, and the Reward represents the correlation between the observed region and the actual target position.
According to the above technical solution, in Step 2 1), the control policy π that governs the search behavior is a neural network with two fully connected layers.
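A minimal sketch of such a policy head with two fully connected layers. The 512-dimensional state vector o, the 128-unit ReLU hidden layer, and the random (untrained) weights are all illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_policy(state_dim, hidden_dim, n_actions):
    """Two fully connected layers mapping the CNN state vector o to action scores."""
    return {
        "W1": rng.normal(0.0, 0.1, (state_dim, hidden_dim)), "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, n_actions)), "b2": np.zeros(n_actions),
    }

def policy_forward(params, o):
    h = np.maximum(0.0, o @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]                # one score per action

# five actions as listed: horizontal move, vertical move, scale change,
# aspect-ratio change, and the terminal "position determined" action
params = make_policy(state_dim=512, hidden_dim=128, n_actions=5)
scores = policy_forward(params, rng.normal(size=512))
action = int(np.argmax(scores))  # greedy choice over the five actions
```

Keeping the controller this small is consistent with the division of labor in the method: the CNN encoder does the heavy feature extraction, and the policy head only maps the resulting vector o to a choice among a handful of deformation actions.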
According to the above technical solution, in Step 2 1), the first stage uses a reinforcement mechanism.
According to the above technical solution, in Step 2 2):
1) The attention network first uses deep convolutional neural network techniques to convert the image into a feature map of size H × W × C;
2) a channel descriptor p is then used to encode the spatial information in the feature map;
3) next, these descriptors are used to construct the channel weights of the attention network, a_i = σ(W2 f(W1 p));
4) the attention weights of the different channels are then assembled into an attention map M_i, and [tx; ty; ts] = fCNet(M_i), where fCNet(·) is a cropping function that cuts the high-attention region of the attention map out of the input image. To keep the operation end-to-end, it is implemented as a two-dimensional mask V(x, y) = Vx · Vy, with Vx = f(x - tx + 0.5ts) - f(x - tx - 0.5ts) and Vy = f(y - ty + 0.5ts) - f(y - ty - 0.5ts), where f(x) = 1/(1 + exp(-kx)); the region of interest is expressed as x ⊙ V_i, where x is the input image and i indexes the local region.
According to the above technical solution, b_c denotes the feature of the c-th channel, where C is the number of channels and c indexes a channel; f(·) is the activation function; a_i is the weight of a certain part of the associated channel; tx and ty are the horizontal and vertical coordinates of the centre of the region of interest, and ts is its side length.
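The channel-attention and soft-crop machinery above can be sketched as follows. The patent text does not reproduce the formula for the descriptor p, so global average pooling is assumed here as one common choice; the layer shapes and the logistic slope k are likewise illustrative:

```python
import numpy as np

def sigmoid(x, k=1.0):
    return 1.0 / (1.0 + np.exp(-np.clip(k * x, -60.0, 60.0)))

def channel_descriptor(feat):
    """One descriptor per channel of an H x W x C feature map.
    Global average pooling is an assumed choice; the patent elides the formula."""
    return feat.mean(axis=(0, 1))  # shape (C,)

def channel_weights(p, W1, W2):
    """a_i = sigma(W2 f(W1 p)), with the activation f taken as ReLU."""
    return sigmoid(W2 @ np.maximum(0.0, W1 @ p))

def soft_mask(size, tx, ty, ts, k=10.0):
    """V(x, y) = Vx * Vy: a differentiable box of side ts centred at (tx, ty)."""
    xs = np.arange(size, dtype=float)
    vx = sigmoid(xs - tx + 0.5 * ts, k) - sigmoid(xs - tx - 0.5 * ts, k)
    vy = sigmoid(xs - ty + 0.5 * ts, k) - sigmoid(xs - ty - 0.5 * ts, k)
    return np.outer(vy, vx)  # near 1 inside the box, near 0 outside

feat = np.ones((8, 8, 4))                                   # toy feature map
p = channel_descriptor(feat)
a = channel_weights(p, np.zeros((4, 4)), np.zeros((4, 4)))  # untrained weights

img = np.ones((32, 32))
V = soft_mask(32, tx=16.0, ty=16.0, ts=10.0)
roi = img * V  # x ⊙ V: the soft-cropped region of interest
```

Because V is built from sigmoids rather than a hard crop, gradients flow through tx, ty and ts, which is what makes the cropping step end-to-end trainable.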
Compared with the prior art, the beneficial effect obtained by the present invention is as follows: this target positioning method for deep accelerated reinforcement learning based on attention adds an attention network under the original deep reinforcement learning framework. The method trains the attention network with data generated during reinforcement learning training, thereby obtaining attention vectors; the black-box learning problem of the deep reinforcement learning network DQN is converted into a white-box problem of attention vectors, while the attention mechanism optimizes the DQN's control over the localization procedure.
Brief description of the drawings
The accompanying drawings are provided to give a further understanding of the present invention and constitute a part of the specification; together with the embodiments of the present invention they serve to explain the present invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a schematic diagram of the overall flow of the present invention;
In the figure: 1, fully connected layer; 2, pooling layer; 3, attention vector layer.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings. Evidently, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: a target positioning method for deep accelerated reinforcement learning based on attention, comprising the following steps:
Step 1: an image is input into the model; the model is divided into two sub-networks, a deep reinforcement learning network and an attention network.
Step 2: the model processes the image in four stages:
1) The first stage is the training stage of deep reinforcement learning. Under the reinforcement learning framework, the target localization task is mapped onto three elements, namely state (State), action (Action) and reward (Reward); what deep reinforcement learning trains is the parameter π of the behavior control policy.
The state State is a vector o generated by encoding the observed region with deep convolutional neural networks (CNNs);
the action Action includes horizontal movement, vertical movement, scale change, aspect-ratio change, and position determination;
the reward Reward measures the relative relationship between the observed region b and the actual target area g:
IoU(b, g) = area(b ∩ g) / area(b ∪ g),
and the reward is expressed as Ra(s, s1) = sign(IoU(b1, g) - IoU(b, g)).
2) In the second stage, observed regions are back-propagated into the attention network to train the parameters of the attention vector layer.
3) In the third stage, the attention network, trained on samples whose Reward meets a threshold, crops the region of interest from the test image.
4) In the fourth stage, the region of interest is passed to the deep reinforcement learning network to quickly lock onto the target region and improve efficiency.
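The four stages can be summarized in a hypothetical skeleton. Here `env`, `dqn`, `attention_net`, their methods and the reward threshold are placeholders for illustration, not interfaces defined by the patent:

```python
def locate(env, dqn, attention_net, reward_threshold, test_image):
    # Stage 1: train the DQN controller on (state, action, reward) transitions.
    for _ in range(dqn.n_episodes):
        transitions = env.rollout(dqn)
        dqn.update(transitions)
        # Stage 2: feed observed regions whose reward meets the threshold back
        # into the attention network to train the attention-vector layer.
        good = [t for t in transitions if t.reward >= reward_threshold]
        attention_net.update(good)
    # Stage 3: the trained attention network crops a region of interest
    # from the test image.
    roi = attention_net.crop(test_image)
    # Stage 4: the DQN searches inside the ROI only, locking onto the
    # target region faster than searching the whole image.
    return dqn.localize(roi)
```

The key coupling is that the attention network is a free by-product of Stage 1: it learns from the same transitions the DQN generates, then repays the DQN at test time by shrinking its search space.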
According to the above technical solution, in Step 1, the deep reinforcement learning network DQN refers to using deep convolutional neural networks, under the reinforcement learning framework, to encode and reduce the dimensionality of the high-dimensional image data and extract image features.
According to the above technical solution, in Step 2 1), in the target localization task the State represents the image features of the observed region, the Action represents the various control actions that deform the observed region, and the Reward represents the correlation between the observed region and the actual target position.
According to the above technical solution, in Step 2 1), the control policy π that governs the search behavior is a neural network with two fully connected layers.
According to the above technical solution, in Step 2 1), the first stage uses a reinforcement mechanism.
According to the above technical solution, in Step 2 2):
1) The attention network first uses deep convolutional neural network techniques to convert the image into a feature map of size H × W × C;
2) a channel descriptor p is then used to encode the spatial information in the feature map;
3) next, these descriptors are used to construct the channel weights of the attention network, a_i = σ(W2 f(W1 p));
4) the attention weights of the different channels are then assembled into an attention map M_i, and [tx; ty; ts] = fCNet(M_i), where fCNet(·) is a cropping function that cuts the high-attention region of the attention map out of the input image. To keep the operation end-to-end, it is implemented as a two-dimensional mask V(x, y) = Vx · Vy, with Vx = f(x - tx + 0.5ts) - f(x - tx - 0.5ts) and Vy = f(y - ty + 0.5ts) - f(y - ty - 0.5ts), where f(x) = 1/(1 + exp(-kx)); the region of interest is expressed as x ⊙ V_i, where x is the input image and i indexes the local region.
According to the above technical solution, b_c denotes the feature of the c-th channel, where C is the number of channels and c indexes a channel; f(·) is the activation function; a_i is the weight of a certain part of the associated channel; tx and ty are the horizontal and vertical coordinates of the centre of the region of interest, and ts is its side length.
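The one-dimensional factor Vx above is a difference of two shifted logistic functions, i.e. a differentiable approximation of a box window of width ts centred at tx; this design choice is what keeps the crop trainable by back-propagation. A small numeric check (the slope k = 10 is an illustrative value, not fixed by the patent):

```python
import math

def f(x, k=10.0):
    """Logistic function f(x) = 1 / (1 + exp(-k x)), guarded against overflow."""
    z = k * x
    if z > 60.0:
        return 1.0
    if z < -60.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

def vx(x, tx, ts, k=10.0):
    """Soft window along one axis: near 1 for |x - tx| < ts/2, near 0 outside."""
    return f(x - tx + 0.5 * ts, k) - f(x - tx - 0.5 * ts, k)

inside = vx(10.0, tx=10.0, ts=4.0)   # centre of the window -> close to 1
outside = vx(20.0, tx=10.0, ts=4.0)  # far from the window -> close to 0
edge = vx(8.0, tx=10.0, ts=4.0)      # on the window boundary -> about 0.5
```

Larger k sharpens the window edges toward a hard crop, while smaller k spreads gradient signal further outside the box.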
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or replace some of the technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (7)
1. A target positioning method for deep accelerated reinforcement learning based on attention, comprising the following steps, characterized in that:
Step 1: an image is input into the model; the model is divided into two sub-networks, a deep reinforcement learning network and an attention network;
Step 2: the model processes the image in four stages:
1) the first stage is the training stage of deep reinforcement learning; under the reinforcement learning framework, the target localization task is mapped onto three elements, namely state (State), action (Action) and reward (Reward), and what deep reinforcement learning trains is the parameter π of the behavior control policy;
the state State is a vector o generated by encoding the observed region with deep convolutional neural networks (CNNs);
the action Action includes horizontal movement, vertical movement, scale change, aspect-ratio change, and position determination;
the reward Reward measures the relative relationship between the observed region b and the actual target area g:
IoU(b, g) = area(b ∩ g) / area(b ∪ g),
and the reward is expressed as Ra(s, s1) = sign(IoU(b1, g) - IoU(b, g));
2) in the second stage, observed regions are back-propagated into the attention network to train the parameters of the attention vector layer;
3) in the third stage, the attention network, trained on samples whose Reward meets a threshold, crops the region of interest from the test image;
4) in the fourth stage, the region of interest is passed to the deep reinforcement learning network to quickly lock onto the target region and improve efficiency.
2. The target positioning method for deep accelerated reinforcement learning based on attention according to claim 1, characterized in that: in Step 1, the deep reinforcement learning network DQN refers to using deep convolutional neural networks, under the reinforcement learning framework, to encode and reduce the dimensionality of the high-dimensional image data and extract image features.
3. The target positioning method for deep accelerated reinforcement learning based on attention according to claim 1, characterized in that: in Step 2 1), in the target localization task the State represents the image features of the observed region, the Action represents the various control actions that deform the observed region, and the Reward represents the correlation between the observed region and the actual target position.
4. The target positioning method for deep accelerated reinforcement learning based on attention according to claim 1, characterized in that: in Step 2 1), the control policy π that governs the search behavior is a neural network with two fully connected layers.
5. The target positioning method for deep accelerated reinforcement learning based on attention according to claim 1, characterized in that: in Step 2 1), the first stage uses a reinforcement mechanism.
6. The target positioning method for deep accelerated reinforcement learning based on attention according to claim 1, characterized in that, in Step 2 2):
1) the attention network first uses deep convolutional neural network techniques to convert the image into a feature map of size H × W × C;
2) a channel descriptor p is then used to encode the spatial information in the feature map;
3) next, these descriptors are used to construct the channel weights of the attention network, a_i = σ(W2 f(W1 p));
4) the attention weights of the different channels are then assembled into an attention map M_i(x), and [tx; ty; ts] = fCNet(M_i), where fCNet(·) is a cropping function that cuts the high-attention region of the attention map out of the input image; to keep the operation end-to-end, it is implemented as a two-dimensional mask V(x, y) = Vx · Vy, with Vx = f(x - tx + 0.5ts) - f(x - tx - 0.5ts) and Vy = f(y - ty + 0.5ts) - f(y - ty - 0.5ts), where f(x) = 1/(1 + exp(-kx)); the region of interest is expressed as x ⊙ V_i, where x is the input image and i indexes the local region.
7. The target positioning method for deep accelerated reinforcement learning based on attention according to claim 6, characterized in that: b_c denotes the feature of the c-th channel, where C is the number of channels and c indexes a channel; f(·) is the activation function; a_i is the weight of a certain part of the associated channel; tx and ty are the horizontal and vertical coordinates of the centre of the region of interest, and ts is its side length.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910362771.8A CN110110847B (en) | 2019-04-30 | 2019-04-30 | Target positioning method for deep accelerated reinforcement learning based on attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910362771.8A CN110110847B (en) | 2019-04-30 | 2019-04-30 | Target positioning method for deep accelerated reinforcement learning based on attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110110847A true CN110110847A (en) | 2019-08-09 |
CN110110847B CN110110847B (en) | 2020-02-07 |
Family
ID=67487894
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910362771.8A Expired - Fee Related CN110110847B (en) | 2019-04-30 | 2019-04-30 | Target positioning method for deep accelerated reinforcement learning based on attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110110847B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106373160A (en) * | 2016-08-31 | 2017-02-01 | 清华大学 | Active camera target localization method based on deep reinforcement learning
CN107403426A (en) * | 2017-06-20 | 2017-11-28 | 北京工业大学 | Target object detection method and device
CN107832836A (en) * | 2017-11-27 | 2018-03-23 | 清华大学 | Model-free deep reinforcement learning heuristic method and device
CN108304795A (en) * | 2018-01-29 | 2018-07-20 | 清华大学 | Human skeleton action recognition method and device based on deep reinforcement learning
WO2018184204A1 (en) * | 2017-04-07 | 2018-10-11 | Intel Corporation | Methods and systems for budgeted and simplified training of deep neural networks |
US10241520B2 (en) * | 2016-12-22 | 2019-03-26 | TCL Research America Inc. | System and method for vision-based flight self-stabilization by deep gated recurrent Q-networks |
-
2019
- 2019-04-30 CN CN201910362771.8A patent/CN110110847B/en not_active Expired - Fee Related
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106373160A (en) * | 2016-08-31 | 2017-02-01 | 清华大学 | Active camera target localization method based on deep reinforcement learning
US10241520B2 (en) * | 2016-12-22 | 2019-03-26 | TCL Research America Inc. | System and method for vision-based flight self-stabilization by deep gated recurrent Q-networks |
WO2018184204A1 (en) * | 2017-04-07 | 2018-10-11 | Intel Corporation | Methods and systems for budgeted and simplified training of deep neural networks |
CN107403426A (en) * | 2017-06-20 | 2017-11-28 | 北京工业大学 | Target object detection method and device
CN107832836A (en) * | 2017-11-27 | 2018-03-23 | 清华大学 | Model-free deep reinforcement learning heuristic method and device
CN108304795A (en) * | 2018-01-29 | 2018-07-20 | 清华大学 | Human skeleton action recognition method and device based on deep reinforcement learning
Non-Patent Citations (1)
Title |
---|
YE HUANG et al.: "Parallel Search by Reinforcement Learning for Object Detection", PRCV 2018: Pattern Recognition and Computer Vision *
Also Published As
Publication number | Publication date |
---|---|
CN110110847B (en) | 2020-02-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Park et al. | Saliency map model with adaptive masking based on independent component analysis | |
CN111611749B (en) | Indoor crowd evacuation automatic guiding simulation method and system based on RNN | |
Xiang et al. | Task-oriented deep reinforcement learning for robotic skill acquisition and control | |
Edelstein-Keshet | Mathematical models of swarming and social aggregation | |
CN106227043A (en) | adaptive optimal control method | |
CN103679718A (en) | Fast scenario analysis method based on saliency | |
CN109635763A (en) | A kind of crowd density estimation method | |
Gralka et al. | Convection shapes the trade-off between antibiotic efficacy and the selection for resistance in spatial gradients | |
Kumar et al. | Role of Allee effect on prey–predator model with component Allee effect for predator reproduction | |
CN110110847A (en) | Target positioning method for deep accelerated reinforcement learning based on attention | |
CN108961270A (en) | A kind of Bridge Crack Image Segmentation Model based on semantic segmentation | |
CN112131693A (en) | Lur' e network clustering synchronization method based on pulse-controlled adaptive control | |
CN103383743B (en) | A kind of chrominance space transformation method | |
CN106021991A (en) | Method for stimulating intervention of tumor cell states based on Boolean network | |
CN115426149A (en) | Single intersection signal lamp control traffic state anti-disturbance generation method based on Jacobian saliency map | |
CN101162482A (en) | Gauss cooperated based on node and semi-particle filtering method | |
Morihiro et al. | Learning grouping and anti-predator behaviors for multi-agent systems | |
CN106355250A (en) | Optimization method and device for judging convert channels based on neural network | |
Calvo-Monge et al. | A nonlinear relapse model with disaggregated contact rates: Analysis of a forward-backward bifurcation | |
Spirov | The change of initial symmetry in the pattern-form interaction model of sea urchin gastrulation | |
Dunn | Hierarchical cellular automata methods | |
Dey et al. | Spatio-temporal dynamics in a diffusive Bazykin model: effects of group defense and prey-taxis | |
Althagafi | Mathematical models of population dynamics in discrete heterogeneous space | |
Kappen | Stimulus-dependent correlations in stochastic networks | |
Zavertanyy et al. | Genotype dynamic for agent neuroevolution in artificial life model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20200207 Termination date: 20210430 |