CN109376576A

CN109376576A - Object detection method based on an alternately updated densely connected network trained from scratch

Info

Publication number: CN109376576A
Application number: CN201810951609.5A
Authority: CN
Inventors: 年睿; 徐晓; 张世昌; 郝宝趁; 孔双
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2018-08-21
Filing date: 2018-08-21
Publication date: 2019-02-22

Abstract

The present invention provides an object detection method based on an alternately updated densely connected network trained from scratch. Target images are first collected, built into a data set, and annotated. An alternately updated dense connection module, a bounded deformable convolution module, a lateral-longitudinal attention module combining self-attention and channel attention, a scale-conversion feature pyramid module, and a variable-segment decoupled convolution module are then trained to obtain a trained model. Training images are acquired to obtain a data set, and finally the data set and the trained model are used to detect and recognize target images. From a deep learning perspective, the invention assists the detection and tracking of deformable moving underwater targets and also promotes the development of efficient train-from-scratch learning mechanisms, achieving high-precision recognition of underwater moving targets that undergo large changes in shape, pose, and scale and that are occluded.

Description

Object detection method based on an alternately updated densely connected network trained from scratch

Technical field

The present invention relates to a method for detecting deformable moving underwater targets using an attention-guided, alternately updated, densely connected network trained from scratch, and belongs to the technical field of intelligent information processing and target detection and recognition.

Background art

Target detection and recognition is an important part of a vision system, and target detection technology has broad application prospects in fields such as video surveillance, intelligent robot navigation, autonomous driving, gesture recognition, and behavior recognition. It supports the rational use and protection of marine resources and long-term, effective, multi-angle monitoring of the ocean, and provides basic data and information support for aquaculture, marine fishing, and fish behavior analysis. For deformable moving underwater targets, however, marine optical imaging conditions are restrictive, and the appearance of a target changes greatly under the influence of illumination, viewing angle, occlusion, shape, and scale variation, which makes image detection and recognition very challenging.

In recent years, target detection has made major breakthroughs thanks to deep learning, mainly convolutional neural networks (Convolutional Neural Network, CNN) and region proposal (Region Proposal) algorithms. Mainstream detection and recognition methods fall into three groups: traditional target detection, deep learning detection based on region proposals, and deep learning detection based on regression. Current state-of-the-art detection and recognition networks rely heavily on a feature extraction network pre-trained on a classification task over a large classification data set such as ImageNet, after which the detection and recognition network is trained. However, differences between classification and detection in loss function and in categories introduce a learning bias. Fine-tuning can alleviate this bias but cannot eliminate it, and transferring a pre-trained model from the classification task to a detection domain that differs more strongly from it is harder still. The DSOD method, proposed in 2017, offers a good solution: using the excellent gradient propagation of DenseNet, a detection and recognition network can be trained from scratch without relying on a pre-trained classification model.

Target detection is an extremely important part of computer vision. Occlusion and deformation of objects, background complexity, illumination changes, and scale changes are problems that urgently need to be solved in detection. The main problems in the prior art are: (1) marine optical imaging conditions are restrictive, and illumination, viewing angle, occlusion, shape, and scale variation cause the appearance of a target to change greatly; (2) because image targets lack contextual information, partial occlusion or deformation of a target leads to detection errors.

Summary of the invention

In view of the above problems, the object of the present invention is to provide a method for detecting and recognizing deformable moving underwater targets using an attention-guided, alternately updated, densely connected network trained from scratch. The method unifies feature extraction, feature refinement, and attention for deformable moving targets, and aims to quickly find and detect such targets in underwater observation data that is massive, fast-arriving, and low in value density, thereby remedying the deficiencies of the prior art.

To achieve the above object, the present invention adopts the following technical solution:

A method for detecting and recognizing deformable moving underwater targets using an attention-guided, alternately updated, densely connected network trained from scratch, comprising the following steps:

(1) Collect target images, build a data set, and annotate it;

(2) Alternately updated dense connection module: to train from scratch without relying on a pre-trained classification model, and to avoid the learning bias and cross-domain data set problems caused by differences between classification and detection in loss function and in categories, in particular when moving from above-water data sets to underwater data sets, an alternately updated dense connection module is used. Its excellent gradient propagation, efficient feature extraction and refinement, and built-in attention effect unify feature extraction, feature refinement, and attention;

(3) Bounded deformable convolution module: to overcome the difficulty of detecting and recognizing deformed targets, a bounded deformable convolution module is used after the alternately updated dense connection module. Deformable convolution is not limited to a fixed receptive field: it learns receptive-field offsets, so the receptive field changes with the input image and can focus on the target region. Because deformable convolution adapts well to target deformation, the detection performance of the network is not restricted by it;

(4) Lateral-longitudinal attention module combining self-attention and channel attention: to better optimize and pass on the features extracted by the alternately updated dense connection module, an attention module combining self-attention (lateral) and channel attention (longitudinal) is used. Lateral attention focuses on the relationship between regions of interest and specific targets; longitudinal attention focuses on the importance of the different channel features, strengthening important features and weakening unimportant ones. The module supplies better features to the feature extraction module, uses attention features to guide feature propagation, and performs feature optimization explicitly;

(5) Scale-conversion feature pyramid module: to improve detection and recognition accuracy, scale conversion replaces upsampling in the feature pyramid module of the network head. Features of different resolutions are fused without damaging the features, and the computation of the network head is greatly reduced;

(6) Variable-segment decoupled convolution: to address the slow convergence of detection and recognition networks trained from scratch, and the difficulty ordinary convolution has in separating intra-class differences from inter-class differences, convolution is decoupled into magnitude and angle. Magnitude measures intra-class differences and angle measures inter-class differences, and the magnitude and angle are learned directly instead of being fitted;

(7) Acquire training images, and apply scale augmentation and random hiding to them to obtain {I_n, n = 1, 2, ..., N};

(8) Model training and target detection and recognition: train the model using {I_n, n = 1, 2, ..., N}, and use the trained model to detect and recognize target images.

Further, the alternately updated dense connection module in step (2):

X denotes a feature map, k is the number of alternate updates, i denotes the i-th layer within a given alternate update, W denotes the weights to be trained, * denotes the convolution operation, g denotes a nonlinear function, and m and l are summation indices.
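The formula image for this module is not reproduced in the text. As a minimal sketch, assume the update has the form X_i^(k) = g(Σ_{l<i} W_li * X_l^(k) + Σ_{m>i} W_mi * X_m^(k-1)), which is one reading consistent with the symbols listed above. The NumPy code below then updates the layers of one block in turn. The 1x1 convolution, the choice of ReLU for g, the `weights[(j, i)]` layout, and the initial layer maps passed in as `init` are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def conv1x1(w, x):
    """1x1 convolution: w has shape (C_out, C_in), x has shape (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)

def alternate_update(init, weights, num_updates):
    """Alternately update every layer of a densely connected block.

    init        : list of n feature maps (layer states before the first round)
    weights     : weights[(j, i)] maps layer j to layer i (hypothetical layout)
    num_updates : k, the number of alternate-update rounds
    """
    layers = [x.copy() for x in init]
    n = len(layers)
    for _ in range(num_updates):
        for i in range(n):
            # Layers l < i were already refreshed in this round; layers
            # m > i still hold their values from the previous round.
            total = sum(conv1x1(weights[(j, i)], layers[j])
                        for j in range(n) if j != i)
            layers[i] = np.maximum(total, 0.0)  # g: ReLU (assumed)
    return layers

rng = np.random.default_rng(0)
n_layers, channels = 3, 4
init = [rng.standard_normal((channels, 5, 5)) for _ in range(n_layers)]
weights = {(j, i): rng.standard_normal((channels, channels)) * 0.1
           for j in range(n_layers) for i in range(n_layers) if j != i}
out = alternate_update(init, weights, num_updates=2)
```

Updating the list in place is what makes the scheme alternate: each layer sees the newest version of the layers before it and the previous-round version of the layers after it.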

Further, the bounded deformable convolution module in step (3):

X denotes the feature map; p denotes the position obtained by offsetting the receptive field at position p₀ of X, with p = p₀ + Δp, where Δp is a learnable variable; q denotes the four integer grid points surrounding p; and G(q, p) is the interpolation kernel between p and q. The offset feature map is obtained by interpolation.
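The formula image is not reproduced in the text. A minimal sketch of the sampling step, assuming the offset value is computed as X(p) = Σ_q G(q, p) X(q) with a bilinear kernel G, is given below. How the patent bounds the offset is not stated; clipping Δp to a hypothetical `max_offset` is an assumption made here for illustration, and in a real network the offset would be predicted by a learned layer rather than passed in.

```python
import numpy as np

def bilinear_sample(feat, p0, delta, max_offset=2.0):
    """Sample a 2-D feature map at p = p0 + delta by bilinear interpolation.

    feat       : array of shape (H, W)
    p0         : (row, col) position on the regular sampling grid
    delta      : (d_row, d_col) offset (learned in a real network)
    max_offset : hypothetical per-axis bound on the offset
    """
    h, w = feat.shape
    d = np.clip(np.asarray(delta, dtype=float), -max_offset, max_offset)
    # Keep p inside the map so that its neighbours exist.
    r = float(np.clip(p0[0] + d[0], 0, h - 1))
    c = float(np.clip(p0[1] + d[1], 0, w - 1))
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    r1, c1 = min(r0 + 1, h - 1), min(c0 + 1, w - 1)
    value = 0.0
    # Sets avoid counting a neighbour twice at the map border.
    for qr in sorted({r0, r1}):
        for qc in sorted({c0, c1}):
            # G(q, p) = max(0, 1 - |q_r - p_r|) * max(0, 1 - |q_c - p_c|)
            g = max(0.0, 1 - abs(qr - r)) * max(0.0, 1 - abs(qc - c))
            value += g * feat[qr, qc]
    return value

feat = np.arange(16, dtype=float).reshape(4, 4)
inside = bilinear_sample(feat, (1, 1), (0.5, 0.25))
clipped = bilinear_sample(feat, (1, 1), (10.0, 0.0))
```

Because the weights are piecewise linear in Δp, the sampled value is differentiable with respect to the offset almost everywhere, which is what allows the offset to be learned by backpropagation.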

Further, the lateral-longitudinal attention module combining self-attention and channel attention in step (4), where the lateral attention is:

f(x) = W_f x

g(x) = W_g x

y_i = γ o_i

where x is the input feature map, W denotes the weights to be trained, f, g, and h are the three feature extraction mappings of the lateral attention module, γ is a trainable variable expressing the importance of lateral attention, and y_i is the output of the lateral attention module;

The longitudinal attention is:

s = σ(W₂ δ(W₁ Z))

where Z is the globally average-pooled feature, H and W are the numbers of rows and columns of the feature map, U is the feature map input to longitudinal attention, σ and δ are nonlinear functions, W₁ and W₂ are the weights to be trained, and s is the output of the longitudinal attention module;

Finally, lateral attention and longitudinal attention are fused:

Y = (X + y_i) * (1 + s)

where X is the input of the lateral-longitudinal attention module and Y is its output.
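The text lists f, g, γ, and the fusion rule, but the expressions for h, the attention weights, and o_i are not reproduced. The sketch below fills those gaps with common choices and should be read as an assumption: a linear map h(x) = W_h x, softmax attention over spatial positions, ReLU for δ, a sigmoid for σ, and the module input X used as U. Feature maps are flattened to shape (C, N) with N = H*W.

```python
import numpy as np

def softmax(a, axis):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def lateral_attention(x, w_f, w_g, w_h, gamma):
    """Self-attention over spatial positions; x has shape (C, N)."""
    f, g, h = w_f @ x, w_g @ x, w_h @ x   # three projections of x
    attn = softmax(f.T @ g, axis=0)       # (N, N); column j weights all positions i
    o = h @ attn                          # o_j = sum_i attn[i, j] * h_i
    return gamma * o                      # y = gamma * o

def longitudinal_attention(u, w1, w2):
    """Channel attention; u has shape (C, N)."""
    z = u.mean(axis=1)                            # global average pooling
    hidden = np.maximum(w1 @ z, 0.0)              # delta: ReLU (assumed)
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigma: sigmoid (assumed)

def fuse(x, y, s):
    """Y = (X + y) * (1 + s), with s broadcast over spatial positions."""
    return (x + y) * (1.0 + s[:, None])

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
x = rng.standard_normal((C, H * W))
w_f, w_g = rng.standard_normal((2, C // 2, C))
w_h = rng.standard_normal((C, C))
w1 = rng.standard_normal((C // 4, C))
w2 = rng.standard_normal((C, C // 4))

y = lateral_attention(x, w_f, w_g, w_h, gamma=0.1)
s = longitudinal_attention(x, w1, w2)
Y = fuse(x, y, s)
```

With γ = 0 and s = 0 the fused output reduces to the input X, so both branches act as corrections on top of an identity path.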

Further, the scale-conversion feature pyramid module in step (5):

where I is a feature map, LR denotes the map before scale conversion, SR denotes the map after scale conversion, r is the conversion stride, and x, y, and c are the column, row, and depth (channel) coordinates, respectively.
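The formula image is not reproduced in the text. One rearrangement consistent with the symbols above, assumed here for illustration, moves channels into space: SR[c, y, x] = LR[c·r² + r·(y mod r) + (x mod r), ⌊y/r⌋, ⌊x/r⌋]. The NumPy sketch below implements that mapping with a reshape and a transpose; the channel-first layout is an assumption.

```python
import numpy as np

def scale_convert(lr, r):
    """Rearrange a (C*r*r, H, W) map into a (C, H*r, W*r) map without interpolation."""
    c_in, h, w = lr.shape
    assert c_in % (r * r) == 0, "channel count must be divisible by r*r"
    c = c_in // (r * r)
    sr = lr.reshape(c, r, r, h, w)       # axes: (c, dy, dx, y, x)
    sr = sr.transpose(0, 3, 1, 4, 2)     # axes: (c, y, dy, x, dx)
    return sr.reshape(c, h * r, w * r)

lr = np.arange(8 * 2 * 3, dtype=float).reshape(8, 2, 3)
sr = scale_convert(lr, r=2)
```

Every input value is copied exactly once, so resolution increases without the smoothing that interpolation-based upsampling introduces, and no extra multiplications are needed.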

Further, the variable-segment decoupled convolution in step (6):

where x is the input feature map, w denotes the weights to be trained, β and ρ are trainable variables, and E denotes the expectation.
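The formula image is not reproduced in the text, and the sketch below does not attempt to reconstruct the patent's operator (which also involves β, ρ, and the expectation E). It only illustrates the underlying idea of separating magnitude from angle, using the identity ⟨w, x⟩ = ‖w‖·‖x‖·cos θ. The bounded magnitude function `tanh(‖x‖ / 2)` is a hypothetical stand-in chosen for illustration.

```python
import numpy as np

def magnitude_and_angle(w, x, eps=1e-12):
    """Split the inner product <w, x> into the two norms and cos(theta)."""
    w, x = np.ravel(w), np.ravel(x)
    nw, nx = np.linalg.norm(w), np.linalg.norm(x)
    cos_theta = float(w @ x) / max(nw * nx, eps)
    return nw, nx, float(np.clip(cos_theta, -1.0, 1.0))

def decoupled_response(w, x, magnitude_fn):
    """Response = magnitude_fn(||w||, ||x||) * cos(theta)."""
    nw, nx, cos_theta = magnitude_and_angle(w, x)
    return magnitude_fn(nw, nx) * cos_theta

rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3, 3))   # one 3x3 kernel over 3 channels
x = rng.standard_normal((3, 3, 3))   # the input patch under the kernel

# Ordinary convolution response at this position: magnitude is ||w|| * ||x||.
standard = decoupled_response(w, x, lambda nw, nx: nw * nx)
# Hypothetical bounded magnitude: the response can no longer grow without limit.
bounded = decoupled_response(w, x, lambda nw, nx: np.tanh(nx / 2.0))
```

Once the two factors are separate, each can be given its own functional form, which is the property the text relies on when it assigns intra-class variation to magnitude and inter-class variation to angle.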

Further, the multi-scale training method in step (7) uses 7 to 10 times the size of the final feature map produced by the network as the network input size, which improves detection accuracy for targets of different scales. Pixels of the image are also occluded at random; during training this improves the detection and recognition accuracy of the network and encourages it to attend to the whole target rather than to a single part.
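A minimal sketch of these two augmentations follows. The text gives only the 7 to 10 times range and states that pixels are occluded at random; the use of integer multiples, the grid of square patches, the patch size, the hiding probability, and the example feature-map size of 32 are all assumptions made for illustration. Resizing the image to the chosen size is left out.

```python
import numpy as np

def pick_input_size(final_map_size, rng, low=7, high=10):
    """Choose an input side length as a multiple (7 to 10) of the final feature-map size."""
    return final_map_size * int(rng.integers(low, high + 1))

def random_hide(image, rng, patch=16, hide_prob=0.25):
    """Return a copy of an (H, W, C) image with randomly chosen patches set to zero."""
    out = image.copy()
    h, w = out.shape[:2]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            if rng.random() < hide_prob:
                out[top:top + patch, left:left + patch] = 0
    return out

rng = np.random.default_rng(0)
size = pick_input_size(32, rng)          # 32 is a hypothetical final map size
image = np.ones((size, size, 3))
hidden = random_hide(image, rng)
```

Drawing a new size and a new hiding mask for each training batch exposes the network to different scales and to different visible parts of the same target.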

Beneficial effects of the present invention:

From a deep learning perspective, the present invention assists the detection and tracking of deformable moving underwater targets and promotes the development of efficient train-from-scratch learning mechanisms, achieving high-precision recognition of underwater moving targets that undergo large changes in shape, pose, and scale and that are occluded. Training from scratch avoids the learning bias and cross-domain data set problems caused by differences between classification and detection in loss function and in categories, and the proposed alternately updated dense connection module unifies feature extraction, feature refinement, and attention. The deformable convolution and the lateral-longitudinal attention module overcome the difficulty of detecting and recognizing deformed targets, and attention-guided feature propagation further refines the features. The scale-conversion feature pyramid module fuses features of different resolutions without damaging them, which improves detection and recognition accuracy. Variable-segment decoupled convolution addresses the slow convergence of detection and recognition networks trained from scratch and the difficulty ordinary convolution has in separating intra-class from inter-class differences. Finally, multi-scale training with random hiding improves the robustness of the network to target scale and occlusion.

Brief description of the drawings

Fig. 1 is the overall flow chart of the present invention.

Fig. 2 is the marine-environment image to be detected and recognized in Embodiment 1.

Fig. 3 is a diagram of the alternately updated dense connection module of the present invention.

Fig. 4 is a diagram of the bounded deformable convolution module of the present invention.

Fig. 5 is a diagram of the lateral-longitudinal attention module of the present invention, which combines self-attention and channel attention.

Fig. 6 is a diagram of the scale-conversion feature pyramid module of the present invention.

Fig. 7 is a schematic diagram of the multi-scale and random hiding training method of the present invention.

Fig. 8 shows the detection and recognition results in Embodiment 1.

Detailed description of the embodiments

To make the purpose, embodiments, and advantages of the present invention clearer, the invention is described in more detail below with reference to the accompanying drawings and a specific embodiment.

Embodiment 1: the detection and recognition objects are moving fish in an underwater marine environment.

The flow chart of this embodiment is shown in Fig. 1.

As shown in Fig. 2, this embodiment uses a daytime video of fish movement shot in the marine environment of an aquafarm in Shandong Province (1920×1080 pixels, 25 frames per second) as the video to be detected and recognized.

The following steps are described in detail with reference to the drawings and the concrete results, and correspond to the steps outlined in the summary of the invention.

Step 1: build a data set from a large number of fish images taken at the aquafarm, and annotate the position and species of every fish in each image;

Step 2: as shown in Fig. 3, the alternately updated dense connection module unifies feature extraction, feature refinement, and attention:

X denotes a feature map, k is the number of alternate updates, i denotes the i-th layer within a given alternate update, W denotes the weights to be trained, * denotes the convolution operation, and g denotes a nonlinear function.

Step 3: to overcome the difficulty of detecting and recognizing deformed targets, a bounded deformable convolution module is used after step 2, as shown in Fig. 4. Deformable convolution is not limited to a fixed receptive field: it learns receptive-field offsets, so the receptive field changes with the input image and can focus on the target region. Because deformable convolution adapts well to target deformation, the detection performance of the network is not restricted by it. The formula is as follows:

Step 4: after step 3, the lateral-longitudinal attention module combining self-attention and channel attention is used, as shown in Fig. 5, to better optimize and pass on the features extracted by the alternately updated dense connection module, where the lateral attention is:

f(x) = W_f x

g(x) = W_g x

y_i = γ o_i

where x is the input feature map, W denotes the weights to be trained, f, g, and h are the three feature extraction mappings of the lateral attention module, γ is a trainable variable expressing the importance of lateral attention, and y_i is the output of the lateral attention module.

The longitudinal attention is:

s = σ(W₂ δ(W₁ Z))

where Z is the globally average-pooled feature, H and W are the numbers of rows and columns of the feature map, U is the feature map input to longitudinal attention, σ and δ are nonlinear functions, W₁ and W₂ are the weights to be trained, and s is the output of the longitudinal attention module.

Finally, lateral attention and longitudinal attention are fused:

Y = X + s * X + y_i
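For comparison with the fusion formula given in the summary of the invention and in claim 4, Y = (X + y_i) * (1 + s), expanding that product gives four terms:

```latex
\begin{aligned}
Y &= (X + y_i)\,(1 + s) \\
  &= X + sX + y_i + s\,y_i
\end{aligned}
```

The expression stated in this embodiment contains the first three terms, so the two forms coincide only where the cross term s·y_i vanishes, for example when γ = 0 and therefore y_i = 0.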

Step 5: the modules of steps 2 to 4 are repeated three times, and the scale-conversion feature pyramid module is then used to improve detection and recognition accuracy. In the feature pyramid module of the network head, scale conversion replaces upsampling, so that features of different resolutions are fused without damaging the features, and the computation of the network head is greatly reduced.

The scale-conversion feature pyramid module is shown in Fig. 6:

Note: every convolution is the variable-segment decoupled convolution used in the present invention. It addresses the slow convergence of detection and recognition networks trained from scratch and the difficulty ordinary convolution has in separating intra-class differences from inter-class differences. Convolution is decoupled into magnitude and angle: magnitude measures intra-class differences, angle measures inter-class differences, and the magnitude and angle are learned directly instead of being fitted.

Step 6: acquire training images, and apply scale augmentation and random hiding to them to obtain {I_n, n = 1, 2, ..., N}, as shown in Fig. 7;

Step 7: model training and target detection and recognition: use {I_n, n = 1, 2, ..., N} to train the attention-guided, alternately updated, densely connected train-from-scratch network and obtain a trained model, then use the trained model to detect and recognize target images.

The detection, recognition, and statistics results are shown in Fig. 8. The fish to be recognized are enclosed in rectangular boxes, and the target category and confidence are displayed above each box. Verification shows that the detection and recognition results are essentially identical to the ground truth, which demonstrates the feasibility and high accuracy of the detection method provided by the present invention.

Claims

1. An object detection method based on an alternately updated densely connected network trained from scratch, characterized by comprising the following steps:

(1) collecting target images, building a data set, and annotating it;

(2) an alternately updated dense connection module;

(3) a bounded deformable convolution module;

(4) a lateral-longitudinal attention module combining self-attention and channel attention;

(5) a scale-conversion feature pyramid module;

(6) variable-segment decoupled convolution;

(7) acquiring training images to obtain a data set {I_n, n = 1, 2, ..., N}, and obtaining a trained model;

(8) model training and target detection and recognition: using {I_n, n = 1, 2, ..., N} and the trained model to detect and recognize target images.

2. The object detection method according to claim 1, characterized in that the alternately updated dense connection module in step (2):

X denotes a feature map, k is the number of alternate updates, i denotes the i-th layer within a given alternate update, W denotes the weights to be trained, * denotes the convolution operation, g denotes a nonlinear function, and m and l are summation indices.

3. The object detection method according to claim 1, characterized in that the bounded deformable convolution module in step (3):

X denotes the feature map; p denotes the position obtained by offsetting the receptive field at position p₀ of X, with p = p₀ + Δp, where Δp is a learnable variable; q denotes the four integer grid points surrounding p; and G(q, p) is the interpolation kernel between p and q; the offset feature map is obtained by interpolation.

4. The object detection method according to claim 1, characterized in that, in the lateral-longitudinal attention module combining self-attention and channel attention in step (4), the lateral attention is:

f(x) = W_f x

g(x) = W_g x

y_i = γ o_i

where x is the input feature map, W denotes the weights to be trained, f, g, and h are the three feature extraction mappings of the lateral attention module, γ is a trainable variable expressing the importance of lateral attention, and y_i is the output of the lateral attention module;

the longitudinal attention is:

s = σ(W₂ δ(W₁ Z))

where Z is the globally average-pooled feature, H and W are the numbers of rows and columns of the feature map, U is the feature map input to longitudinal attention, σ and δ are nonlinear functions, W₁ and W₂ are the weights to be trained, and s is the output of the longitudinal attention module;

finally, lateral attention and longitudinal attention are fused:

Y = (X + y_i) * (1 + s)

5. The object detection method according to claim 1, characterized in that the scale-conversion feature pyramid module in step (5):

where I is a feature map, LR denotes the map before scale conversion, SR denotes the map after scale conversion, r is the conversion stride, and x, y, and c are the column, row, and depth (channel) coordinates, respectively.

6. The object detection method according to claim 1, characterized in that the variable-segment decoupled convolution in step (6):

7. The object detection method according to claim 1, characterized in that the multi-scale training method in step (7) uses 7 to 10 times the size of the final feature map produced by the network as the network input size.