CN109376591B - Ship target detection method for deep learning feature and visual feature combined training - Google Patents

Ship target detection method for deep learning feature and visual feature combined training

Info

Publication number
CN109376591B
CN109376591B (application number CN201811050911.XA)
Authority
CN
China
Prior art keywords
feature
layer
multiplied
size
traditional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811050911.XA
Other languages
Chinese (zh)
Other versions
CN109376591A (en
Inventor
邵振峰
吴文静
张瑞倩
王岭钢
李成源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201811050911.XA priority Critical patent/CN109376591B/en
Publication of CN109376591A publication Critical patent/CN109376591A/en
Application granted granted Critical
Publication of CN109376591B publication Critical patent/CN109376591B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a ship target detection method with joint training of deep learning features and visual features, comprising the following steps: collecting sample data, extracting CNN features, extracting traditional invariant moment features and LOMO features, reducing feature dimensionality, and constructing a feature fusion network (FCNN); finally, the network is trained with the sample data and the model is tested with test data. Compared with the prior art, the visual feature extraction comprehensively considers ship shape, color and texture, so the detection process is interpretable, while features complementary to the traditional ones can still be learned during CNN back propagation. The method is fast, efficient and accurate, gives good detection results in complex scenes such as cloud and fog, overcast days and rain, and is highly robust. It can extract features complementary to the traditional features, is extremely fast, and can achieve real-time monitoring.

Description

Ship target detection method for deep learning feature and visual feature combined training
Technical Field
The invention belongs to the field of ship detection computer vision, and particularly relates to a ship target detection method for deep learning feature and visual feature combined training.
Background
China has long coastlines, vast sea areas and abundant ocean resources. With continued economic development the number of ships at sea keeps growing, so ship detection has urgent practical demand. Ship target detection uses computer vision and image processing technologies to detect ship targets of interest in an image and to further extract a large amount of useful information, and it has wide application prospects in both military and civil fields. In the civil field, for example, by acquiring information such as a ship's position, size, heading and speed, a specific sea area or bay port can be monitored, and marine transportation, illegal fishing, smuggling and illegal dumping of oil pollution can be supervised, which is of great significance for economic development, environmental protection, sea-area use management and the safeguarding of maritime rights and interests.
In modern society, video surveillance cameras are ubiquitous, and a monitoring-center video wall can display many camera feeds at the same time; if these feeds are watched only by human eyes, abnormal events in the video are easily missed. With the rapid development of computer networks, people increasingly prefer to analyze the video images obtained by sensors with computer vision rather than with the human eye in order to obtain target information from the images. Image target detection generally consists of two steps: feature extraction, and classification and localization by a classifier. Two main categories of features are used for ship detection: visual features and features extracted by a convolutional neural network (CNN).
(I) Visual features. Commonly used visual features include color, shape and texture.
(1) Color features. Since color is usually strongly related to the objects or scene in an image, color is the most widely used visual feature. In addition, color features depend little on the size, orientation and viewing angle of the image and are therefore highly robust. Commonly used color features are the color histogram and information entropy.
(2) Shape features. Shape features describe local properties of the target, and the shape information they reflect is not entirely consistent with human visual perception. Commonly used shape features are area, aspect ratio and invariant moments. An invariant moment is a moment quantity that remains unchanged after the target is translated, rotated or scaled; the seven geometric invariant moments (Hu moments) can be selected to represent the shape characteristics of the target region.
(3) Texture features. Texture features describe the surface properties of the object corresponding to an image or image region. As a statistical feature, texture has rotation invariance and strong resistance to noise. However, when the resolution of the image changes, the computed texture may deviate considerably. In addition, the texture reflected in a 2-D image is not necessarily the true texture of a 3-D object surface, since it may be affected by illumination and reflections. The gray-level co-occurrence matrix is the most commonly used texture feature and has strong adaptability and robustness.
(II) CNN characteristics
Natural images have an inherent property: the statistics of one part of an image are the same as those of other parts. This means that features learned on one part can also be used on another part, so the same learned features can be used at all positions of the image. In other words, for the recognition problem of a large image of size r × c (r rows, c columns), a small region of size a × b (a rows, b columns) is randomly selected from the image as a training sample, some features are learned from this small sample, and these features are then used as filters convolved with the whole original image, yielding a convolved feature map for any position of the original image. This approach can automatically learn the features of various targets and obtain high-dimensional features of ships, and compared with traditional methods the accuracy of the detection results is greatly improved.
However, applying traditional features and CNN features to ship detection has the following limitations:
(1) Traditional features have excellent interpretability and controllability, and the detection results on a calm sea surface are good. However, in the presence of interference such as cloud shadows and sea waves, the false-detection rate is high. Moreover, manual feature selection is slow, which is unfavorable for practical application.
(2) A convolutional neural network can automatically learn high-dimensional ship features and the detection speed is fast. However, such black-box features are poorly interpretable, and ships of different sizes retain their features to different degrees after convolution, which also leads to inconsistent detection performance across different ships.
Disclosure of Invention
The technical problem solved by the invention is to overcome the defects of the prior art and provide a ship target detection method with joint training of deep learning features and visual features.
The technical scheme of the invention provides a ship target detection method with joint training of deep learning features and traditional features, comprising the following steps:
step one, sample data collection: collecting surveillance video frame data of coastal areas under visible light, extracting images, and labeling the images that contain ship targets;
step two, CNN feature extraction: inputting the obtained samples into a convolutional neural network for training to obtain a trained model of the ship target, the convolutional neural network outputting the CNN features;
step three, traditional feature extraction: extracting the invariant moment features and LOMO features of the ship target region;
step four, feature dimension reduction: concatenating the invariant moment features with the LOMO features, and reducing the dimensionality of the concatenated traditional features with a principal component analysis algorithm;
step five, constructing a feature fusion network FCNN to map the CNN features and the traditional features into a unified feature space;
step six, training the feature fusion network FCNN with the sample data, and verifying and testing the trained FCNN with test data.
In step one, the images containing ship targets are labeled according to the PASCAL VOC data set standard; the generated annotation files record, for each image, the coordinates of the four vertices of the minimum enclosing rectangle of every ship target together with the corresponding image, thereby constructing a ship image sample library.
In step two, a region-based convolutional neural network is adopted; it consists of several alternating convolutional layers, pooling layers and fully connected layers and is updated with a back-propagation algorithm.
In step two, the structure of the adopted region-based convolutional neural network is as follows:
1) first layer: convolution kernel size 11 × 11, max pooling size 2 × 2, followed by a BN layer; output feature map size 55 × 55;
2) second layer: convolution kernel size 5 × 5, max pooling size 2 × 2, followed by a BN layer; output feature map size 27 × 27;
3) third layer: convolution kernel size 3 × 3, max pooling size 2 × 2, followed by a BN layer; output feature map size 13 × 13;
4) fourth layer: convolution kernel size 3 × 3; output feature map size 13 × 13;
5) fifth layer: convolution kernel size 3 × 3; output feature map size 13 × 13;
6) two fully connected layers, FC7 and FC8.
In step three, the LOMO feature comprehensively considers the influence of illumination and viewing-angle changes on the image: first, the Retinex algorithm is used to preprocess the input image, reducing the influence of illumination; second, for the image preprocessed by the Retinex algorithm, color features are extracted with an HSV color histogram; in addition, SILTP descriptors are applied to extract illumination-invariant texture features of the image.
In step five, the feature fusion network FCNN contains a fusion layer and a regression layer; the input of the fusion layer is the CNN features and the traditional features; if the number of ship classes to be detected is T, the output of the regression layer is a T × 1 vector whose entries range from 0 to 1 and represent the probability that the sample belongs to each class.
Compared with the prior art, the invention has the following advantages and positive effects:
the characteristics of ship shape, color and texture are comprehensively considered in the traditional characteristic extraction process, so that the detection process has interpretability, and other characteristics except the traditional characteristics can be learnt in the CNN back propagation process. In addition, the Hu invariant moment features are only 7, and color histogram features HSV and scale invariant feature patterns (SILTP) used in Local maximum triggering (LOMO) features are also simpler to calculate, so that the overall calculation speed is not slowed down.
The CNN feature extraction part adopts a convolution neural network based on a region, and the method is rapid, efficient and high in accuracy. The method still has a good detection result for complex scenes such as cloud and fog, cloudy days, raining and the like, and has high robustness. The method can extract the characteristics complementary with the traditional characteristics, has extremely high speed and can achieve the effect of real-time monitoring.
The deep learning characteristic and the traditional characteristic are trained in a combined manner, so that on one hand, a classical ship detection operator can be utilized, the detection process is simplified, and the understanding is facilitated; on the other hand, joint training and feature complementation can fully automate the detection process, and the method does not need human-computer interaction and utilizes practical application.
Drawings
FIG. 1 is a general flow diagram of an embodiment of the present invention.
FIG. 2 is a flow chart of Hu invariant moment extraction in step three (a) of an embodiment of the present invention.
FIG. 3 is a flow chart of LOMO feature extraction in step three (b) of an embodiment of the present invention.
FIG. 4 is a structural diagram of the fusion network in step five of an embodiment of the present invention.
Detailed Description
For better understanding of the technical solutions of the present invention, the following detailed description of the present invention is made with reference to the accompanying drawings and examples.
Referring to fig. 1, a method provided by an embodiment of the invention includes the following steps:
Step one: sample data collection.
The data to be collected are mainly coastal-area surveillance video frames under visible light. In a specific implementation, each frame image, 1920 × 1080 pixels in size, can be obtained from the collected video data by decoding and frame extraction. The images containing ship targets are labeled according to the Pascal data set (PASCAL VOC) standard, and the generated annotation files record the coordinates of the four vertices of the minimum enclosing rectangle of each ship target on every picture together with the corresponding image, thereby constructing a ship image sample library.
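For illustration only, the following minimal Python sketch writes one ship's minimum enclosing rectangle into a PASCAL-VOC-style XML file. The exact field layout (filename/size/object/bndbox) follows the usual VOC convention and is an assumption here; the description above states only that the four vertex coordinates of the rectangle and the corresponding image are recorded.

import xml.etree.ElementTree as ET

def write_voc_annotation(xml_path, image_name, width, height, boxes):
    # boxes: list of (xmin, ymin, xmax, ymax) minimum enclosing rectangles
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = image_name
    size = ET.SubElement(ann, "size")
    for tag, val in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(val)
    for (xmin, ymin, xmax, ymax) in boxes:
        obj = ET.SubElement(ann, "object")
        ET.SubElement(obj, "name").text = "ship"
        bnd = ET.SubElement(obj, "bndbox")
        for tag, val in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bnd, tag).text = str(val)
    ET.ElementTree(ann).write(xml_path)

# usage example with a hypothetical frame and one ship rectangle
write_voc_annotation("frame_0001.xml", "frame_0001.jpg", 1920, 1080, [(400, 300, 900, 520)])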
Step two: CNN feature extraction.
The samples obtained in step one are resized uniformly to 224 × 224 and then input into a convolutional neural network for training to obtain a trained model of the ship target. The region-based convolutional neural network used in the embodiment of the invention has the following layer structure:
1) first layer: convolution kernel size 11 × 11, max pooling size 2 × 2, followed by a BN layer; output feature map size 55 × 55;
2) second layer: convolution kernel size 5 × 5, max pooling size 2 × 2, followed by a BN layer; output feature map size 27 × 27;
3) third layer: convolution kernel size 3 × 3, max pooling size 2 × 2, followed by a BN layer; output feature map size 13 × 13;
4) fourth layer: convolution kernel size 3 × 3; output feature map size 13 × 13;
5) fifth layer: convolution kernel size 3 × 3; output feature map size 13 × 13;
6) two fully connected layers, FC7 and FC8.
In total there are 5 convolutional layers, 3 max-pooling layers, 3 normalization (BN) layers and 2 fully connected layers; the output of the last fully connected layer, FC8, is a 4096-dimensional vector, which is the CNN feature.
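As an illustration, the PyTorch sketch below mirrors the layer structure listed above. The strides, paddings and channel counts are assumptions (the text specifies only the kernel sizes, 2 × 2 max pooling, BN placement and the 4096-dimensional FC outputs), so the feature-map sizes produced by this sketch differ slightly from those listed.

import torch
import torch.nn as nn

class RegionCNN(nn.Module):
    """Illustrative sketch of the 5-conv / 2-FC backbone described above.
    Only kernel sizes, 2x2 pooling, BN placement and 4096-D FC outputs come
    from the text; strides, paddings and channels are assumptions, so the
    spatial sizes here are 55 -> 27 -> 13 -> 6 rather than the listed ones."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # layer 1: 11x11 conv (stride 4) -> BN -> ReLU -> 2x2 max pool: 224 -> 55 -> 27
            nn.Conv2d(3, 96, 11, stride=4, padding=2), nn.BatchNorm2d(96),
            nn.ReLU(inplace=True), nn.MaxPool2d(2, 2),
            # layer 2: 5x5 conv -> BN -> ReLU -> 2x2 max pool: 27 -> 13
            nn.Conv2d(96, 256, 5, padding=2), nn.BatchNorm2d(256),
            nn.ReLU(inplace=True), nn.MaxPool2d(2, 2),
            # layer 3: 3x3 conv -> BN -> ReLU -> 2x2 max pool: 13 -> 6
            nn.Conv2d(256, 384, 3, padding=1), nn.BatchNorm2d(384),
            nn.ReLU(inplace=True), nn.MaxPool2d(2, 2),
            # layers 4 and 5: 3x3 convolutions, spatial size stays 6x6
            nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(inplace=True),
        )
        # two fully connected layers FC7 and FC8; FC8 yields the 4096-D CNN feature
        self.fc7 = nn.Linear(256 * 6 * 6, 4096)
        self.fc8 = nn.Linear(4096, 4096)

    def forward(self, x):                        # x: (batch, 3, 224, 224)
        x = self.features(x).flatten(1)
        return self.fc8(torch.relu(self.fc7(x)))

feat = RegionCNN()(torch.randn(1, 3, 224, 224))
print(feat.shape)                                # torch.Size([1, 4096])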
In a specific implementation, the deep learning network consists of several alternating convolutional layers, pooling layers and fully connected layers, i.e., an input layer, several hidden layers and an output layer, and the network parameters are updated mainly with the back-propagation (BP) algorithm. The layers are connected by different convolution modes. For an ordinary convolutional layer, the feature maps of the previous layer are convolved with learnable convolution kernels and passed through an activation function to obtain the output feature maps. Each output map may combine the convolutions of several input maps:

$$x_j^{l} = f\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\Big)$$

where M_j denotes the set of selected input maps, i is the index of an input-layer unit, j is the index of an output-layer unit, k_{ij}^{l} is the weight between the input layer and the output layer (i.e., the value at each position of the convolution kernel), b_j^{l} is the additive bias between the layers, f(·) is the activation function of the output layer, x_j^{l} is the j-th output map of layer l, x_i^{l-1} is the i-th input map of layer l-1, the superscript l identifies the l-th convolutional layer, and * denotes convolution.
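Purely as an illustration of this formula, the following NumPy/SciPy sketch convolves a set of input maps with per-pair kernels, adds the bias and applies an activation; the map sizes, kernel values and choice of activation are arbitrary.

import numpy as np
from scipy.signal import convolve2d

def conv_layer(inputs, kernels, biases, act=np.tanh):
    """Toy illustration of x_j^l = f( sum_{i in M_j} x_i^{l-1} * k_ij^l + b_j^l ).
    inputs : list of 2-D input maps x_i^{l-1}
    kernels: kernels[j][i] is the kernel k_ij^l linking input i to output j
    biases : biases[j] is the additive bias b_j^l"""
    outputs = []
    for j, b in enumerate(biases):
        s = sum(convolve2d(x, kernels[j][i], mode="valid")
                for i, x in enumerate(inputs))
        outputs.append(act(s + b))          # activation f applied element-wise
    return outputs

# tiny usage example with two random 8x8 maps, three outputs and 3x3 kernels
maps = [np.random.rand(8, 8) for _ in range(2)]
ks = [[np.random.rand(3, 3) for _ in range(2)] for _ in range(3)]
outs = conv_layer(maps, ks, biases=[0.1, 0.0, -0.1])
print([o.shape for o in outs])              # three 6x6 output maps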
For the pooling layer, there are N input layers and N output layers, except that each output layer is smaller.
$$x_j^{l} = f\big(\beta_j^{l}\,\mathrm{down}(x_j^{l-1}) + b_j^{l}\big)$$

where down(·) denotes a down-sampling function. Typically all pixels in each distinct n × n region of the input image are summed, so the output image is reduced by a factor of n in both dimensions; the value of n can be preset by the user in a specific implementation. Each output map has its own multiplicative bias β and additive bias b: β_j^{l} is the multiplicative bias of the j-th output map of layer l, b_j^{l} is the additive bias of the j-th output map of layer l, x_j^{l} is the j-th output map of layer l, and x_j^{l-1} is the j-th input map of layer l-1.
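A minimal NumPy illustration of the pooling formula follows: down() is realized as a sum over non-overlapping n × n blocks, followed by the multiplicative and additive biases and the activation; the concrete values are arbitrary.

import numpy as np

def pooling_layer(x_prev, n, beta, b, act=np.tanh):
    """Toy illustration of x_j^l = f( beta_j^l * down(x_j^{l-1}) + b_j^l ):
    down() sums all pixels in each non-overlapping n x n region, so the
    output map is n times smaller in each dimension."""
    h, w = x_prev.shape
    down = x_prev[:h - h % n, :w - w % n].reshape(h // n, n, w // n, n).sum(axis=(1, 3))
    return act(beta * down + b)

x = np.random.rand(12, 12)
print(pooling_layer(x, n=2, beta=0.5, b=0.0).shape)   # (6, 6)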
For the output fully connected layer, it is often better to convolve the input feature maps and sum the convolved values to obtain an output map. Let α_ij denote the weight, or contribution, of the i-th input map in obtaining the j-th output feature map. The j-th output map can then be expressed as:

$$x_j^{l} = f\Big(\sum_{i=1}^{N_{in}} \alpha_{ij}\,\big(x_i^{l-1} * k_i^{l}\big) + b_j^{l}\Big), \qquad \sum_{i} \alpha_{ij} = 1, \quad 0 \le \alpha_{ij} \le 1$$

where k_i^{l} is the weight between the input layer and the output layer (i.e., the value at each position of the convolution kernel), b_j^{l} is the activation bias between the layers, x_j^{l} is the j-th output map of layer l, x_i^{l-1} is the i-th input map of layer l-1, and N_{in} indicates that the j-th output map is obtained from N_{in} input maps.
Step three: traditional feature extraction.
Traditional features are extracted from the ship target region obtained in step one. The visual features used by the invention are the Hu invariant moment feature and the LOMO feature. The embodiment is implemented as follows:
(a) Hu invariant moments. Invariant moments are shape features: numerical characteristics of an image that are invariant to translation, scaling and rotation. Fig. 2 is the flow chart of Hu invariant moment extraction. The input image is first preprocessed by median-filter smoothing and binarization, regions are then segmented with the Simple Linear Iterative Clustering (SLIC) segmentation algorithm, and finally the 7 Hu invariant moment features of each ship region are calculated. Smoothing, binarization and segmentation are prior art and are not repeated here. Assume that in the preprocessing stage the input image is discretized into a digital image f(x, y) of size M × N, where (x, y) are the coordinates of a pixel on the image; its geometric moments are defined as:
$$m_{pq} = \sum_{x=1}^{M}\sum_{y=1}^{N} x^{p} y^{q} f(x,y)$$

where p is the order in the x direction and q is the order in the y direction. The set {m_pq} is uniquely determined by f(x, y), and conversely f(x, y) is uniquely determined by {m_pq}.
The central moment u_pq of the image f(x, y) is defined as:

$$u_{pq} = \sum_{x=1}^{M}\sum_{y=1}^{N} (x - x_0)^{p} (y - y_0)^{q} f(x,y)$$

where x_0, y_0 are the coordinates of the image centroid, computed as:

$$x_0 = \frac{m_{10}}{m_{00}}, \qquad y_0 = \frac{m_{01}}{m_{00}}$$

where m_10 and m_01 are the 1st-order geometric moments of the image and m_00 is the 0th-order geometric moment. The central moments of the image of order no more than 3 can thus be obtained: u_00, u_01, u_10, u_11, u_20, u_02, u_12, u_21, u_30, u_03.
For a general gray scale image, the central moment has the following law:
1) u_20 and u_02 are the moments of inertia of the region's gray values about the vertical and horizontal axes through the gray centroid, respectively. If u_20 > u_02, the image is elongated in the horizontal direction; otherwise it is elongated in the vertical direction.
2) u_30 and u_03 can be used to measure the symmetry of the object about the vertical and horizontal axes, respectively. If u_30 = 0, the object is symmetric about the vertical axis; if u_03 = 0, the object is symmetric about the horizontal axis. Because the central moments remain sensitive to rotation and scale, scale invariance can be obtained by normalization; the normalized central moment η_pq is defined as:

$$\eta_{pq} = \frac{u_{pq}}{u_{00}^{r}}, \qquad r = \frac{p+q}{2} + 1$$

where r is an intermediate variable, p ≥ 0, q ≥ 0 and p + q ≥ 2.
Using the 2nd- and 3rd-order central moments, seven feature quantities Φ1 to Φ7 that are invariant to translation, scaling and rotation can be derived:

$$\Phi_1 = \eta_{20} + \eta_{02}$$
$$\Phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$$
$$\Phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$$
$$\Phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$$
$$\Phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\big[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\big] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\big[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\big]$$
$$\Phi_6 = (\eta_{20} - \eta_{02})\big[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\big] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$$
$$\Phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\big[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\big] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\big[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\big]$$
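In practice the seven invariants can be computed directly with OpenCV, as sketched below; the binary rectangle stands in for a segmented ship region, and the median filtering and SLIC segmentation of the described flow are omitted. The log transform at the end is a common convention for compressing the dynamic range and is not part of the description above.

import cv2
import numpy as np

# stand-in binary mask for one segmented ship region
region = np.zeros((480, 1280), dtype=np.uint8)
cv2.rectangle(region, (400, 200), (900, 320), 255, -1)

m = cv2.moments(region, binaryImage=True)   # geometric and central moments m_pq, mu_pq
hu = cv2.HuMoments(m).flatten()             # the seven invariants Phi_1 .. Phi_7
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)   # optional log compression
print(hu_log)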
(b) LOMO feature. The Local Maximal Occurrence (LOMO) feature is a combination of color and texture features that describes the ship in the picture from both the color and the camera-viewpoint perspectives.
Fig. 3 is the flow chart of LOMO feature extraction. First, the image-enhancing Retinex algorithm is used to preprocess the input image, reducing the influence of illumination. The Retinex algorithm takes the color information of the picture into account; it aims to output a color image that is close to human perception and rich in color, and in particular it can enhance the detail of shadow regions.
The preprocessed image is then split equally into 5 vertical stripes, and within each vertical stripe a 20 × 20 sub-window is used to localize local blocks of the ship region, sliding with a stride of 10 pixels (an overlap of 10 pixels). Taking a ship image of size 1280 × 480 as an example, each vertical stripe is 256 × 480, the number of 20 × 20 sub-windows in each stripe is n = 25 × 47 = 1175, and the total number of sub-windows is 1175 × 5 = 5875; the exact number depends on the size of the ship target.
Within each sub-window, two SILTP histograms (SILTP^{0.3}_{4,3} and SILTP^{0.3}_{4,5}, each with 3^4 bins) and an 8 × 8 × 8 joint HSV histogram are extracted; each histogram bin represents the probability of occurrence of a pattern within the sub-window. SILTP improves the LBP descriptor by introducing a scale-invariant local comparison tolerance, achieving invariance to intensity scale changes and robustness to noise. Let (x_c, y_c) be the position of the center pixel of a sub-window; SILTP is computed as:

$$SILTP^{\tau}_{Q,R}(x_c, y_c) = \bigoplus_{q=0}^{Q-1} s_{\tau}(I_c, I_q)$$

$$s_{\tau}(I_c, I_q) = \begin{cases} 01, & I_q > (1+\tau)\,I_c \\ 10, & I_q < (1-\tau)\,I_c \\ 00, & \text{otherwise} \end{cases}$$

where I_c is the gray value of the center pixel of the sub-window, I_q are the gray values of the Q neighborhood pixels at radius R, ⊕ concatenates the binary values of all neighbors into a string, τ is the threshold (tolerance) range, and s_τ(I_c, I_q) is the binary value at a given pixel position. Referring to FIG. 3, SILTP^{0.3}_{4,3} indicates that texture features are extracted within a 4-neighborhood of radius 3 with a threshold of 0.3; similarly, SILTP^{0.3}_{4,5} extracts texture features within a 4-neighborhood of radius 5 with a threshold of 0.3.
All sub-windows at the same vertical position are then compared, and the maximum value of each type of histogram over these sub-windows is taken as the final histogram. The resulting histogram is invariant to viewpoint changes while still capturing the local characteristics of the ship target region.
In the embodiment, the specific implementation is as follows:
1) color is an important feature that describes visible light images. However, since the illumination condition of the video camera installed in the coastal area is not controllable, the camera is differently set. Thus, the color between pictures may differ in different camera views. The invention comprises the following steps:
firstly, the Retinex algorithm is adopted to preprocess the input image, so that the influence caused by illumination is reduced. The Retinex algorithm considers the color information of the picture, aims to output a color image which is close to human perception and rich in color, and particularly can enhance the detail information of a shadow area.
Second, for the picture preprocessed by the Retinex algorithm, color features are extracted with an HSV color histogram. In addition, SILTP (Scale Invariant Local Ternary Pattern) descriptors are applied to extract illumination-invariant texture features of the picture. SILTP improves the LBP descriptor by introducing a scale-invariant local comparison tolerance, achieving invariance to intensity scale changes and robustness to noise.
2) Ships seen by different cameras usually appear at different viewing angles, which also makes ship detection difficult. The invention therefore uses a sliding window to describe the local details of the ship region. Specifically:
Local blocks of the ship region are first localized with a 20 × 20 sub-window and an overlap of 10 pixels. Within each sub-window, two SILTP histograms (3^4 bins each) and an 8 × 8 × 8 joint HSV histogram are extracted, each histogram representing the probability of occurrence of a pattern within the sub-window.
All sub-windows at the same vertical position are then compared, and the maximum value of each type of histogram over these sub-windows is taken as the final histogram. The resulting histogram is invariant to viewpoint changes while still capturing the local characteristics of the ship target region.
The invention takes a ship target size of 1280 × 480 as an example; after scaling, targets of 640 × 240 and 320 × 120 are also obtained. By concatenating all features, the resulting final feature has (8 × 8 × 8 color-histogram bins + 3^4 × 2 SILTP bins) × (127 + 63 + 31) vertical-stripe groups, i.e., 694 × 221 = 153,374 dimensions.
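As an illustration of the color part of the LOMO computation described above, the sketch below splits the image into 5 vertical stripes, slides a 20 × 20 sub-window with a stride of 10, builds an 8 × 8 × 8 joint HSV histogram per sub-window, and keeps the bin-wise maximum over sub-windows at the same vertical position. The exact grouping of sub-windows and the HSV bin ranges are assumptions; the SILTP histograms and multi-scale concatenation are omitted for brevity.

import cv2
import numpy as np

def lomo_color(img_bgr, win=20, stride=10, n_stripes=5):
    """Sketch of the color part of LOMO: per-stripe sliding sub-windows,
    8x8x8 joint HSV histograms, and a bin-wise maximum over sub-windows
    sharing the same vertical position."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    h, w = hsv.shape[:2]
    stripe_w = w // n_stripes
    feats = []
    for s in range(n_stripes):
        stripe = hsv[:, s * stripe_w:(s + 1) * stripe_w]
        for y in range(0, h - win + 1, stride):            # one group per vertical position
            best = None
            for x in range(0, stripe_w - win + 1, stride):
                patch = stripe[y:y + win, x:x + win]
                hist = cv2.calcHist([patch], [0, 1, 2], None, [8, 8, 8],
                                    [0, 180, 0, 256, 0, 256]).ravel()
                hist /= hist.sum() + 1e-12                  # occurrence probabilities
                best = hist if best is None else np.maximum(best, hist)
            feats.append(best)
    return np.concatenate(feats)

feat = lomo_color(np.random.randint(0, 255, (480, 1280, 3), dtype=np.uint8))
print(feat.shape)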
Step four: feature dimension reduction.
The invariant moment features and the LOMO features obtained in step three are concatenated; the resulting dimensionality is very large, so the embodiment of the invention uses a principal component analysis (PCA) algorithm to reduce the concatenated traditional features to 4096 dimensions. The principal component analysis algorithm is prior art and is not described in detail here.
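A brief scikit-learn sketch of this dimension-reduction step follows; random data stands in for the concatenated Hu + LOMO vectors, and the number of retained components is capped by the sample count because PCA cannot return more components than samples.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 10000)                 # placeholder: n_samples x original_dim
pca = PCA(n_components=min(4096, X.shape[0], X.shape[1]))
X_reduced = pca.fit_transform(X)               # reduced traditional features
print(X_reduced.shape, pca.explained_variance_ratio_.sum())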
Step five: constructing the feature fusion network.
In order to map the CNN features and the traditional features into a unified feature space, the invention proposes a feature fusion network (FCNN). FIG. 4 shows the structure of the fusion network, in which the deep-learning hyper-parameters are updated during back propagation under the constraint of the traditional features. The fused features are more discriminative than either the CNN features or the traditional features alone.
The embodiment is specifically implemented as follows.
the FC7 and FC8 layers are output layers of the convolutional neural network, the output of the conventional features is also 4096-dimensional feature vectors, and the input of the fusion layer (i.e. FC9 layer) is CNN features and the conventional features:
x=[LOMO+Gu,CNNfeatures]
where x is the input to the fusion layer, LOMO is the local maximization feature, Hu is the invariant moment feature, and CNNFETURES are the convolutional neural network features. Output of fused layer (4096-dimensional) ZFusion(x) Comprises the following steps:
Figure GDA0002939992890000091
where h () represents the activation function, with the modified linear unit ReLU,
Figure GDA0002939992890000092
is a weight, bFusionIs an offset.
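The fusion-layer forward pass can be illustrated with plain NumPy as below; the weight and bias shapes are assumptions consistent with a 4096-dimensional output, and the random values are placeholders.

import numpy as np

def fusion_layer(cnn_feat, trad_feat, W, b):
    """Sketch of the FC9 fusion layer: concatenate the 4096-D traditional feature
    with the 4096-D CNN feature and apply Z = ReLU(W^T x + b)."""
    x = np.concatenate([trad_feat, cnn_feat])        # 8192-D fused input
    z = W.T @ x + b                                  # W: (8192, 4096), b: (4096,)
    return np.maximum(z, 0.0)                        # ReLU activation h(.)

rng = np.random.default_rng(0)
z = fusion_layer(rng.standard_normal(4096), rng.standard_normal(4096),
                 rng.standard_normal((8192, 4096)) * 0.01, np.zeros(4096))
print(z.shape)                                       # (4096,)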
Assume that the number of ship classes to be detected is T. The output of the FC9 layer is a 4096 × 1 vector, and the output of the softmax (regression) layer is a T × 1 vector whose entries range from 0 to 1 and represent the probability that the sample belongs to each class. The computation from the FC9 layer to the softmax layer is the network training: an optimal T × 4096 matrix is sought so that the loss of the softmax layer is minimal. The computation follows the BP algorithm; after an iteration the parameters of layer l are:

$$W^{(l)} = W^{(l)} - \alpha\Big[\frac{1}{m}\,\Delta W^{(l)} + \lambda W^{(l)}\Big]$$

$$b^{(l)} = b^{(l)} - \alpha\Big[\frac{1}{m}\,\Delta b^{(l)}\Big]$$

where the left-hand sides are the weight and bias of layer l after the iteration; W^{(l)} and b^{(l)} on the right-hand sides are the weight and bias of layer l before the iteration; ΔW^{(l)} and Δb^{(l)} are the accumulated weight and bias gradients of layer l in the iteration; α is the learning rate, λ is the regularization (weight-decay) coefficient, and m is the number of samples.
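The per-layer update written above corresponds to ordinary mini-batch gradient descent with weight decay, as the small NumPy sketch below shows; the learning rate, decay coefficient and batch size are arbitrary example values.

import numpy as np

def update_layer(W, b, dW, db, alpha, lam, m):
    """W <- W - alpha * ((1/m) * dW + lam * W);  b <- b - alpha * (1/m) * db."""
    W_new = W - alpha * (dW / m + lam * W)
    b_new = b - alpha * (db / m)
    return W_new, b_new

W, b = np.ones((4096, 10)), np.zeros(10)
dW, db = 0.1 * np.ones_like(W), 0.1 * np.ones_like(b)
W, b = update_layer(W, b, dW, db, alpha=0.01, lam=1e-4, m=32)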
The class probability p(y = j | x; θ) used in the loss function is calculated as:

$$p\big(y^{(i)} = j \mid x^{(i)};\, \theta\big) = \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{k=1}^{n} e^{\theta_k^{T} x^{(i)}}}$$

where y denotes an output node of the network; j denotes the output value, i.e., the class number; x denotes the input vector; θ denotes all model parameters, of size k × (n + 1); e denotes the base of the natural logarithm; θ_j denotes the model parameters of the j-th class; θ_k denotes the model parameters of the k-th class; n denotes the total number of classes; and k denotes the k-th class.
The last layer of the network uses the cross-entropy loss:

$$J = -\sum_{k=1}^{T} y_k \log P_k$$

where P_k is the probability output of the last layer for class k, y_k is the corresponding ground-truth indicator, and J is the cross-entropy loss obtained by operating on the probability output of each class.
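For illustration, the softmax probability and the cross-entropy loss of the regression layer can be computed as in the following NumPy sketch; the number of classes T = 5, the 4096-dimensional fused feature and the parameter values are arbitrary placeholders.

import numpy as np

def softmax(logits):
    """p(y = j | x) = exp(theta_j^T x) / sum_k exp(theta_k^T x), computed stably."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, true_class):
    """J = -sum_k y_k * log(P_k) with a one-hot ground truth y."""
    return -np.log(probs[true_class] + 1e-12)

theta = np.random.randn(5, 4096) * 0.01       # T = 5 classes, 4096-D fused feature
x = np.random.randn(4096)
p = softmax(theta @ x)                        # T x 1 probabilities in [0, 1]
print(p, cross_entropy(p, true_class=2))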
Step six: training the feature fusion network FCNN.
The feature fusion network FCNN is trained with the sample data, and the trained FCNN is verified and tested with the test data.
In the embodiment, 3500 training pictures and 3500 test pictures are used; after the fusion network has been trained with the training pictures, testing is carried out with the test pictures. A detection picture is input into the trained model to obtain the result.
This completes the description of the specific implementation of the ship target detection method with joint training of deep learning features and traditional features. In a specific implementation, the procedure provided by the technical scheme of the invention can be run automatically by a person skilled in the art using computer software technology.
The specific examples described herein are merely illustrative of the invention. Those skilled in the art can make various modifications, additions or substitutions to the described embodiments without departing from the spirit of the invention or exceeding the scope defined in the appended claims.

Claims (6)

1. A ship target detection method with joint training of deep learning features and traditional features, characterized by comprising the following steps:
step one, sample data collection: collecting surveillance video frame data of coastal areas under visible light, extracting images, and labeling the images that contain ship targets;
step two, CNN feature extraction: inputting the obtained samples into a convolutional neural network for training to obtain a trained model of the ship target, the convolutional neural network outputting the CNN features;
step three, traditional feature extraction: extracting the invariant moment features and LOMO features of the ship target region;
step four, feature dimension reduction: concatenating the invariant moment features with the LOMO features, and reducing the dimensionality of the concatenated traditional features with a principal component analysis algorithm;
step five, constructing a feature fusion network FCNN to map the CNN features and the traditional features into a unified feature space;
step six, training the feature fusion network FCNN with the sample data, and verifying and testing the trained FCNN with test data.
2. The ship target detection method with joint training of deep learning features and traditional features according to claim 1, characterized in that: in step one, the images containing ship targets are labeled according to the PASCAL VOC data set standard, and the generated annotation files record the coordinates of the four vertices of the minimum enclosing rectangle of each ship target on every image together with the corresponding image, thereby constructing a ship image sample library.
3. The ship target detection method with joint training of deep learning features and traditional features according to claim 1, characterized in that: in step two, a region-based convolutional neural network is adopted, consisting of several alternating convolutional layers, pooling layers and fully connected layers and updated with a back-propagation algorithm.
4. The ship target detection method with joint training of deep learning features and traditional features according to claim 3, characterized in that: in step two, the structure of the adopted region-based convolutional neural network is as follows:
1) first layer: convolution kernel size 11 × 11, max pooling size 2 × 2, followed by a BN layer; output feature map size 55 × 55;
2) second layer: convolution kernel size 5 × 5, max pooling size 2 × 2, followed by a BN layer; output feature map size 27 × 27;
3) third layer: convolution kernel size 3 × 3, max pooling size 2 × 2, followed by a BN layer; output feature map size 13 × 13;
4) fourth layer: convolution kernel size 3 × 3; output feature map size 13 × 13;
5) fifth layer: convolution kernel size 3 × 3; output feature map size 13 × 13;
6) two fully connected layers, FC7 and FC8.
5. The ship target detection method with joint training of deep learning features and traditional features according to claim 1, characterized in that: in step three, the LOMO feature comprehensively considers the influence of illumination and viewing-angle changes on the image; first, the Retinex algorithm is used to preprocess the input image, reducing the influence of illumination; second, for the image preprocessed by the Retinex algorithm, color features are extracted with an HSV color histogram; in addition, SILTP descriptors are applied to extract illumination-invariant texture features of the image.
6. The ship target detection method with joint training of deep learning features and traditional features according to any one of claims 1 to 5, characterized in that: in step five, the feature fusion network FCNN contains a fusion layer and a regression layer; the input of the fusion layer is the CNN features and the traditional features; if the number of ship classes to be detected is T, the output of the regression layer is a T × 1 vector whose entries range from 0 to 1 and represent the probability that the sample belongs to each class.
CN201811050911.XA 2018-09-10 2018-09-10 Ship target detection method for deep learning feature and visual feature combined training Active CN109376591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811050911.XA CN109376591B (en) 2018-09-10 2018-09-10 Ship target detection method for deep learning feature and visual feature combined training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811050911.XA CN109376591B (en) 2018-09-10 2018-09-10 Ship target detection method for deep learning feature and visual feature combined training

Publications (2)

Publication Number Publication Date
CN109376591A CN109376591A (en) 2019-02-22
CN109376591B true CN109376591B (en) 2021-04-16

Family

ID=65405386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811050911.XA Active CN109376591B (en) 2018-09-10 2018-09-10 Ship target detection method for deep learning feature and visual feature combined training

Country Status (1)

Country Link
CN (1) CN109376591B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298271A (en) * 2019-06-17 2019-10-01 上海大学 Seawater method for detecting area based on critical point detection network and space constraint mixed model
CN110555465B (en) * 2019-08-13 2022-03-11 成都信息工程大学 Weather image identification method based on CNN and multi-feature fusion
CN111639513A (en) * 2019-12-10 2020-09-08 珠海大横琴科技发展有限公司 Ship shielding identification method and device and electronic equipment
CN111178165B (en) * 2019-12-12 2023-07-18 河南省润通路空一体交通发展有限公司 Automatic extraction method for air-to-ground target information based on small sample training video
CN111612028A (en) * 2019-12-13 2020-09-01 珠海大横琴科技发展有限公司 Ship feature optimization method and device based on deep learning and electronic equipment
CN111368690B (en) * 2020-02-28 2021-03-02 珠海大横琴科技发展有限公司 Deep learning-based video image ship detection method and system under influence of sea waves
CN112491854B (en) * 2020-11-19 2022-12-09 郑州迪维勒普科技有限公司 Multi-azimuth security intrusion detection method and system based on FCNN
CN113691940B (en) * 2021-08-13 2022-09-27 天津大学 Incremental intelligent indoor positioning method based on CSI image
TWI771250B (en) * 2021-12-16 2022-07-11 國立陽明交通大學 Device and method for reducing data dimension, and operating method of device for converting data dimension

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292259A (en) * 2017-06-15 2017-10-24 国家新闻出版广电总局广播科学研究院 The integrated approach of depth characteristic and traditional characteristic based on AdaRank
CN107563303A (en) * 2017-08-09 2018-01-09 中国科学院大学 A kind of robustness Ship Target Detection method based on deep learning
WO2018067080A1 (en) * 2016-10-07 2018-04-12 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi A marine vessel identification method
CN108388904A (en) * 2018-03-13 2018-08-10 中国海洋大学 A kind of dimension reduction method based on convolutional neural networks and covariance tensor matrix

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018067080A1 (en) * 2016-10-07 2018-04-12 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi A marine vessel identification method
CN107292259A (en) * 2017-06-15 2017-10-24 国家新闻出版广电总局广播科学研究院 The integrated approach of depth characteristic and traditional characteristic based on AdaRank
CN107563303A (en) * 2017-08-09 2018-01-09 中国科学院大学 A kind of robustness Ship Target Detection method based on deep learning
CN108388904A (en) * 2018-03-13 2018-08-10 中国海洋大学 A kind of dimension reduction method based on convolutional neural networks and covariance tensor matrix

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Ruiqian Zhang et al., "S-CNN-based ship detection from high-resolution remote sensing images", The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2016-12-31, vol. XLI-B7, pp. 423-430 *
Li Shuang (李爽), "Image classification algorithm based on multi-feature fusion and deep learning" (基于多特征融合和深度学习的图像分类算法), 2018-08-31, vol. 46, no. 4, pp. 50-56 *
Zhang Jianhu (张建虎), "Research and implementation of multi-feature fusion for target recognition" (面向目标识别的多特征融合研究与实现), China Master's Theses Full-text Database, Information Science and Technology, 2018-06-15, no. 6, p. I138-1311 *

Also Published As

Publication number Publication date
CN109376591A (en) 2019-02-22

Similar Documents

Publication Publication Date Title
CN109376591B (en) Ship target detection method for deep learning feature and visual feature combined training
He et al. A fully convolutional neural network for wood defect location and identification
CN111553929B (en) Mobile phone screen defect segmentation method, device and equipment based on converged network
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
Li et al. SAR image change detection using PCANet guided by saliency detection
Yin et al. Hot region selection based on selective search and modified fuzzy C-means in remote sensing images
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN109871902B (en) SAR small sample identification method based on super-resolution countermeasure generation cascade network
CN109740665A (en) Shielded image ship object detection method and system based on expertise constraint
CN111626993A (en) Image automatic detection counting method and system based on embedded FEFnet network
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN109919223B (en) Target detection method and device based on deep neural network
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN111598098A (en) Water gauge water line detection and effectiveness identification method based on full convolution neural network
CN111368742B (en) Reconstruction and identification method and system of double yellow traffic marking lines based on video analysis
CN109977834B (en) Method and device for segmenting human hand and interactive object from depth image
CN109726660A (en) A kind of remote sensing images ship identification method
CN112884795A (en) Power transmission line inspection foreground and background segmentation method based on multi-feature significance fusion
Li et al. Evaluation the performance of fully convolutional networks for building extraction compared with shallow models
Wang et al. Scattering Information Fusion Network for Oriented Ship Detection in SAR Images
CN115700737A (en) Oil spill detection method based on video monitoring
CN113011359A (en) Method for simultaneously detecting plane structure and generating plane description based on image and application
CN113011438A (en) Node classification and sparse graph learning-based bimodal image saliency detection method
CN110910497B (en) Method and system for realizing augmented reality map

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant