CN106909924B - Remote sensing image rapid retrieval method based on depth significance - Google Patents

Remote sensing image rapid retrieval method based on depth significance

Info

Publication number
CN106909924B
CN106909924B · CN201710087670A
Authority
CN
China
Prior art keywords
image
training
network
layer
task
Prior art date
Legal status
Active
Application number
CN201710087670.5A
Other languages
Chinese (zh)
Other versions
CN106909924A (en)
Inventor
张菁
梁西
陈璐
卓力
耿文浩
李嘉锋
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201710087670.5A priority Critical patent/CN106909924B/en
Publication of CN106909924A publication Critical patent/CN106909924A/en
Application granted granted Critical
Publication of CN106909924B publication Critical patent/CN106909924B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06V  IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00  Arrangements for image or video recognition or understanding
    • G06V10/20  Image preprocessing
    • G06V10/26  Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267  Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06V  IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00  Scenes; Scene-specific elements
    • G06V20/05  Underwater scenes
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06F  ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00  Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50  Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58  Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583  Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06F  ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00  Pattern recognition
    • G06F18/20  Analysing
    • G06F18/22  Matching criteria, e.g. proximity measures
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06N  COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00  Computing arrangements based on biological models
    • G06N3/02  Neural networks
    • G06N3/04  Architecture, e.g. interconnection topology
    • G06N3/045  Combinations of networks
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06N  COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00  Computing arrangements based on biological models
    • G06N3/02  Neural networks
    • G06N3/08  Learning methods
    • G06N3/084  Backpropagation, e.g. using gradient descent
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06V  IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00  Arrangements for image or video recognition or understanding
    • G06V10/40  Extraction of image or video features
    • G06V10/46  Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462  Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

A remote sensing image fast retrieval method based on depth saliency belongs to the field of computer vision and in particular involves deep learning, salient object detection and image retrieval. The invention takes the remote sensing image as its research object and studies a fast retrieval method for remote sensing images using deep learning. First, a multi-task salient object detection model is constructed with a fully convolutional neural network; the model performs a saliency detection task and a semantic segmentation task simultaneously, and the deep saliency features of the remote sensing image are learned during network pre-training. The deep network structure is then improved: a hash layer is added, the network is fine-tuned, and a binary hash code of the remote sensing image is learned. Finally, the saliency features and the hash codes are used together to measure similarity. The method is practical and feasible for accurate and efficient retrieval of remote sensing images and has important application value.

Description

Remote sensing image rapid retrieval method based on depth significance
Technical Field
The invention takes the remote sensing image as its research object and studies a fast retrieval method for remote sensing images using deep learning, a recent achievement in the field of artificial intelligence. First, a multi-task salient object detection model is constructed with a fully convolutional neural network and the deep saliency features of the remote sensing image are computed; then the deep network structure is improved by adding a hash layer that learns a binary hash code; finally, the saliency features and the hash code are used together to retrieve remote sensing images accurately and quickly. The invention belongs to the field of computer vision and in particular involves deep learning, salient object detection and image retrieval.
Background
Remote sensing image data are the basic data of the three spatial information technologies, Geographic Information Systems (GIS), the Global Positioning System (GPS) and Remote Sensing (RS), and are widely used in environmental monitoring, resource surveying, land use, urban planning, natural disaster analysis, military applications and many other fields. In recent years, with the development of high-resolution remote sensing satellites, imaging radar and Unmanned Aerial Vehicle (UAV) technology, remote sensing image data have further become massive, complex and high in resolution; achieving efficient and accurate remote sensing image retrieval therefore has important research significance and application value for promoting the accurate extraction and sharing of remote sensing image information.
Image retrieval technology has evolved from early Text-Based Image Retrieval (TBIR) to Content-Based Image Retrieval (CBIR), which extracts image features. Image retrieval based on salient objects can quickly select a few salient regions from a complex scene for priority processing, effectively reducing data processing complexity and improving retrieval efficiency. Compared with ordinary images, remote sensing images contain complex and variable information, small targets and little contrast with the background; with traditional saliency detection methods it is difficult to describe and analyse the saliency characteristics of remote sensing images accurately. In recent years, deep learning techniques, the latest achievement in artificial intelligence, have emerged. Deep neural networks represented by the Fully Convolutional Neural Network (FCNN), with convolution kernels resembling the local perception of the human eye and a hierarchical cascade structure resembling biological neurons, show excellent robustness in learning deep saliency features of images. Their weight sharing also greatly reduces the number of network parameters, lowers the risk of overfitting the training data, makes them easier to train than other kinds of deep networks, and improves the accuracy with which salient features are represented.
Considering the ever-growing number of remote sensing images and their limited semantic description capability, the invention proposes a fast remote sensing image retrieval method based on depth saliency, taking the large-scale Aerial Image Dataset (AID), the Wuhan University remote sensing image dataset (WHU-RS) and Google Earth remote sensing images as data sources. First, a multi-task salient object detection model based on a Fully Convolutional Neural Network (FCNN) is constructed; semantic information of the remote sensing image at different levels is learned on the pre-training dataset, used as the deep saliency feature and converted into a one-dimensional column vector. The neural network model is then fine-tuned: a hash layer is introduced, training samples are added, the high-dimensional saliency features learned by the model are mapped to a low-dimensional space as Binary Hash Codes, and the saliency feature vectors and hash codes are stored separately to build a feature database. The saliency feature vector and hash code of a remote sensing image to be queried are extracted with the trained model and compared against the feature database; similarity is measured by the Hamming Distance of the hash codes and the Euclidean Distance of the feature vectors, achieving fast retrieval of remote sensing images.
Disclosure of Invention
Unlike traditional remote sensing image retrieval methods, the invention provides a fast remote sensing image retrieval method based on depth saliency using deep learning. First, a multi-task deep salient object detection model is constructed with a Fully Convolutional Neural Network (FCNN), extending the image-level classification of an ordinary Convolutional Neural Network (CNN) to pixel-level classification. The network is pre-trained on the large-scale Aerial Image Dataset (AID); the saliency detection task and the semantic segmentation task share the convolutional layers, three levels of semantic information of the remote sensing image are learned, feature redundancy is effectively removed, and deep saliency features are extracted accurately. Second, a hash layer is added to the model and the network is fine-tuned with an expanded Wuhan University remote sensing image dataset (WHU-RS). Exploiting the incremental-learning advantage of the deep neural network through the Stochastic Gradient Descent (SGD) algorithm, binary hash codes are learned point by point, reducing the dimensionality of the high-dimensional saliency features, saving storage space and improving retrieval efficiency. Compared with traditional hashing methods, which require training samples to be input in pairs, the adopted method also scales more easily to large datasets. The saliency features learned during pre-training and fine-tuning are converted into one-dimensional column vectors and, together with the binary hash codes, form a feature database. Finally, in the image retrieval stage a coarse-to-fine strategy is adopted: the binary hash codes and the saliency features are used to measure the Hamming distance and the Euclidean distance respectively, so that remote sensing images can be retrieved quickly and accurately. The main process of the method is shown in figure 1 and can be divided into three steps: construction of the depth-saliency-based object detection model, neural network pre-training with hash-layer fine-tuning, and multi-level deep retrieval.
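To make this three-stage flow concrete, the following minimal Python sketch (an illustration only; the function name retrieve and the dictionary keys "code" and "feature" are assumptions, not terminology from the patent) shows how a coarse Hamming filter over binary hash codes can be followed by fine Euclidean ranking of saliency feature vectors:

```python
import numpy as np

def retrieve(query_code, query_feat, database, hamming_threshold=5, k=10):
    """Coarse-to-fine retrieval sketch: Hamming filtering on hash codes, then Euclidean ranking.

    database: list of dicts, each with "code" (s-bit 0/1 numpy array) and "feature" (1-D numpy array).
    """
    # Coarse stage: keep images whose binary codes lie within the Hamming threshold.
    candidates = [d for d in database
                  if np.count_nonzero(d["code"] != query_code) < hamming_threshold]
    # Fine stage: rank the candidate pool by Euclidean distance of saliency feature vectors.
    candidates.sort(key=lambda d: float(np.linalg.norm(d["feature"] - query_feat)))
    return candidates[:k]
```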
(1) Target detection model construction based on depth significance
In order to extract the salient regions of an image effectively, the invention constructs a multi-task salient object detection model based on a fully convolutional neural network. The model performs two tasks simultaneously: saliency detection and semantic segmentation. Saliency detection learns the deep features of the remote sensing image and computes the depth saliency; semantic segmentation extracts the semantic information of objects inside the image, removes background clutter from the saliency map and fills in missing parts of salient objects.
(2) Neural network pre-training and adding hash layer fine tuning
The method selects the large-scale Aerial Image Dataset (AID) as the standard dataset for pre-training the network. So that the saliency features learned by the salient object detection model are more robust for retrieving Chinese remote sensing images, 6050 Chinese remote sensing images with different illumination, shooting angles, resolutions and sizes were downloaded from Google Earth on top of the Wuhan University remote sensing image dataset (WHU-RS), expanding the WHU-RS dataset to 7000 images for fine-tuning the neural network.
(3) Multi-level depth retrieval
The invention provides a coarse-to-fine retrieval scheme. The coarse search measures similarity by the Hamming distance of the binary hash codes learned by the hash layer. The fine search maps the two-dimensional remote sensing image feature maps generated by the 13th and 15th convolutional layers into one-dimensional vectors as saliency feature vectors and measures similarity by the Euclidean distance. Using a ranking-based evaluation criterion, the Precision of the retrieval results is computed.
1. A remote sensing image fast retrieval method based on depth saliency is characterized by comprising the following steps:
step 1: target detection model construction based on depth significance
Inputting an RGB image and carrying out a series of convolution operations through 15 convolutional layers, which are shared by the saliency detection task and the superpixel object semantic segmentation task; initializing the first 13 convolutional layers from the convolutional neural network VGGNet, where the convolution kernel size is 3 × 3 and each convolutional layer is followed by a rectified linear unit ReLU as activation function; performing max pooling after the 2nd, 4th, 5th and 13th convolutional layers; the convolution kernel sizes of the 14th and 15th convolutional layers are 7 × 7 and 1 × 1 respectively, and each of these two layers is followed by a Dropout layer;
constructing a deconvolution layer by upsampling, initializing the parameters of the deconvolution layer by bilinear interpolation, and updating the learned upsampling function iteratively during training; in the salient object detection task, normalizing the output image to [0,1] by a sigmoid threshold function and learning the saliency features; in the semantic segmentation task, upsampling the feature map of the last convolutional layer by the deconvolution layer and cropping the upsampling result so that the output image has the same size as the input image;
step 2: neural network pre-training and adding hash layer fine tuning
Step 2.1: multi-task significance target detection model pre-training
The FCNN pre-training is carried out jointly by the saliency detection task and the segmentation task; χ denotes a set of N_1 training images of width W and height Q, where X_i is the i-th image and Y_ijk is the corresponding pixel-level ground-truth segmentation label of the i-th image at position (j, k), with i = 1...N_1, j = 1...W, k = 1...Q; Z denotes a set of N_2 training images, where Z_n is the n-th image, n = 1...N_2, with corresponding ground-truth binary salient-object map M_n; θ_s are the shared convolutional layer parameters, θ_h the segmentation-task parameters and θ_f the saliency-task parameters; formula (1) and formula (2) are respectively the cross-entropy cost function J_1(χ; θ_s, θ_h) of the segmentation task and the squared Euclidean distance cost function J_2(Z; θ_s, θ_f) of the saliency detection task, and the FCNN is trained by minimizing both cost functions:

J_1(χ; θ_s, θ_h) = -Σ_{i=1}^{N_1} Σ_{j=1}^{W} Σ_{k=1}^{Q} Σ_{c=1}^{C} 1{Y_ijk = c} log h_cjk(X_i; θ_s, θ_h)   (1)

J_2(Z; θ_s, θ_f) = Σ_{n=1}^{N_2} ||f(Z_n; θ_s, θ_f) - M_n||_F²   (2)

in formula (1), 1{·} is the indicator function, h_cjk is the element (j, k) of the confidence segmentation map of class c, c = 1...C, and h(X_i; θ_s, θ_h) is the semantic segmentation function, returning confidence segmentation maps for all C object classes, where C is the number of image classes contained in the pre-training data set; in formula (2), f(Z_n; θ_s, θ_f) is the saliency map output function and ||·||_F denotes the Frobenius norm;
next, the cost functions are minimized with the stochastic gradient descent SGD method on the basis of regularizing all training samples; because the data set used for pre-training does not carry segmentation and saliency annotations at the same time, the segmentation task and the saliency detection task are performed alternately; the training process normalizes the sizes of all original images; the learning rate is 0.001 ± 0.01; the reference value of the momentum parameter is [0.9, 1.0] and the reference value of the weight decay factor is 0.0005 ± 0.0002; the stochastic gradient descent learning process runs for more than 80000 iterations; the detailed pre-training procedure is as follows:
1) the shared full-convolution parameters θ_s are initialized from VGGNet;
2) the segmentation-task parameters θ_h and the saliency-task parameters θ_f are randomly initialized from a normal distribution;
3) with the current θ_s and θ_h, the segmentation network is trained with SGD and both parameters are updated;
4) with the current θ_s and θ_f, the saliency network is trained with SGD and the relevant parameters are updated;
5) with the updated θ_s and θ_h, the segmentation network is trained again with SGD;
6) with the updated θ_s and θ_f, the saliency network is trained again with SGD;
7) steps 3)-6) are repeated three times to obtain the final pre-training parameters θ_s, θ_h and θ_f;
Step 2.2: adding a hash layer to fine-tune the network for the target domain
Inserting a fully connected layer containing s neurons, namely a hash layer H, between the penultimate layer of the pre-trained network and the final task layer, mapping the high-dimensional features to a low-dimensional space and generating binary hash codes for storage; the weights of the hash layer H are initialized as hash values constructed by random projection, the neuron activation function is a sigmoid so that the output values lie between 0 and 1, and the number of neurons is the code length of the target binary code;
the fine-tuning process adjusts the network weights by the back-propagation algorithm; network fine-tuning adjusts the network weights after the tenth convolutional layer; compared with the data set of the pre-training network, the amount of data in the fine-tuning data set can be reduced by 10%-50%; compared with the pre-training parameters, the number of iterations and the learning rate in the fine-tuning process are reduced to 1%-10% of their pre-training values, while the momentum parameter and the weight decay factor remain unchanged;
the detailed trimming process is as follows:
1) sharing full convolution parameters
Figure GDA0002557080360000051
Segmenting task parameters
Figure GDA0002557080360000052
And salient task parameters
Figure GDA0002557080360000053
Obtained through a pre-training process;
2) according to
Figure GDA0002557080360000054
And
Figure GDA0002557080360000055
the SGD is used for training the segmentation network, and the two parameters are updated to
Figure GDA0002557080360000056
And
Figure GDA0002557080360000057
3) according to
Figure GDA0002557080360000058
And
Figure GDA0002557080360000059
training significance network by SGD and updating relevant parameters to
Figure GDA00025570803600000510
And
Figure GDA00025570803600000511
4) according to
Figure GDA00025570803600000512
And
Figure GDA00025570803600000513
training the segmentation network by using SGD to obtain
Figure GDA00025570803600000514
And
Figure GDA00025570803600000515
5) according to
Figure GDA00025570803600000516
And
Figure GDA00025570803600000517
training significance network by SGD and updating relevant parameters to
Figure GDA00025570803600000518
And
Figure GDA00025570803600000519
6) repeating the above steps 3) -6) three times to obtain the final parameter thetas,θh,θf
And step 3: multi-level depth retrieval
Step 3.1: coarse search
Step 3.1.1: generating binary hash codes
An image I_q to be queried is input into the fine-tuned neural network and the output of the hash layer is extracted as the image signature, denoted out(H); for each bit r = 1...s, the binary code is obtained by binarizing the activation value against a threshold:

H^r = 1 if out^r(H) ≥ 0.5, and H^r = 0 otherwise, r = 1...s   (3)

where s is the number of neurons in the hash layer, whose initial value is set in the range [40, 100]; the data set used for retrieval, containing t images, is denoted {I_1, I_2, ..., I_t}; the corresponding binary codes are denoted {H_1, H_2, ..., H_t}, where m = 1...t and H_m ∈ {0,1}^s, i.e. the s-bit binary code produced by the s neurons, each bit being 0 or 1;
step 3.1.2: hamming distance metric similarity
The Hamming distance between two character strings of equal length is the number of positions at which the corresponding characters differ; for an image I_q to be queried with binary code H_q, if the Hamming distance between H_q and a database code H_i is smaller than the set threshold, a candidate pool P = {I_c1, I_c2, ..., I_cm} containing m candidate images is formed; two images are considered similar when their Hamming distance is smaller than 5;
step 3.2: fine search
Step 3.2.1: salient feature extraction
For the image I_q to be queried, the two-dimensional remote sensing image feature maps generated by the 13th and 15th convolutional layers of the neural network are each mapped into a one-dimensional vector and stored; in the subsequent retrieval process, the retrieval results obtained with the different feature vectors are compared to decide which convolutional layer's feature map is finally used to extract the salient features of the remote sensing image;
step 3.2.2: euclidean distance metric similarity
For a query image I_q and the candidate pool P, the top-k ranked images are selected from P using the extracted saliency feature vectors; V_q and V_ci denote the feature vectors of the query image q and of the candidate image I_ci respectively; the Euclidean distance s_i between the feature vectors of I_q and of the i-th image in the candidate pool P is defined as their similarity level, as shown in formula (4):

s_i = ||V_q - V_ci||_2   (4)
the smaller the Euclidean distance, the greater the similarity between the two images; the candidate images I_ci are sorted in ascending order of their distance to the query image, and the top-k ranked images are the retrieval result;
step 3.3: evaluation of search results
The retrieval results are evaluated with a ranking-based evaluation criterion; for a query image q and the k top-ranked retrieval result images, Precision is calculated according to the following formula:
Precision@k = ( Σ_{i=1}^{k} Rel(i) ) / k
where Precision@k denotes, for a given threshold k, the proportion of relevant results among the first k returned images, i.e. the average precision over the top k results; Rel(i) denotes the relevance between the query image q and the i-th returned image, Rel(i) ∈ {0, 1}, where 1 means the query image q and the i-th image belong to the same class, i.e. are relevant, and 0 means they are not relevant.
Compared with the prior art, the invention has the following obvious advantages and beneficial effects:
First, compared with traditional hand-crafted remote sensing image features, the invention uses a fully convolutional neural network to build a deep salient object detection model, trains the network on domestic and foreign remote sensing image databases, analyses three levels of semantic information of the image and automatically learns the saliency features of the remote sensing image. At the same time, a segmentation task is innovatively added to the fully convolutional network to learn the depth saliency of the remote sensing image, which effectively improves the learned saliency features. Experiments show that the model can extract salient objects with clear edges on multi-target detection datasets with complex scenes, such as the Microsoft COCO dataset, so the learning ability of the deep neural network can be transferred to learning the salient features of remote sensing images. Second, a hash layer is introduced into the fully convolutional network architecture, and binary hash codes are generated while the deep saliency features of the remote sensing image are learned, saving storage space and improving subsequent retrieval efficiency. Finally, image retrieval adopts a coarse-to-fine strategy in which the binary hash codes and the saliency features are used together for similarity measurement. Experiments show that, by adding a hash layer to an AlexNet neural network and using a coarse-to-fine multi-level retrieval strategy over 2.5 million ordinary images of different categories, the precision of the top K returned images (topK precision) reaches 88% on average when K = 1000, with a retrieval time of about 1 s. Transferring this approach to remote sensing image retrieval therefore has important application value for accurate and efficient retrieval of remote sensing images.
Description of the drawings:
FIG. 1 is a flow chart of a remote sensing image fast retrieval method based on depth saliency;
FIG. 2 is a diagram of a target detection model architecture based on depth saliency;
FIG. 3 is a diagram of a neural network architecture incorporating a hash layer;
fig. 4 is a diagram of a multi-level search process.
Detailed Description
Based on the above description, a specific implementation flow is given below; the protection scope of this patent is, however, not limited to this implementation flow.
Step 1: target detection model construction based on depth significance
Subjectively, a salient region is a region on which human vision focuses attention, closely related to the Human Visual System (HVS); objectively, it is a sub-region in which some feature of the image is most prominent. The key to the saliency detection problem is therefore feature learning and extraction. Given the power of deep learning in this respect, the invention applies the fully convolutional neural network to the saliency detection problem and proposes a multi-task salient object detection model based on a fully convolutional network. The model performs two tasks simultaneously: a saliency detection task and a semantic segmentation task. The saliency detection task learns the deep features of the remote sensing image and computes the depth saliency; the semantic segmentation task extracts the semantic information of objects inside the image, removes background clutter from the saliency map and fills in missing parts of salient objects.
The fully convolutional network architecture of the invention is implemented on the mainstream open-source deep learning framework Caffe; the specific model structure is shown in figure 2. An RGB image is input and passes through a series of convolution operations in 15 convolutional layers (Conv), which are shared by the saliency detection task and the superpixel object semantic segmentation task. The first 13 convolutional layers are initialized from the convolutional neural network VGGNet, the convolution kernel size is 3 × 3, and each convolutional layer is followed by a rectified linear unit (ReLU) as activation function to speed up convergence. Max Pooling is performed after the 2nd, 4th, 5th and 13th convolutional layers to reduce the feature dimensionality, reduce computation and preserve feature invariance. The convolution kernels of the 14th and 15th convolutional layers are 7 × 7 and 1 × 1 respectively, and each of these two layers is followed by a Dropout layer to counter the potential overfitting of a complex network structure, i.e. the model learning the noise and details of the training data too closely, which leads to a high error rate and poor generalization in actual tests. A deconvolution layer is constructed by upsampling, its parameters are initialized by bilinear interpolation, and the upsampling function is learned and updated iteratively during training. In the salient object detection task, the output image is normalized to [0,1] by a sigmoid threshold function and the saliency features are learned. In the semantic segmentation task, the feature map of the last convolutional layer is upsampled by the deconvolution layer and the result is cropped (Crop) so that the output image has the same size as the input image; a prediction is thus generated for every pixel and the spatial information of the original input image is preserved.
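As an illustration of this shared-backbone, two-head layout, a reduced PyTorch sketch might look as follows (the patent implements the model in Caffe; the module name MultiTaskFCN, the number of sketched layers and the channel sizes are assumptions, with only the 3 × 3 / 7 × 7 / 1 × 1 kernels, ReLU, Dropout, sigmoid saliency output and bilinear upsampling mirroring the description):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskFCN(nn.Module):
    """Reduced sketch of the multi-task saliency/segmentation FCN (hypothetical layer sizes)."""
    def __init__(self, num_classes=30):
        super().__init__()
        # Shared convolutional backbone: VGG-style 3x3 convolutions with ReLU and max pooling,
        # then 7x7 and 1x1 convolutions each followed by Dropout (the real model has 15 layers).
        self.shared = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 512, 7, padding=3), nn.ReLU(inplace=True),
            nn.Dropout2d(0.5),
            nn.Conv2d(512, 512, 1), nn.ReLU(inplace=True),
            nn.Dropout2d(0.5),
        )
        # Task-specific 1x1 heads: one channel for saliency, C channels for segmentation.
        self.saliency_head = nn.Conv2d(512, 1, 1)
        self.seg_head = nn.Conv2d(512, num_classes, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feat = self.shared(x)
        # Upsample back to the input size (stands in for the learned deconvolution layers
        # initialised with bilinear interpolation, followed by cropping).
        sal = torch.sigmoid(F.interpolate(self.saliency_head(feat), size=(h, w),
                                          mode="bilinear", align_corners=False))
        seg = F.interpolate(self.seg_head(feat), size=(h, w),
                            mode="bilinear", align_corners=False)
        return sal, seg

model = MultiTaskFCN()
sal_map, seg_logits = model(torch.randn(1, 3, 500, 500))   # 500 x 500 as in the pre-training setup
```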
Step 2: neural network pre-training and adding hash layer fine tuning
The invention uses the public large-scale Aerial Image Dataset (AID) for pre-training the neural network, so as to better learn the semantic features of remote sensing images at different levels. A hash layer is then introduced and the network is further fine-tuned with the expanded Wuhan University remote sensing image dataset (WHU-RS), so that the high-dimensional features learned by the neural network can be mapped to low-dimensional features, the retrieval time is reduced and the learned features become more robust.
Step 2.1: multi-task significance target detection model pre-training
Step 2.1.1: constructing a pre-training data set
The pre-training stage selects the public large-scale Aerial Image Dataset (AID) as the standard pre-training dataset. AID contains 30 categories and 10000 aerial images, all selected from Google Earth and annotated by professionals in the remote sensing field. The images of each category are taken from different countries and regions, at different times and with different remote sensing sensors; the image size is 600 × 600 pixels and the resolution ranges from 0.5 m/pixel to 8 m/pixel. Compared with other datasets, AID has small intra-class differences and large inter-class differences and is currently the largest aerial image dataset.
Step 2.1.2: salient object detection model pre-training
The FCNN pre-training is carried out jointly by the saliency detection task and the segmentation task. χ denotes a set of N_1 training images of width W and height Q, where X_i is the i-th image and Y_ijk is the corresponding pixel-level ground-truth segmentation label of the i-th image at position (j, k), with i = 1...N_1, j = 1...W, k = 1...Q. Z denotes a set of N_2 training images, where Z_n is the n-th image, n = 1...N_2, with corresponding ground-truth binary salient-object map M_n. θ_s are the shared convolutional layer parameters, θ_h the segmentation-task parameters and θ_f the saliency-task parameters. Formula (1) and formula (2) are respectively the cross-entropy cost function J_1(χ; θ_s, θ_h) of the segmentation task and the squared Euclidean distance cost function J_2(Z; θ_s, θ_f) of the saliency detection task, and the FCNN is trained by minimizing both cost functions:

J_1(χ; θ_s, θ_h) = -Σ_{i=1}^{N_1} Σ_{j=1}^{W} Σ_{k=1}^{Q} Σ_{c=1}^{C} 1{Y_ijk = c} log h_cjk(X_i; θ_s, θ_h)   (1)

J_2(Z; θ_s, θ_f) = Σ_{n=1}^{N_2} ||f(Z_n; θ_s, θ_f) - M_n||_F²   (2)

In formula (1), 1{·} is the indicator function, h_cjk is the element (j, k) of the confidence segmentation map of class c, c = 1...C, and h(X_i; θ_s, θ_h) is the semantic segmentation function, returning confidence segmentation maps for all C object classes, where C is the number of image classes contained in the pre-training dataset (30 in the invention). In formula (2), f(Z_n; θ_s, θ_f) is the saliency map output function and ||·||_F denotes the Frobenius norm.
Since the training process requires all original images to be normalized in size, the invention resets the original images to 500 × 500 pixels for pre-training. The learning rate is an essential parameter of the SGD learning method and determines the weight update rate: if set too large, the cost function oscillates and overshoots the optimum; if too small, convergence is too slow. A smaller learning rate, such as 0.001 ± 0.01, is generally preferred to keep the system stable. The momentum parameter and the weight decay factor improve training adaptivity; the momentum parameter is typically in [0.9, 1.0] and the weight decay factor is typically 0.0005 ± 0.0002. Based on experimental observation, the invention sets the learning rate to 10⁻¹⁰, the momentum parameter to 0.99 and the weight decay factor to 0.0005 in the Caffe framework. The stochastic gradient descent (SGD) learning process is accelerated by an NVIDIA GTX 1080 GPU and runs for a total of 80000 iterations. The detailed pre-training procedure is as follows:
1) The shared full-convolution parameters θ_s are initialized from VGGNet;
2) the segmentation-task parameters θ_h and the saliency-task parameters θ_f are randomly initialized from a normal distribution;
3) with the current θ_s and θ_h, the segmentation network is trained with SGD and both parameters are updated;
4) with the current θ_s and θ_f, the saliency network is trained with SGD and the relevant parameters are updated;
5) with the updated θ_s and θ_h, the segmentation network is trained again with SGD;
6) with the updated θ_s and θ_f, the saliency network is trained again with SGD;
7) steps 3)-6) are repeated three times to obtain the final pre-training parameters θ_s, θ_h and θ_f.
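A minimal sketch of this alternating optimisation, assuming the MultiTaskFCN module from the earlier sketch and hypothetical data loaders seg_loader (images with pixel-level class labels) and sal_loader (images with binary saliency masks of shape N × 1 × H × W), could look as follows; the cross-entropy term stands in for J_1, the squared-error term for J_2, and the optimizer uses the hyperparameters quoted above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pretrain_alternating(model, seg_loader, sal_loader, rounds=3):
    """Alternate the segmentation task (cost J1) and the saliency task (cost J2) with SGD."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-10, momentum=0.99, weight_decay=0.0005)
    ce = nn.CrossEntropyLoss()                       # cross-entropy cost J1 (segmentation)
    for _ in range(rounds):                          # the alternation is repeated three times
        for images, labels in seg_loader:            # segmentation pass: updates theta_s, theta_h
            opt.zero_grad()
            _, seg_logits = model(images)            # logits of shape (N, C, H, W)
            ce(seg_logits, labels).backward()        # labels of shape (N, H, W), long
            opt.step()
        for images, masks in sal_loader:             # saliency pass: updates theta_s, theta_f
            opt.zero_grad()
            sal_map, _ = model(images)               # saliency map of shape (N, 1, H, W)
            F.mse_loss(sal_map, masks, reduction="sum").backward()   # squared Euclidean cost J2
            opt.step()
    return model
```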
Step 2.2: adding a hash layer to fine-tune the network for the target domain
Step 2.2.1: construction of Chinese remote sensing image data set for fine tuning network
The expanded Wuhan University remote sensing image dataset (WHU-RS) is selected for fine-tuning the neural network. The original WHU-RS dataset contains 19 scene categories and 950 remote sensing images of different resolutions; the image size is 600 × 600 pixels and all images are taken from Google Earth. Taking the landforms of China into account, the original dataset is reconstructed and expanded to 7000 remote sensing images as the sample library, with more than 200 images per category. The newly added sample images differ in illumination, shooting angle, resolution and size, which helps the neural network learn more robust saliency features.
Step 2.2.2: joining hash layer trim networks
The feature vectors generated by a deep neural network have high dimensionality and are very time-consuming to use in large-scale image retrieval. Because similar images have similar binary hash codes, a fully connected layer containing s neurons, namely a hash layer H, is inserted between the penultimate layer of the pre-trained network and the final task layer, mapping the high-dimensional features to a low-dimensional space and generating binary hash codes for storage; the network structure is shown in figure 3. The weights of the hash layer H are initialized as hash values constructed by random projection, the neuron activation function is a sigmoid so that the output values lie between 0 and 1, the threshold is set empirically to 0.5, and the number of neurons is the code length of the target binary code. The hash layer not only abstracts the features of the previous layer but also bridges the mid-level and high-level semantic features of the image.
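Under the same PyTorch assumption, inserting a hash layer of s sigmoid neurons between the flattened penultimate features and the task layer can be sketched as follows (the flattened feature size, the Gaussian scale of the random-projection initialisation and the 30-way task layer are illustrative; s = 48 follows the text):

```python
import torch
import torch.nn as nn

class HashLayer(nn.Module):
    """Fully connected hash layer H with s sigmoid neurons producing activations in (0, 1)."""
    def __init__(self, in_features, s=48):
        super().__init__()
        self.fc = nn.Linear(in_features, s)
        # Initialise the weights as a random projection (Gaussian), as described.
        nn.init.normal_(self.fc.weight, mean=0.0, std=0.01)
        nn.init.zeros_(self.fc.bias)

    def forward(self, x):
        return torch.sigmoid(self.fc(x))

# Interposed between the (flattened) penultimate feature map and the task layer:
penultimate_dim = 512 * 4 * 4                    # hypothetical flattened feature size
hash_layer = HashLayer(penultimate_dim, s=48)
task_layer = nn.Linear(48, 30)                   # final task layer over 30 categories
features = torch.randn(2, penultimate_dim)
hash_activations = hash_layer(features)          # values in (0, 1), later binarised at 0.5
logits = task_layer(hash_activations)
```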
The fine-tuning process adjusts the network weights by the Back Propagation algorithm. Fine-tuning can be applied to the whole network or to part of it. Because the features learned by the lower layers are more general and less prone to overfitting, the invention uses the expanded WHU-RS dataset to adjust mainly the weights of the higher layers, i.e. the network after the tenth convolutional layer. In general, the amount of data in the fine-tuning dataset is 10%-50% smaller than that of the pre-training dataset; here the fine-tuning dataset contains 7000 images, clearly fewer than the 10000 images used for pre-training. Compared with the pre-training parameters, the network parameters in the fine-tuning process are reduced appropriately: the number of iterations and the learning rate can be reduced to 1%-10%. In the invention, the number of iterations is reduced to 8000 during fine-tuning and the learning rate is reduced to 1% of its pre-training value, i.e. 10⁻¹², while the momentum parameter and the weight decay factor remain unchanged at 0.99 and 0.0005 respectively.
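As a sketch of fine-tuning only the weights after the tenth convolutional layer, the earlier layers can be frozen and an SGD optimizer built over the remaining parameters with the reduced learning rate (the split index and the model attributes refer to the reduced MultiTaskFCN sketch above, so they are illustrative rather than the actual Caffe configuration):

```python
import torch

def freeze_early_layers(model, last_frozen_conv=10):
    """Freeze the shared convolutions up to the given index; later layers and the
    task heads / hash layer remain trainable."""
    conv_seen = 0
    for module in model.shared:
        if isinstance(module, torch.nn.Conv2d):
            conv_seen += 1
        if conv_seen <= last_frozen_conv:           # parameters of early modules are frozen
            for p in module.parameters():
                p.requires_grad = False

def finetune_optimizer(model):
    # Reduced learning rate for fine-tuning (1e-12 in the text); momentum and
    # weight decay keep their pre-training values (0.99 and 0.0005).
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=1e-12, momentum=0.99, weight_decay=0.0005)
```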
The detailed fine-tuning procedure is as follows:
1) The shared full-convolution parameters θ_s, the segmentation-task parameters θ_h and the saliency-task parameters θ_f are obtained from the pre-training process;
2) with the current θ_s and θ_h, the segmentation network is trained with SGD and both parameters are updated;
3) with the current θ_s and θ_f, the saliency network is trained with SGD and the relevant parameters are updated;
4) with the updated θ_s and θ_h, the segmentation network is trained again with SGD;
5) with the updated θ_s and θ_f, the saliency network is trained again with SGD;
6) steps 2)-5) are repeated three times to obtain the final parameters θ_s, θ_h and θ_f.
And step 3: multi-level depth retrieval
The shallow layers of a deep convolutional neural network learn low-level visual features, while its deep layers capture image semantic information. The invention therefore adopts a coarse-to-fine retrieval strategy to achieve fast and accurate image retrieval. The feature extraction and retrieval process is shown in figure 4.
Step 3.1: coarse search
First, a set of candidate images with similar high-level semantic features is retrieved, i.e. images whose binary activation values in the hash layer are similar; a ranking of similar images is then generated according to the similarity measure.
Step 3.1.1: generating binary hash codes
The image I_q to be queried is input into the network and the output of the hash layer is extracted as the image signature, denoted out(H). For each bit r = 1...s, the binary code is obtained by binarizing the activation value against the threshold:

H^r = 1 if out^r(H) ≥ 0.5, and H^r = 0 otherwise, r = 1...s   (3)

where s is the number of neurons in the hash layer; overfitting can occur when s is too large, so the initial value is suggested to lie in the range [40, 100], with the specific value adjusted according to the actual training data, and s is set to 48 in the invention. The data set used for retrieval, containing n images, is denoted {I_1, I_2, ..., I_n}; the corresponding binary codes are denoted {H_1, H_2, ..., H_n}, where i = 1...n and H_i ∈ {0,1}^s, i.e. the s-bit binary code produced by the s neurons, each bit being 0 or 1.
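Formula (3) can be sketched directly (NumPy assumed; the activation values shown are illustrative): the s hash-layer activations out(H), each in (0, 1), are thresholded at 0.5 to give the s-bit binary signature of the image.

```python
import numpy as np

def binarize_hash(activations, threshold=0.5):
    """Turn s hash-layer activations in (0, 1) into an s-bit binary code (formula (3))."""
    return (np.asarray(activations) >= threshold).astype(np.uint8)

out_h = np.array([0.91, 0.12, 0.47, 0.88, 0.55, 0.03])   # illustrative activations, s = 6
print(binarize_hash(out_h))                               # -> [1 0 0 1 1 0]
```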
Step 3.1.2: hamming distance metric similarity
The Hamming distance between two character strings of equal length is the number of positions at which the corresponding characters differ. For an image I_q to be queried with binary code H_q, if the Hamming distance between H_q and a database code H_i is smaller than the set threshold, a candidate pool P = {I_c1, I_c2, ..., I_cm} containing m candidate images is formed; in general, two images can be considered similar when their Hamming distance is smaller than 5.
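The coarse search can then be sketched as follows (function and variable names are illustrative): the Hamming distance between the query code and each database code is the number of differing bits, and images within the threshold of 5 form the candidate pool P.

```python
import numpy as np

def hamming_distance(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return int(np.count_nonzero(a != b))

def coarse_search(query_code, db_codes, threshold=5):
    """Return the indices of database images whose codes lie within the Hamming threshold."""
    return [i for i, code in enumerate(db_codes)
            if hamming_distance(query_code, code) < threshold]

# Illustrative 8-bit codes for a query and three database images.
q = np.array([1, 0, 1, 1, 0, 0, 1, 0])
db = [np.array([1, 0, 1, 0, 0, 0, 1, 0]),    # distance 1 -> candidate
      np.array([0, 1, 0, 0, 1, 1, 0, 1]),    # distance 8 -> rejected
      np.array([1, 1, 1, 1, 0, 0, 1, 0])]    # distance 1 -> candidate
print(coarse_search(q, db))                   # -> [0, 2]
```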
Step 3.2: fine search
Step 3.2.1: salient feature extraction
Because different convolutional layers of a deep convolutional network learn semantic features of the image at different levels, the features learned by the middle and higher convolutional layers are better suited to the image retrieval task. Therefore, for the image I_q to be queried, the two-dimensional remote sensing image feature maps generated by the 13th and 15th convolutional layers of the neural network are each mapped into a one-dimensional vector and stored. In the subsequent retrieval process, the retrieval results obtained with the different feature vectors are compared to decide which convolutional layer's feature map is finally used to extract the salient features of the remote sensing image.
Step 3.2.2: euclidean distance metric similarity
For a query image I_q and the candidate pool P, the top-k ranked images are selected from P using the extracted saliency feature vectors. V_q and V_ci denote the feature vectors of the query image q and of the candidate image I_ci respectively. The Euclidean distance s_i between the feature vectors of I_q and of the i-th image in the candidate pool P is defined as their similarity level, as shown in formula (4):

s_i = ||V_q - V_ci||_2   (4)
The smaller the Euclidean distance, the greater the similarity between the two images. The candidate images I_ci are sorted in ascending order of their distance to the query image, and the top-k ranked images are the retrieval result.
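The fine search over the candidate pool can be sketched in the same spirit (names and data are illustrative): Euclidean distances between the query's saliency feature vector and each candidate's vector, as in formula (4), are sorted in ascending order and the top-k candidates are returned.

```python
import numpy as np

def fine_search(query_vec, candidate_vecs, k=3):
    """Rank candidate images by Euclidean distance to the query feature vector (formula (4))."""
    dists = [float(np.linalg.norm(query_vec - v)) for v in candidate_vecs]
    order = np.argsort(dists)                 # ascending: smaller distance = more similar
    return [(int(i), dists[i]) for i in order[:k]]

v_q = np.array([0.2, 0.8, 0.1])
pool = [np.array([0.1, 0.9, 0.2]),            # close to the query
        np.array([0.9, 0.1, 0.8]),            # far from the query
        np.array([0.2, 0.7, 0.1])]            # closest
print(fine_search(v_q, pool, k=2))            # -> [(2, ...), (0, ...)]
```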
Step 3.3: evaluation of search results
The invention evaluates the retrieval results with a ranking-based evaluation criterion. For a query image q and the k top-ranked retrieval result images, Precision is calculated according to the following formula:
Precision@k = ( Σ_{i=1}^{k} Rel(i) ) / k
where Precision@k denotes, for a threshold k set according to actual requirements, the proportion of relevant results among the first k returned images, i.e. the average precision over the top k results; Rel(i) denotes the relevance between the query image q and the i-th returned image, Rel(i) ∈ {0, 1}, where 1 means the query image q and the i-th image belong to the same class, i.e. are relevant, and 0 means they are not relevant.
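The ranking-based evaluation can be sketched as follows (illustrative data): Rel(i) is 1 when the i-th returned image shares the query's class and 0 otherwise, and Precision@k is their mean over the first k results.

```python
def precision_at_k(rel, k):
    """Precision@k = sum_{i=1..k} Rel(i) / k, with Rel(i) in {0, 1}."""
    return sum(rel[:k]) / float(k)

# 10 returned images: 1 = same class as the query, 0 = different class.
rel = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
print(precision_at_k(rel, 5))   # -> 0.6
```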

Claims (1)

1. A remote sensing image fast retrieval method based on depth saliency is characterized by comprising the following steps:
step 1: target detection model construction based on depth significance
Inputting an RGB image and carrying out a series of convolution operations through 15 convolutional layers, which are shared by the saliency detection task and the superpixel object semantic segmentation task; initializing the first 13 convolutional layers from the convolutional neural network VGGNet, where the convolution kernel size is 3 × 3 and each convolutional layer is followed by a rectified linear unit ReLU as activation function; performing max pooling after the 2nd, 4th, 5th and 13th convolutional layers; the convolution kernel sizes of the 14th and 15th convolutional layers are 7 × 7 and 1 × 1 respectively, and each of these two layers is followed by a Dropout layer;
constructing a deconvolution layer by upsampling, initializing the parameters of the deconvolution layer by bilinear interpolation, and updating the learned upsampling function iteratively during training; in the salient object detection task, normalizing the output image to [0,1] by a sigmoid threshold function and learning the saliency features; in the semantic segmentation task, upsampling the feature map of the last convolutional layer by the deconvolution layer and cropping the upsampling result so that the output image has the same size as the input image;
step 2: neural network pre-training and adding hash layer fine tuning
Step 2.1: multi-task significance target detection model pre-training
The FCNN pre-training is carried out jointly by the saliency detection task and the segmentation task; χ denotes a set of N_1 training images of width W and height Q, where X_i is the i-th image and Y_ijk is the corresponding pixel-level ground-truth segmentation label of the i-th image at position (j, k), with i = 1...N_1, j = 1...W, k = 1...Q; Z denotes a set of N_2 training images, where Z_n is the n-th image, n = 1...N_2, with corresponding ground-truth binary salient-object map M_n; θ_s are the shared convolutional layer parameters, θ_h the segmentation-task parameters and θ_f the saliency-task parameters; formula (1) and formula (2) are respectively the cross-entropy cost function J_1(χ; θ_s, θ_h) of the segmentation task and the squared Euclidean distance cost function J_2(Z; θ_s, θ_f) of the saliency detection task, and the FCNN is trained by minimizing both cost functions:

J_1(χ; θ_s, θ_h) = -Σ_{i=1}^{N_1} Σ_{j=1}^{W} Σ_{k=1}^{Q} Σ_{c=1}^{C} 1{Y_ijk = c} log h_cjk(X_i; θ_s, θ_h)   (1)

J_2(Z; θ_s, θ_f) = Σ_{n=1}^{N_2} ||f(Z_n; θ_s, θ_f) - M_n||_F²   (2)

in formula (1), 1{·} is the indicator function, h_cjk is the element (j, k) of the confidence segmentation map of class c, c = 1...C, and h(X_i; θ_s, θ_h) is the semantic segmentation function, returning confidence segmentation maps for all C object classes, where C is the number of image classes contained in the pre-training data set; in formula (2), f(Z_n; θ_s, θ_f) is the saliency map output function and ||·||_F denotes the Frobenius norm;
next, the cost functions are minimized with the stochastic gradient descent SGD method on the basis of regularizing all training samples; because the data set used for pre-training does not carry segmentation and saliency annotations at the same time, the segmentation task and the saliency detection task are performed alternately; the training process normalizes the sizes of all original images; the learning rate is 0.001 ± 0.01; the reference value of the momentum parameter is [0.9, 1.0] and the reference value of the weight decay factor is 0.0005 ± 0.0002; the stochastic gradient descent learning process runs for more than 80000 iterations; the detailed pre-training procedure is as follows:
1) the shared full-convolution parameters θ_s are initialized from VGGNet;
2) the segmentation-task parameters θ_h and the saliency-task parameters θ_f are randomly initialized from a normal distribution;
3) with the current θ_s and θ_h, the segmentation network is trained with SGD and both parameters are updated;
4) with the current θ_s and θ_f, the saliency network is trained with SGD and the relevant parameters are updated;
5) with the updated θ_s and θ_h, the segmentation network is trained again with SGD;
6) with the updated θ_s and θ_f, the saliency network is trained again with SGD;
7) steps 3)-6) are repeated three times to obtain the final pre-training parameters θ_s, θ_h and θ_f;
Step 2.2: adding a hash layer to fine-tune the network for the target domain
Inserting a fully connected layer containing s neurons, namely a hash layer H, between the penultimate layer of the pre-trained network and the final task layer, mapping the high-dimensional features to a low-dimensional space and generating binary hash codes for storage; the weights of the hash layer H are initialized as hash values constructed by random projection, the neuron activation function is a sigmoid so that the output values lie between 0 and 1, and the number of neurons is the code length of the target binary code;
the fine-tuning process adjusts the network weights by the back-propagation algorithm; network fine-tuning adjusts the network weights after the tenth convolutional layer; compared with the data set of the pre-training network, the amount of data in the fine-tuning data set can be reduced by 10%-50%; compared with the pre-training parameters, the number of iterations and the learning rate in the fine-tuning process are reduced to 1%-10% of their pre-training values, while the momentum parameter and the weight decay factor remain unchanged;
the detailed fine-tuning procedure is as follows:
1) the shared full-convolution parameters θ_s, the segmentation-task parameters θ_h and the saliency-task parameters θ_f are obtained from the pre-training process;
2) with the current θ_s and θ_h, the segmentation network is trained with SGD and both parameters are updated;
3) with the current θ_s and θ_f, the saliency network is trained with SGD and the relevant parameters are updated;
4) with the updated θ_s and θ_h, the segmentation network is trained again with SGD;
5) with the updated θ_s and θ_f, the saliency network is trained again with SGD;
6) steps 2)-5) are repeated three times to obtain the final parameters θ_s, θ_h and θ_f;
And step 3: multi-level depth retrieval
Step 3.1: coarse search
Step 3.1.1: generating binary hash codes
An image I_q to be queried is input into the fine-tuned neural network and the output of the hash layer is extracted as the image signature, denoted out(H); for each bit r = 1...s, the binary code is obtained by binarizing the activation value against a threshold:

H^r = 1 if out^r(H) ≥ 0.5, and H^r = 0 otherwise, r = 1...s   (3)

where s is the number of neurons in the hash layer, whose initial value is set in the range [40, 100]; the data set used for retrieval, containing t images, is denoted {I_1, I_2, ..., I_t}; the corresponding binary codes are denoted {H_1, H_2, ..., H_t}, where m = 1...t and H_m ∈ {0,1}^s, i.e. the s-bit binary code produced by the s neurons, each bit being 0 or 1;
step 3.1.2: hamming distance metric similarity
The Hamming distance between two character strings of equal length is the number of positions at which the corresponding characters differ; for an image I_q to be queried with binary code H_q, if the Hamming distance between H_q and a database code H_i is smaller than the set threshold, a candidate pool P = {I_c1, I_c2, ..., I_cm} containing m candidate images is formed; two images are considered similar when their Hamming distance is smaller than 5;
step 3.2: fine search
Step 3.2.1: salient feature extraction
For the image I_q to be queried, the two-dimensional remote sensing image feature maps generated by the 13th and 15th convolutional layers of the neural network are each mapped into a one-dimensional vector and stored; in the subsequent retrieval process, the retrieval results obtained with the different feature vectors are compared to decide which convolutional layer's feature map is finally used to extract the salient features of the remote sensing image;
step 3.2.2: euclidean distance metric similarity
For a query image Iq and the candidate pool P, the extracted salient feature vectors are used to select the top-k images from the candidate pool P; Vq and Vci denote the feature vectors of the query image q and of the candidate image Ici, respectively; the Euclidean distance si between the feature vector of Iq and that of the i-th image in the candidate pool P is defined as the similarity level between them, as shown in formula (4):
si = ‖Vq − Vci‖ = √( Σj (Vq,j − Vci,j)² )    (4)
The smaller the Euclidean distance, the greater the similarity between the two images; the candidate images Ici are sorted in ascending order of si, and the top-k images in this ranking are returned as the retrieval result;
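A minimal sketch of the fine search in step 3.2.2: the candidates from the pool are ranked by the Euclidean distance of formula (4), and the k closest images are returned; the feature dimension and pool size are illustrative.

```python
# Rank candidate images by Euclidean distance between salient feature vectors.
import numpy as np

def fine_search(query_vec, candidate_vecs, k=10):
    """Return the indices and distances of the k candidates closest to the query."""
    dists = np.linalg.norm(candidate_vecs - query_vec, axis=1)  # s_i for each candidate
    order = np.argsort(dists)                                   # ascending: most similar first
    return order[:k], dists[order[:k]]

v_q = np.random.rand(512)              # hypothetical salient feature vector of the query image
pool_feats = np.random.rand(40, 512)   # feature vectors of the m candidate images in P
top_k_idx, top_k_dist = fine_search(v_q, pool_feats, k=10)
```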
step 3.3: evaluation of search results
The retrieval result is evaluated with a ranking-based criterion; for a query image q and the top-k retrieved images, the precision is calculated according to the following formula:
Precision@k = ( Σ_{i=1}^{k} Rel(i) ) / k
where Precision@k denotes the average accuracy over the returned results from the first image up to the cut-off threshold k; Rel(i) denotes the relevance between the query image q and the i-th returned image, with Rel(i) ∈ {0,1}, where 1 indicates that the query image q and the i-th image belong to the same class, i.e. are relevant, and 0 indicates they are irrelevant.
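A minimal sketch of the ranking-based evaluation in step 3.3: Precision@k is the mean of the 0/1 relevance labels Rel(i) over the top-k returned images; the labels below are hypothetical.

```python
# Compute Precision@k from 0/1 relevance labels of the ranked retrieval results.
def precision_at_k(relevances, k):
    """relevances: 0/1 labels Rel(i) for the ranked results (1 = same class as the query)."""
    return sum(relevances[:k]) / float(k)

rel = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]   # hypothetical Rel(i) for the top-10 results
print(precision_at_k(rel, 10))         # -> 0.6
```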
CN201710087670.5A 2017-02-18 2017-02-18 Remote sensing image rapid retrieval method based on depth significance Active CN106909924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710087670.5A CN106909924B (en) 2017-02-18 2017-02-18 Remote sensing image rapid retrieval method based on depth significance


Publications (2)

Publication Number Publication Date
CN106909924A CN106909924A (en) 2017-06-30
CN106909924B (en) 2020-08-28

Family

ID=59207582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710087670.5A Active CN106909924B (en) 2017-02-18 2017-02-18 Remote sensing image rapid retrieval method based on depth significance

Country Status (1)

Country Link
CN (1) CN106909924B (en)

Families Citing this family (66)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291945B (en) * 2017-07-12 2020-03-31 上海媒智科技有限公司 High-precision clothing image retrieval method and system based on visual attention model
CN107463932B (en) * 2017-07-13 2020-07-10 央视国际网络无锡有限公司 Method for extracting picture features by using binary bottleneck neural network
US11270194B2 (en) * 2017-07-26 2022-03-08 International Business Machines Corporation System and method for constructing synaptic weights for artificial neural networks from signed analog conductance-pairs of varying significance
CN107392925B (en) * 2017-08-01 2020-07-07 西安电子科技大学 Remote sensing image ground object classification method based on super-pixel coding and convolutional neural network
CN107480261B (en) * 2017-08-16 2020-06-16 上海荷福人工智能科技(集团)有限公司 Fine-grained face image fast retrieval method based on deep learning
CN109410211A (en) * 2017-08-18 2019-03-01 北京猎户星空科技有限公司 The dividing method and device of target object in a kind of image
CN109657522A (en) * 2017-10-10 2019-04-19 北京京东尚科信息技术有限公司 Detect the method and apparatus that can travel region
CN107729992B (en) * 2017-10-27 2020-12-29 深圳市未来媒体技术研究院 Deep learning method based on back propagation
US11232344B2 (en) * 2017-10-31 2022-01-25 General Electric Company Multi-task feature selection neural networks
CN108090117B (en) * 2017-11-06 2019-03-19 北京三快在线科技有限公司 A kind of image search method and device, electronic equipment
WO2019136591A1 (en) * 2018-01-09 2019-07-18 深圳大学 Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network
CN108446312B (en) * 2018-02-06 2020-04-21 西安电子科技大学 Optical remote sensing image retrieval method based on deep convolution semantic net
CN108257139B (en) * 2018-02-26 2020-09-08 中国科学院大学 RGB-D three-dimensional object detection method based on deep learning
CN108427738B (en) * 2018-03-01 2022-03-25 中山大学 Rapid image retrieval method based on deep learning
CN108287926A (en) * 2018-03-02 2018-07-17 宿州学院 A kind of multi-source heterogeneous big data acquisition of Agro-ecology, processing and analysis framework
US11618438B2 (en) * 2018-03-26 2023-04-04 International Business Machines Corporation Three-dimensional object localization for obstacle avoidance using one-shot convolutional neural network
CN110414301B (en) * 2018-04-28 2023-06-23 中山大学 Train carriage crowd density estimation method based on double cameras
CN108647655B (en) * 2018-05-16 2022-07-12 北京工业大学 Low-altitude aerial image power line foreign matter detection method based on light convolutional neural network
CN109033505A (en) * 2018-06-06 2018-12-18 东北大学 A kind of ultrafast cold temprature control method based on deep learning
CN108829826B (en) * 2018-06-14 2020-08-07 清华大学深圳研究生院 Image retrieval method based on deep learning and semantic segmentation
CN109063569B (en) * 2018-07-04 2021-08-24 北京航空航天大学 Semantic level change detection method based on remote sensing image
CN109191426A (en) * 2018-07-24 2019-01-11 江南大学 A kind of flat image conspicuousness detection method
CN109101907B (en) * 2018-07-28 2020-10-30 华中科技大学 Vehicle-mounted image semantic segmentation system based on bilateral segmentation network
CN109389128B (en) 2018-08-24 2021-08-27 中国石油天然气股份有限公司 Automatic extraction method and device for electric imaging logging image characteristics
CN109035315A (en) * 2018-08-28 2018-12-18 武汉大学 Merge the remote sensing image registration method and system of SIFT feature and CNN feature
CN110866425A (en) * 2018-08-28 2020-03-06 天津理工大学 Pedestrian identification method based on light field camera and depth migration learning
CN109389051A (en) * 2018-09-20 2019-02-26 华南农业大学 A kind of building remote sensing images recognition methods based on convolutional neural networks
CN109284741A (en) * 2018-10-30 2019-01-29 武汉大学 A kind of extensive Remote Sensing Image Retrieval method and system based on depth Hash network
CN109522821A (en) * 2018-10-30 2019-03-26 武汉大学 A kind of extensive across source Remote Sensing Image Retrieval method based on cross-module state depth Hash network
CN109522435B (en) * 2018-11-15 2022-05-20 中国银联股份有限公司 Image retrieval method and device
CN109639964A (en) * 2018-11-26 2019-04-16 北京达佳互联信息技术有限公司 Image processing method, processing unit and computer readable storage medium
US11593655B2 (en) * 2018-11-30 2023-02-28 Baidu Usa Llc Predicting deep learning scaling
CN109753576A (en) * 2018-12-25 2019-05-14 上海七印信息科技有限公司 A kind of method for retrieving similar images
CN111368109B (en) * 2018-12-26 2023-04-28 北京眼神智能科技有限公司 Remote sensing image retrieval method, remote sensing image retrieval device, computer readable storage medium and computer readable storage device
CN109766938A (en) * 2018-12-28 2019-05-17 武汉大学 Remote sensing image multi-class targets detection method based on scene tag constraint depth network
CN109766467B (en) * 2018-12-28 2019-12-13 珠海大横琴科技发展有限公司 Remote sensing image retrieval method and system based on image segmentation and improved VLAD
CN109670057B (en) * 2019-01-03 2021-06-29 电子科技大学 Progressive end-to-end depth feature quantization system and method
CN109902192B (en) * 2019-01-15 2020-10-23 华南师范大学 Remote sensing image retrieval method, system, equipment and medium based on unsupervised depth regression
CN109886221B (en) * 2019-02-26 2021-02-02 浙江水利水电学院 Sand production ship identification method based on image significance detection
CN109919059B (en) * 2019-02-26 2021-01-26 四川大学 Salient object detection method based on deep network layering and multi-task training
CN109919108B (en) * 2019-03-11 2022-12-06 西安电子科技大学 Remote sensing image rapid target detection method based on deep hash auxiliary network
CN110020658B (en) * 2019-03-28 2022-09-30 大连理工大学 Salient object detection method based on multitask deep learning
CN110263799A (en) * 2019-06-26 2019-09-20 山东浪潮人工智能研究院有限公司 A kind of image classification method and device based on the study of depth conspicuousness similar diagram
CN110334765B (en) * 2019-07-05 2023-03-24 西安电子科技大学 Remote sensing image classification method based on attention mechanism multi-scale deep learning
CN110399847B (en) * 2019-07-30 2021-11-09 北京字节跳动网络技术有限公司 Key frame extraction method and device and electronic equipment
CN110414513A (en) * 2019-07-31 2019-11-05 电子科技大学 Vision significance detection method based on semantically enhancement convolutional neural networks
CN110633633B (en) * 2019-08-08 2022-04-05 北京工业大学 Remote sensing image road extraction method based on self-adaptive threshold
CN110580503A (en) * 2019-08-22 2019-12-17 江苏和正特种装备有限公司 AI-based double-spectrum target automatic identification method
CN110765886B (en) * 2019-09-29 2022-05-03 深圳大学 Road target detection method and device based on convolutional neural network
CN110852295B (en) * 2019-10-15 2023-08-25 深圳龙岗智能视听研究院 Video behavior recognition method based on multitasking supervised learning
CN112712090A (en) * 2019-10-24 2021-04-27 北京易真学思教育科技有限公司 Image processing method, device, equipment and storage medium
CN110853053A (en) * 2019-10-25 2020-02-28 天津大学 Salient object detection method taking multiple candidate objects as semantic knowledge
CN111160127B (en) * 2019-12-11 2023-07-21 中国四维测绘技术有限公司 Remote sensing image processing and detecting method based on deep convolutional neural network model
CN111695572A (en) * 2019-12-27 2020-09-22 珠海大横琴科技发展有限公司 Ship retrieval method and device based on convolutional layer feature extraction
CN111640087B (en) * 2020-04-14 2023-07-14 中国测绘科学研究院 SAR depth full convolution neural network-based image change detection method
CN112052736A (en) * 2020-08-06 2020-12-08 浙江理工大学 Cloud computing platform-based field tea tender shoot detection method
CN112102245A (en) * 2020-08-17 2020-12-18 清华大学 Grape fetus slice image processing method and device based on deep learning
CN112541912B (en) * 2020-12-23 2024-03-12 中国矿业大学 Rapid detection method and device for salient targets in mine sudden disaster scene
CN112579816B (en) * 2020-12-29 2022-01-07 二十一世纪空间技术应用股份有限公司 Remote sensing image retrieval method and device, electronic equipment and storage medium
CN112667832B (en) * 2020-12-31 2022-05-13 哈尔滨工业大学 Vision-based mutual positioning method in unknown indoor environment
CN112801192B (en) * 2021-01-26 2024-03-19 北京工业大学 Extended LargeVis image feature dimension reduction method based on deep neural network
CN112926667B (en) * 2021-03-05 2022-08-30 中南民族大学 Method and device for detecting saliency target of depth fusion edge and high-level feature
CN113205481A (en) * 2021-03-19 2021-08-03 浙江科技学院 Salient object detection method based on stepped progressive neural network
CN113326926B (en) * 2021-06-30 2023-05-09 上海理工大学 Fully-connected hash neural network for remote sensing image retrieval
CN115292530A (en) * 2022-09-30 2022-11-04 北京数慧时空信息技术有限公司 Remote sensing image overall management system
CN116894100B (en) * 2023-07-24 2024-04-09 北京和德宇航技术有限公司 Remote sensing image display control method, device and storage medium


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9218563B2 (en) * 2012-10-25 2015-12-22 Brain Corporation Spiking neuron sensory processing apparatus and methods for saliency detection
US9373058B2 (en) * 2014-05-29 2016-06-21 International Business Machines Corporation Scene understanding using a neurosynaptic system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354735A (en) * 2015-07-22 2017-01-25 杭州海康威视数字技术股份有限公司 Image target searching method and device
CN105243154A (en) * 2015-10-27 2016-01-13 武汉大学 Remote sensing image retrieval method and system based on salient point features and sparse autoencoding
CN105550709A (en) * 2015-12-14 2016-05-04 武汉大学 Remote sensing image power transmission line corridor forest region extraction method
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 Vehicle type recognition method based on fast R-CNN deep neural network
CN106227851A (en) * 2016-07-29 2016-12-14 汤平 End-to-end image retrieval method based on deep convolutional neural network with hierarchical deep search
CN106295139A (en) * 2016-07-29 2017-01-04 汤平 Tongue self-diagnosis health cloud service system based on deep convolutional neural network
CN106296692A (en) * 2016-08-11 2017-01-04 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN106408001A (en) * 2016-08-26 2017-02-15 西安电子科技大学 Rapid area-of-interest detection method based on depth kernelized hashing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FP-CNNH: a fast image hashing algorithm based on deep convolutional neural networks; Liu Ye et al.; Computer Science; 2016-09-30; vol. 43, no. 9; pp. 39-46, 51 *
Image retrieval method based on convolutional neural network and hash coding; Gong Zhenting et al.; CAAI Transactions on Intelligent Systems; 2016-06-30; vol. 11, no. 3; pp. 391-400 *
Image retrieval method based on convolutional neural network and supervised kernel hashing; Ke Shengcai et al.; Acta Electronica Sinica; 2017-01-31; vol. 45, no. 1; pp. 157-163 *

Also Published As

Publication number Publication date
CN106909924A (en) 2017-06-30

Similar Documents

Publication Publication Date Title
CN106909924B (en) Remote sensing image rapid retrieval method based on depth significance
Li et al. Automated terrain feature identification from remote sensing imagery: a deep learning approach
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN108960330B (en) Remote sensing image semantic generation method based on fast regional convolutional neural network
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
Chen et al. Target classification using the deep convolutional networks for SAR images
CN108038445B (en) SAR automatic target identification method based on multi-view deep learning framework
Sameen et al. Classification of very high resolution aerial photos using spectral‐spatial convolutional neural networks
Zhang et al. Scene classification via a gradient boosting random convolutional network framework
CN110516095B (en) Semantic migration-based weak supervision deep hash social image retrieval method and system
Zhang et al. Ensemble multiple kernel active learning for classification of multisource remote sensing data
EP3029606A2 (en) Method and apparatus for image classification with joint feature adaptation and classifier learning
CN110929080B (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN111680176A (en) Remote sensing image retrieval method and system based on attention and bidirectional feature fusion
CN112949647B (en) Three-dimensional scene description method and device, electronic equipment and storage medium
Sumbul et al. Informative and representative triplet selection for multilabel remote sensing image retrieval
Guo et al. Using multi-scale and hierarchical deep convolutional features for 3D semantic classification of TLS point clouds
Kang et al. Noise-tolerant deep neighborhood embedding for remotely sensed images with label noise
Polewski et al. Combining active and semisupervised learning of remote sensing data within a renyi entropy regularization framework
CN113988147A (en) Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN113205103A (en) Lightweight tattoo detection method
Sjahputera et al. Clustering of detected changes in high-resolution satellite imagery using a stabilized competitive agglomeration algorithm
Chen et al. Supervised and adaptive feature weighting for object-based classification on satellite images
CN109583371A (en) Landmark information based on deep learning extracts and matching process
CN114579794A (en) Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant