CN108960330B - Remote sensing image semantic generation method based on fast regional convolutional neural network - Google Patents
- Publication number
- CN108960330B (application CN201810744473.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- neural network
- text
- network
- remote sensing
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a remote sensing image semantic generation method based on a fast regional convolutional neural network, which mainly solves the problem that the prior art can obtain neither the relations among targets in an image nor the relation between a target and the whole image. The implementation scheme is as follows: construct a training sample set and a test sample set; extract image features of the high-resolution remote sensing image with a fast regional convolutional neural network; extract text features of the corresponding sentences with a bidirectional recurrent neural network; match the image features with the text features using a probability-based image-text matching model; and train a long short-term memory (LSTM) network with the matched image-text features, thereby realizing semantic generation for high-resolution remote sensing images. The method fully considers the complex backgrounds and diverse targets of remote sensing images, improves remote sensing image semantic generation results, and can be used for image retrieval or scene classification.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image semantic generation method which can be used for automatically describing the content of a remote sensing image.
Background
Content understanding and description of remote sensing images can provide decision support for remote sensing applications and has wide practical value. For example, in the field of military reconnaissance, existing research algorithms can rapidly identify important military targets such as ports, airports, and ships from remote sensing images. Remote sensing image content understanding and description makes it possible to interpret wide-area battlefield imagery accurately and comprehensively, enabling real-time interpretation of the battlefield geographic environment, dynamic information generation, and the like. On the civilian side, understanding and describing remote sensing image content can accurately provide easily understood information such as disaster assessment, vegetation coverage, crop growth conditions, and urban change.
Most studies of image content understanding and description treat natural images with the convenient tools of natural language processing. Much image description work can be regarded as a retrieval process: image features and text features are unified into a common vector space, and text features are then retrieved from image features, or image features retrieved from text features. Another approach builds a training database comprising images and their text descriptions, learns the correspondence between images and text, and uses it to generate text for new images. All these methods operate under a supervised framework, and the generated text is a description of the image content. For example, the Stanford group of Fei-Fei Li used a deep neural network model to infer the correspondence between vocabulary segments and image regions, and then used it to build a generalized language description model. In the field of computer vision, work on describing images or videos has become extremely active; Socher et al. and Wang et al. studied the associations between images and the words used to describe objects in them.
Due to adverse factors such as complicated and diverse targets and difficult sample labeling, content understanding and description of remote sensing images has developed more slowly than that of natural images. Existing achievements mostly focus on semantic extraction and retrieval of remote sensing images. For example, remote sensing image retrieval has been realized with natural language, and a group at Wuhan University proposed a remote sensing image retrieval model based on semantic mining; researchers at Beihang University (Beijing University of Aeronautics and Astronautics) proposed jointly modeling the low-level features and context information of remote sensing images through a CRF framework. Content understanding and description of remote sensing images is usually obtained by statistical learning: first the semantic information implicit in the images is obtained, and then the images are further analyzed according to the correspondence between low-level features and semantic features. Such methods can obtain shallow semantic information that assists recognition, but they do not go deep enough: they stay at the target localization and recognition stage, cannot obtain the relations among targets in the image or the overall relation between targets and the image, and thus limit the precision of subsequent tasks such as image detection and scene classification.
Disclosure of Invention
The invention aims to provide a remote sensing image semantic generation method based on a fast regional convolutional neural network that addresses the defects of the prior art, so that the visual information in a high-resolution remote sensing image is fully utilized and combined with the text information of describing sentences to obtain the relations among targets in the image and between a target and the whole image, improving the precision of tasks such as image retrieval and scene classification.
In order to achieve the above purpose, the implementation steps of the invention comprise the following steps:
(1) taking 60% of image-text pairs in the remote sensing data set as training samples, and taking the rest 40% of image-text pairs as test samples;
(2) extracting image features of remote sensing images in training samples by using a fast regional convolutional neural network;
(3) extracting text features of texts corresponding to the remote sensing images in the training samples by using a bidirectional cyclic neural network;
(4) matching the image characteristics obtained in the step (2) with the text characteristics obtained in the step (3) by using a probability model-based image-text matching method to obtain matched image-text characteristics;
(5) training the long short-term memory (LSTM) network with the image-text features matched in step (4);
(6) extracting the image features of the remote sensing images in the test samples with the fast regional convolutional neural network, and inputting them into the trained long short-term memory (LSTM) network for semantic generation to obtain a sentence describing the image content.
Compared with the prior art, the invention has the following advantages:
First, by adopting the fast regional convolutional neural network, the invention can more accurately obtain visual features suitable for describing high-resolution remote sensing images.
Second, by adopting probability-model-based matching of image features to text features, the invention can better construct the correspondence from image features to text features.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a block diagram of a fast regional convolutional neural network used in the present invention to extract image features;
FIG. 3 is a block diagram of a regional candidate network in the fast regional convolutional neural network of the present invention;
FIG. 4 is a diagram of a feature transformation module in a regional candidate network in accordance with the present invention;
FIG. 5 is a block diagram of a bi-directional recurrent neural network used in the present invention to extract text features;
FIG. 6 is a diagram of the long short-term memory (LSTM) network used for text prediction in the present invention;
FIG. 7 is a graph comparing evaluation results of the present invention under different evaluation metrics.
Detailed Description
The invention is described in further detail below with reference to the following figures and specific examples:
referring to fig. 1, the implementation steps of the present invention are as follows.
Step 1, constructing a training sample set and a testing sample set.
The three remote sensing image semantic generation data sets UCM-Captions Data Set, Sydney-Captions Data Set, and RSICD are downloaded from the website of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing at Wuhan University; in each data set, 60% of the image-text pairs are used as training samples and the remaining 40% as test samples.
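The 60/40 split of step 1 can be sketched as follows (a minimal illustration; the file names and the fixed random seed are placeholders, not part of the actual data sets):

```python
import random

def split_pairs(pairs, train_ratio=0.6, seed=0):
    """Shuffle image-text pairs, then split them into train/test subsets."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical image-text pairs standing in for one captioning data set.
pairs = [("img_%03d.tif" % i, "caption %d" % i) for i in range(100)]
train, test = split_pairs(pairs)
print(len(train), len(test))  # 60 40
```

The same helper is applied independently to each of the three data sets, producing a 60% training and 40% test portion per set.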
Step 2, extracting the image features of the remote sensing images in the training samples with the fast regional convolutional neural network:
The structure of the fast regional convolutional neural network is shown in fig. 2; it comprises a region candidate network and a three-layer convolutional neural network, where:
the structure of the regional candidate network is shown in FIG. 3, and the regional candidate network comprises a feature transformation module and a regional suggestion module, wherein the feature transformation module adopts VGG-16, and the structure is shown in FIG. 4; the region suggestion module has two branches: classification branches and regression branches.
The three-layer convolutional neural network is composed of two convolutional layers and a full-connection layer, the number of nodes of the convolutional layers is 256, and the number of nodes of the full-connection layer is 4096.
The specific implementation of this step is as follows:
(2a) screening out a candidate region containing a target by using a region candidate network in the fast regional convolutional network:
converting the original picture into a feature map through a feature conversion module; sliding a 3x3 window on the feature map by using a step size of 1 through a region suggestion module, and generating k candidate regions with different sizes from the center of each window, wherein k is 9 in the example;
converting each of the k candidate regions into a 256-dimensional vector and inputting it into the region suggestion module; the classification branch of the module outputs indicator variables for the 2k candidate regions, where 1 denotes a target and 0 a non-target, and the regression branch outputs the coordinates of the 4k candidate regions; the candidate regions containing targets are then generated from the indicator variables and the coordinates;
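The candidate-region generation of (2a) can be sketched as a simplified anchor enumeration: a window slides over every feature-map cell and k = 9 boxes of different sizes are centred on each cell. The scales and aspect ratios below are illustrative assumptions, not values stated in the patent:

```python
def make_anchors(feat_h, feat_w, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Centre k = len(scales) * len(ratios) boxes on every feature-map cell."""
    anchors = []
    for cy in range(feat_h):
        for cx in range(feat_w):
            for s in scales:
                for r in ratios:
                    w = s * r ** 0.5          # box width for this scale/ratio
                    h = s / r ** 0.5          # box height for this scale/ratio
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

print(len(make_anchors(4, 4)))  # 4 * 4 * 9 = 144
```

In the full method these anchors are then scored by the classification branch and refined by the regression branch; only anchors flagged as targets survive.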
(2b) extracting the image features of the candidate regions of (2a) using the three-layer convolutional neural network of the fast regional convolutional neural network:
For the pixels I_i in the i-th candidate region, the region image feature is computed with the three-layer convolutional neural network as

v_i = W_m · CNN_{θc}(I_i) + b_m,

where θ_c denotes the network parameters of the region candidate network; W_m and b_m denote the weights and biases of the three-layer convolutional neural network, the dimension of W_m being h × 4096 with h the dimension of the embedding space; CNN_{θc}(I_i) denotes the 4096-dimensional fully connected layer vector output by the region candidate network for I_i under parameters θ_c; and v_i denotes the image feature extracted from the i-th candidate region, including the feature of the entire image and the first 19 detected locations;
(2c) merging the image features v_i extracted from each candidate region to obtain the image feature v of the picture, whose dimensionality equals the number of screened candidate regions.
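The linear projection of (2b), which maps a fully-connected-layer vector into the h-dimensional embedding space, can be sketched with toy sizes (the 4096-dimensional vector is shrunk to 4 dimensions, and the weights are made up for illustration):

```python
def project_region(fc_vec, W_m, b_m):
    """v_i = W_m * CNN(I_i) + b_m : linear map into the h-dim embedding space."""
    return [sum(w * x for w, x in zip(row, fc_vec)) + b
            for row, b in zip(W_m, b_m)]

# Toy sizes: "4096-d" fc vector shrunk to 4-d, embedding dimension h = 2.
fc = [1.0, 0.0, 2.0, 0.0]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
b = [0.5, -0.5]
print(project_region(fc, W, b))  # [1.5, 1.5]
```

Step (2c) then simply collects one such v_i per surviving candidate region into the picture-level feature v.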
Step 3, extracting text features of the sentences in the training samples with the bidirectional recurrent neural network.
The structure of the bidirectional cyclic neural network is shown in fig. 5, in which the number of input nodes and the number of output nodes are both set to N, and the number of forward implicit units and the number of backward implicit units are both set to h.
The specific implementation of this step is as follows:
(3a) inputting a word sequence of length N and converting the t-th word into an h-dimensional vector: x_t = W_n Φ_t, where Φ_t is an indicator column vector giving the position index of the word in the vocabulary, t denotes the position of each word in the sentence and takes values 1 to N, and W_n is the embedding matrix of the given words, initialized randomly;
(3b) randomly initializing the forward weight W_r, forward bias b_r, backward weight W_l, and backward bias b_l of the bidirectional recurrent neural network, and iteratively computing its forward hidden-unit outputs h_t^f and backward hidden-unit outputs h_t^b:

h_t^f = f(x_t + W_r h_{t-1}^f + b_r), with t increasing from 1 to N and h_0^f = 0,
h_t^b = f(x_t + W_l h_{t+1}^b + b_l), with t decreasing from N to 1 and h_{N+1}^b = 0,

where h_{t-1}^f is the (t-1)-th forward hidden-unit output, h_{t+1}^b is the (t+1)-th backward hidden-unit output, and x_t is the h-dimensional vector of the t-th word from (3a);
(3c) randomly initializing the output-layer weight W_d and bias b_d of the network, and iteratively computing the network output s_t:

s_t = f(W_d (h_t^f + h_t^b) + b_d),

where h_t^f and h_t^b are the forward and backward hidden-unit outputs of the bidirectional recurrent neural network from (3b);
(3d) calculating the cross-entropy loss function L(θ):

L(θ) = −Σ_{t=1}^{N} Φ̂_t^T log softmax(s_t),

where Φ̂_t is the indicator column vector of the expected output, i.e. of the t-th word, giving the position index of the word in the vocabulary;
(3e) optimizing L(θ) with a stochastic gradient descent algorithm to obtain the trained bidirectional recurrent neural network, whose output s_t is the text feature of the t-th word, containing the position of the word and its context information.
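The forward/backward recurrences of step 3 can be sketched in pure Python with toy sizes; f is taken to be ReLU here, and the weights are illustrative identity matrices, not trained values:

```python
def relu(v): return [max(0.0, x) for x in v]
def matvec(W, v): return [sum(w * x for w, x in zip(row, v)) for row in W]
def vadd(*vs): return [sum(t) for t in zip(*vs)]

def birnn(xs, Wr, br, Wl, bl):
    """Forward pass h_t^f, backward pass h_t^b, combined feature per word."""
    n, h = len(xs), len(br)
    fwd, prev = [], [0.0] * h
    for x in xs:                        # t = 1 .. N, h_0^f = 0
        prev = relu(vadd(x, matvec(Wr, prev), br))
        fwd.append(prev)
    bwd, nxt = [None] * n, [0.0] * h
    for t in range(n - 1, -1, -1):      # t = N .. 1, h_{N+1}^b = 0
        nxt = relu(vadd(xs[t], matvec(Wl, nxt), bl))
        bwd[t] = nxt
    # Combine forward and backward hidden states per word position.
    return [vadd(f, b) for f, b in zip(fwd, bwd)]

xs = [[1.0, 0.0], [0.0, 1.0]]           # two toy word embeddings
I2 = [[1.0, 0.0], [0.0, 1.0]]           # identity recurrence weights
feats = birnn(xs, I2, [0.0, 0.0], I2, [0.0, 0.0])
print(feats)  # [[2.0, 1.0], [1.0, 2.0]]
```

In the method itself the combined state is additionally passed through the output layer (W_d, b_d) to yield s_t; the sketch stops at the summed hidden states to keep the recurrence visible.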
Step 4, matching the image features and text features extracted from the training samples with the probability-based image-text matching model.
(4a) representing the image features of each remote sensing image and the text features of the corresponding describing sentence as vectors, and uniformly transforming both into the h-dimensional space; the dot product v_i^T s_t measures the similarity between the i-th region image feature v_i and the text feature s_t of the t-th word; the matching degree S_pq between the p-th image and the q-th sentence is calculated as

S_pq = Σ_{t∈g_q} max_{i∈g_p} v_i^T s_t,

where g_p denotes the set of image block regions in the p-th image and g_q denotes the set of words in the q-th sentence;
(4b) for each picture of the training set, traversing all sentences, computing all its matching degrees in sequence, and selecting the q-th sentence with the maximum matching degree S_pq as the sentence index matched with the image;
(4c) traversing all the images of the training set, and repeating the step (4b) to obtain sentence indexes respectively matched with all the images;
(4d) according to the matched sentence indexes and the corresponding image indexes, retrieving for each picture the corresponding image feature v of (2) and the corresponding text features s_1, …, s_t, …, s_N of (3) to finish the matching process, where s_t is the text feature of the t-th word in the sentence, t runs from 1 to N, and N is the number of words in the sentence.
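The matching degree of step 4 — each word aligned to its best-scoring region, with the scores summed over the sentence — can be sketched as follows (toy vectors, assuming region and word features already live in the same space):

```python
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def match_score(regions, words):
    """S_pq: align each word to its best-matching region and sum the scores."""
    return sum(max(dot(v, s) for v in regions) for s in words)

regions = [[1.0, 0.0], [0.0, 1.0]]     # two region features v_i
sent_a = [[2.0, 0.0], [0.0, 3.0]]      # word features s_t of a good match
sent_b = [[-1.0, -1.0]]                # word features of a poor match
print(match_score(regions, sent_a), match_score(regions, sent_b))  # 5.0 -1.0
```

Steps (4b)-(4c) then amount to computing this score for every image-sentence pair and keeping, per image, the sentence with the largest score.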
Step 5, training a long short-term memory (LSTM) network for text prediction:
The structure of the LSTM network is shown in fig. 6; the numbers of network input nodes and output nodes are both set to N, and the number of hidden units is set to h.
The specific implementation of this step is as follows:
(5a) randomly initializing the LSTM input-layer weight W_hx, hidden-layer weight W_hh, output-layer weight W_oh, hidden-layer bias b_h, and output-layer bias b_o; inputting the image feature v and the corresponding text features s_1, …, s_t, …, s_N of the original sentence, and iteratively computing the t-th hidden-unit output h_t and the output-layer output y_t of the network:

h_t = f(W_hx s_t + W_hh h_{t−1} + b_h + 1(t=1) ⊙ v),
y_t = softmax(W_oh h_t + b_o),

where the image feature vector v of (2) is input only when t = 1; h_{t−1} is the (t−1)-th hidden-unit output, with h_0 = 0; the hidden-unit size is set to 512; and the output vector y_t is the probability distribution over words;
(5b) calculating the cross entropy L(θ) of y_t:

L(θ) = −Σ_{t=1}^{N} Φ̂_t^T log y_t,

where Φ̂_t is the indicator column vector of the expected output, i.e. of the t-th word in the original sentence corresponding to the image, giving the position index of the word in the vocabulary;

(5c) optimizing the cross-entropy loss with a stochastic gradient descent algorithm to obtain the trained LSTM network.
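One recurrence of the generator network in (5a) can be sketched as follows. The equation above is a simple recurrent step with the image feature injected only at t = 1 (internal LSTM gating is not spelled out in the text, so it is omitted here); f is taken to be ReLU, and the weights are toy values chosen so the arithmetic is easy to check:

```python
import math

def dot(a, b): return sum(x * y for x, y in zip(a, b))

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def gen_step(s_t, h_prev, v_img, t, Whx, Whh, Woh, bh, bo):
    """h_t = f(Whx s_t + Whh h_{t-1} + b_h + 1(t=1).v); y_t = softmax(Woh h_t + b_o)."""
    h = [max(0.0, dot(Whx[i], s_t) + dot(Whh[i], h_prev) + bh[i]
             + (v_img[i] if t == 1 else 0.0))
         for i in range(len(bh))]
    y = softmax([dot(row, h) + b for row, b in zip(Woh, bo)])
    return h, y

Whx = [[1.0, 0.0], [0.0, 1.0]]              # toy 2x2 input weights
Whh = [[0.0, 0.0], [0.0, 0.0]]              # no recurrence in this toy case
Woh = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # toy 3-word vocabulary
h1, y1 = gen_step([0.0, 1.0], [0.0, 0.0], [1.0, 0.0], 1,
                  Whx, Whh, Woh, [0.0, 0.0], [0.0, 0.0, 0.0])
print(h1)  # [1.0, 1.0]
```

Training then compares each y_t against the indicator vector of the ground-truth word via the cross entropy of (5b).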
Step 6, performing semantic generation on the test samples with the trained LSTM network to obtain sentences describing the image content.
(6a) extracting the image feature v of the remote sensing image in a test sample with the fast regional convolutional neural network of step (2);

(6b) iteratively computing the t-th hidden-unit output h_t, the network output y_t, and the text feature s_{t+1} of the (t+1)-th word of the LSTM network:

h_t = f(W_hx s_t + W_hh h_{t−1} + b_h + 1(t=1) ⊙ v),
y_t = softmax(W_oh h_t + b_o),
s_{t+1} = y_t,

where the image feature vector v of the test sample is input only when t = 1;
(6c) looking up the word in the vocabulary at the index of the maximum value of y_t; all the sequentially output words form a sentence, which is the generated sentence describing the image content.
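The greedy word selection of (6c) can be sketched as follows (the vocabulary and the probability rows are made-up illustrations of the y_t outputs):

```python
def greedy_words(prob_seq, vocab):
    """Pick, at each step, the vocabulary entry at the argmax of y_t."""
    idxs = [max(range(len(p)), key=p.__getitem__) for p in prob_seq]
    return " ".join(vocab[i] for i in idxs)

vocab = ["a", "plane", "parked", "on", "the", "runway"]
probs = [[0.70, 0.10, 0.05, 0.05, 0.05, 0.05],   # y_1
         [0.10, 0.60, 0.10, 0.10, 0.05, 0.05],   # y_2
         [0.05, 0.05, 0.10, 0.20, 0.10, 0.50]]   # y_3
print(greedy_words(probs, vocab))  # a plane runway
```

A practical implementation would also stop at an end-of-sentence token; the patent's description ends when the word sequence is complete.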
The technical effects of the invention are further described by combining simulation experiments as follows:
1. simulation conditions
The simulation platform of this embodiment is: a CPU with a 4.00 GHz main frequency, 32.0 GB of memory, a GTX-1070 graphics card, the Ubuntu (64-bit) operating system, and the MXNet and Python development platforms.
The data sets used for the simulation are: UCM-Captions Data Set, Sydney-Captions Data Set, and RSICD.
The experiments evaluate the method with the Bleu1, Bleu2, Bleu3, Bleu4, METEOR, ROUGE-L, and CIDEr metrics.
2. Simulation content and results
Simulation 1: to compare the results of the invention under different evaluation metrics, semantic generation is performed with the method of the invention and the three data sets are tested separately; the test results are shown in table 1.
Table 1 experimental results of different data sets
Table 1 shows the results of the method of the invention on different data sets for different evaluation indices.
As can be seen from table 1, the results of the various metrics on the RSICD data set are slightly lower than on the UCM and Sydney data sets. Compared with the best current semantic generation methods, however, on all three data sets the results of all metrics except CIDEr are improved by about 0.05.
Simulation 2: to verify the influence of the training data proportion on the semantic generation results for high-resolution remote sensing images, 10% of the images are used as a validation set and the proportion of the training set is increased from 10% to 80%; semantic generation is performed with the method of the invention and the three data sets are tested separately; the test results are shown in fig. 7.
Fig. 7 shows the results of the method of the present invention at different training set ratios for different evaluation indices, where the abscissa represents the training set ratio and the ordinate represents the corresponding result, and different metrics are represented by different shapes. Wherein fig. 7(a) shows the results on the UCM data set, fig. 7(b) shows the results on the Sydney data set, and fig. 7(c) shows the results on the RSICD data set.
As can be seen from fig. 7(a), as the proportion of the training set increases, the indexes in the result of the UCM data set increase and then gradually become stable;
as can be seen from fig. 7(b), as the training set proportion increases, the metrics on the Sydney data set rise in the initial stage; after the training proportion reaches 40%, the results become almost stable, which analysis of the data set attributes mainly to its unbalanced distribution;
as can be seen from fig. 7(c), as the proportion of the training set increases, the indexes in the results of the RSICD dataset increase, but the overall index is slightly lower because the sentence expression in the dataset is more complex.
In conclusion, the method and the device can improve the semantic generation result of the remote sensing image, but the performance of the method and the device on complex data sets and individual indexes needs to be improved.
Claims (5)
1. A remote sensing image semantic generation method based on a fast regional convolutional neural network is characterized by comprising the following steps:
(1) taking 60% of image-text pairs in the remote sensing data set as training samples, and taking the rest 40% of image-text pairs as test samples;
(2) extracting image features of remote sensing images in training samples by using a fast regional convolutional neural network;
(3) extracting text features of texts corresponding to the remote sensing images in the training samples by using a bidirectional cyclic neural network;
(4) matching the image features obtained in step (2) with the text features obtained in step (3) by a probability-model-based image-text matching method to obtain matched image-text features, comprising the following steps:

(4a) representing the image features of each remote sensing image and the text features of the corresponding describing sentence as vectors, and unifying them into the same dimensional space; the dot product v_i^T s_t measures the similarity between the image feature v_i of the i-th candidate region and the text feature s_t of the t-th word; the matching degree S_pq between the p-th image and the q-th sentence is calculated as

S_pq = Σ_{t∈g_q} max_{i∈g_p} v_i^T s_t,

where g_p denotes the set of image block regions in the p-th image and g_q denotes the set of words in the q-th sentence;
(4b) traversing all sentences for one picture of the training set, computing all its matching degrees in sequence, and selecting the q-th sentence with the maximum matching degree S_pq as the sentence index matched with the image;
(4c) traversing all the images of the training set, and repeating the step (4b) to obtain sentence indexes respectively matched with all the images;
(4d) according to the matched sentence indexes and the corresponding image indexes, retrieving for each picture the corresponding image feature v of (2) and the corresponding text features s_1, …, s_t, …, s_N of (3) to finish the matching process, where s_t is the text feature of the t-th word in the sentence, t runs from 1 to N, and N is the number of words in the sentence;
(5) training the long short-term memory (LSTM) network with the image-text features matched in step (4);
(6) extracting the image features of the remote sensing images in the test samples with the fast regional convolutional neural network, and inputting them into the trained long short-term memory (LSTM) network for semantic generation to obtain a sentence describing the image content.
2. The method of claim 1, wherein the image features of the remote sensing image are extracted in (2) with the fast regional convolutional neural network as follows:
(2a) generating a candidate region by using a regional candidate network in the fast regional convolutional neural network to obtain a candidate region containing different targets in each high-resolution remote sensing image;
(2b) extracting the image features of the candidate regions of (2a) with the three-layer convolutional neural network in the fast regional convolutional neural network: for the pixels I_i in the i-th candidate region, the region image feature is computed with the three-layer convolutional neural network as

v_i = W_m · CNN_{θc}(I_i) + b_m,

where θ_c denotes the network parameters of the region candidate network; W_m and b_m denote the parameters of the three-layer convolutional neural network, the dimension of W_m being h × 4096 with h the dimension of the embedding space; CNN_{θc}(I_i) denotes the 4096-dimensional fully connected layer vector output by the region candidate network for I_i under parameters θ_c; and v_i denotes the image feature extracted from the i-th candidate region, including the feature of the entire image and the first 19 detected locations;

(2c) merging the image features v_i extracted from each candidate region to obtain the image feature v of the picture, whose dimensionality equals the number of candidate regions.
3. The method of claim 1, wherein the extracting text features of the text corresponding to the remote sensing images in the training samples by using the bidirectional recurrent neural network in (3) comprises the following steps:
(3a) inputting a word sequence of length N, and converting the t-th word into an h-dimensional vector x_t by x_t = W_n Φ_t, where Φ_t is an indicator column vector giving the position index of the t-th word in the vocabulary, and t takes values 1 to N and denotes the position of each word in the sentence; the weight W_n is the embedding matrix of the given words and is initialized randomly;
(3b) randomly initializing the forward weight W_r, forward bias b_r, backward weight W_l, and backward bias b_l of the bidirectional recurrent neural network, and iteratively computing its forward hidden-unit outputs h_t^f and backward hidden-unit outputs h_t^b:

h_t^f = f(x_t + W_r h_{t-1}^f + b_r), with t increasing from 1 to N and h_0^f = 0,
h_t^b = f(x_t + W_l h_{t+1}^b + b_l), with t decreasing from N to 1 and h_{N+1}^b = 0,

where h_{t-1}^f is the (t-1)-th forward hidden-unit output, h_{t+1}^b is the (t+1)-th backward hidden-unit output, and x_t is the h-dimensional vector of the t-th word from (3a);
(3c) randomly initializing the output-layer weight W_d and bias b_d of the network, and iteratively computing the representation s_t of the t-th word:

s_t = f(W_d (h_t^f + h_t^b) + b_d),

where h_t^f and h_t^b are the forward and backward hidden-unit outputs of the bidirectional recurrent neural network from (3b);
(3d) optimizing all the network parameters W_n, W_r, b_r, W_l, b_l, W_d, and b_d with a cross-entropy function to obtain the trained bidirectional recurrent neural network, whose output s_t is the text feature of the t-th word, containing the position of the t-th word and its context information.
4. The method of claim 1, wherein the step (5) of training the long short-term memory (LSTM) network with the image-text features matched in (4) comprises the following steps:
(5a) randomly initializing the LSTM input-layer weight W_hx, hidden-layer weight W_hh, output-layer weight W_oh, hidden-layer bias b_h, and output-layer bias b_o; inputting the image feature v and the corresponding text features s_1, …, s_t, …, s_N of the original sentence, and iteratively computing the t-th hidden-unit output h_t and the output-layer output y_t of the LSTM network:

h_t = f(W_hx s_t + W_hh h_{t−1} + b_h + 1(t=1) ⊙ v),
y_t = softmax(W_oh h_t + b_o),

where the image feature vector v of (2) is input only when t = 1, and h_{t−1} is the (t−1)-th hidden-unit output of the LSTM network, with h_0 = 0;
(5b) calculating the cross entropy L(θ) of y_t:

L(θ) = −Σ_{t=1}^{N} Φ̂_t^T log y_t,

where Φ̂_t is the indicator column vector of the expected output, i.e. of the t-th word in the original sentence corresponding to the image, giving the position index of the word in the vocabulary, and N is the number of words in the sentence;
(5c) optimizing all the network parameters W_hx, W_hh, W_oh, b_h, and b_o with the cross entropy to obtain the trained LSTM network.
5. The method according to claim 1, wherein performing semantic generation in step (6) with the LSTM network trained in (5) to obtain a sentence describing the image content comprises the following steps:
(6a) extracting the image feature v of the remote sensing image in a test sample with the fast regional convolutional neural network of step (2);

(6b) iteratively computing the t-th hidden-unit output h_t, the network output y_t, and the text feature s_{t+1} of the (t+1)-th word of the LSTM network:

h_t = f(W_hx s_t + W_hh h_{t−1} + b_h + 1(t=1) ⊙ v),
y_t = softmax(W_oh h_t + b_o),
s_{t+1} = y_t,

where the image feature vector of the test sample is input only when t = 1; W_hx denotes the input-layer weight, W_hh the hidden-layer weight, W_oh the output-layer weight, b_h the hidden-layer bias, and b_o the output-layer bias;
(6c) looking up words in the vocabulary at the indexes of the maximum values of y_t; all the sequentially output words form a sentence, which is the generated sentence describing the image content.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810744473.0A CN108960330B (en) | 2018-07-09 | 2018-07-09 | Remote sensing image semantic generation method based on fast regional convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108960330A CN108960330A (en) | 2018-12-07 |
CN108960330B true CN108960330B (en) | 2021-09-10 |
Family
ID=64483489
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810744473.0A Active CN108960330B (en) | 2018-07-09 | 2018-07-09 | Remote sensing image semantic generation method based on fast regional convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108960330B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109740471B (en) * | 2018-12-24 | 2021-06-22 | 中国科学院西安光学精密机械研究所 | Remote sensing image description method based on joint latent semantic embedding |
CN109784223B (en) * | 2018-12-28 | 2020-09-01 | 珠海大横琴科技发展有限公司 | Multi-temporal remote sensing image matching method and system based on convolutional neural network |
CN111476838A (en) * | 2019-01-23 | 2020-07-31 | 华为技术有限公司 | Image analysis method and system |
CN111753822A (en) * | 2019-03-29 | 2020-10-09 | 北京市商汤科技开发有限公司 | Text recognition method and device, electronic equipment and storage medium |
CN110287355B (en) * | 2019-05-16 | 2021-06-22 | 中国科学院西安光学精密机械研究所 | Remote sensing image description method based on retrieval topic memory network |
CN110232413A (en) * | 2019-05-31 | 2019-09-13 | 华北电力大学(保定) | Insulator image semantic description method, system and device based on GRU network |
CN110363303B (en) * | 2019-06-14 | 2023-07-07 | 平安科技(深圳)有限公司 | Memory training method and device for intelligent distribution model and computer readable storage medium |
CN110378335B (en) * | 2019-06-17 | 2021-11-19 | 杭州电子科技大学 | Information analysis method and model based on neural network |
CN110418210B (en) * | 2019-07-12 | 2021-09-10 | 东南大学 | Video description generation method based on bidirectional cyclic neural network and depth output |
US20210073317A1 (en) * | 2019-09-05 | 2021-03-11 | International Business Machines Corporation | Performing dot product operations using a memristive crossbar array |
CN110991284B (en) * | 2019-11-22 | 2022-10-18 | 北京航空航天大学 | Optical remote sensing image statement description generation method based on scene pre-classification |
CN111126479A (en) * | 2019-12-20 | 2020-05-08 | 山东浪潮人工智能研究院有限公司 | Image description generation method and system based on unsupervised uniqueness optimization |
CN112070069A (en) * | 2020-11-10 | 2020-12-11 | 支付宝(杭州)信息技术有限公司 | Method and device for identifying remote sensing image |
CN112861882B (en) * | 2021-03-10 | 2023-05-09 | 齐鲁工业大学 | Image-text matching method and system based on frequency self-adaption |
CN113298151A (en) * | 2021-05-26 | 2021-08-24 | 中国电子科技集团公司第五十四研究所 | Remote sensing image semantic description method based on multi-level feature fusion |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105868691A (en) * | 2016-03-08 | 2016-08-17 | 中国石油大学(华东) | Urban vehicle tracking method based on rapid region convolutional neural network |
CN107657008A (en) * | 2017-09-25 | 2018-02-02 | 中国科学院计算技术研究所 | Cross-media training and retrieval method based on deep discriminative sequence learning |
CN107729987A (en) * | 2017-09-19 | 2018-02-23 | 东华大学 | Automatic description method for night vision images based on deep convolutional recurrent neural network |
CN108073941A (en) * | 2016-11-17 | 2018-05-25 | 江南大学 | An image semantic generation method based on deep learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9940577B2 (en) * | 2015-07-07 | 2018-04-10 | Adobe Systems Incorporated | Finding semantic parts in images |
Non-Patent Citations (2)
Title |
---|
Identity-Aware Textual-Visual Matching with Latent Co-attention; Shuang Li et al.; 2017 IEEE International Conference on Computer Vision (ICCV); 2017-10-29; full text * |
Convolutional neural networks in image understanding; Chang Liang et al.; Acta Automatica Sinica; 2016-06-30; full text * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108960330B (en) | Remote sensing image semantic generation method based on fast regional convolutional neural network | |
CN106909924B (en) | Remote sensing image rapid retrieval method based on depth significance | |
Yue et al. | Self-supervised learning with adaptive distillation for hyperspectral image classification | |
US10248664B1 (en) | Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval | |
Li et al. | Large-scale remote sensing image retrieval by deep hashing neural networks | |
Zhao et al. | Hyperspectral anomaly detection based on stacked denoising autoencoders | |
CN107480261B (en) | Fine-grained face image fast retrieval method based on deep learning | |
CN110209824B (en) | Text emotion analysis method, system and device based on combined model | |
CN110516095B (en) | Semantic migration-based weak supervision deep hash social image retrieval method and system | |
EP3029606A2 (en) | Method and apparatus for image classification with joint feature adaptation and classifier learning | |
CN110929080B (en) | Optical remote sensing image retrieval method based on attention and generation countermeasure network | |
Ye et al. | A lightweight model of VGG-16 for remote sensing image classification | |
CN111680176A (en) | Remote sensing image retrieval method and system based on attention and bidirectional feature fusion | |
CN111542841A (en) | System and method for content identification | |
CN112949740B (en) | Small sample image classification method based on multilevel measurement | |
Zeng et al. | IDLN: Iterative distribution learning network for few-shot remote sensing image scene classification | |
Hosseiny et al. | A hyperspectral anomaly detection framework based on segmentation and convolutional neural network algorithms | |
CN114419351A (en) | Image-text pre-training model training method and device and image-text prediction model training method and device | |
Kiani Shahvandi et al. | Small geodetic datasets and deep networks: attention-based residual LSTM autoencoder stacking for geodetic time series | |
Hu et al. | Saliency-based YOLO for single target detection | |
Rivas-Perea et al. | Statistical and neural pattern recognition methods for dust aerosol detection | |
Zhang et al. | Efficient history matching with dimensionality reduction methods for reservoir simulations | |
Moon et al. | Correlation between alignment-uniformity and performance of dense contrastive representations | |
Darvishnezhad et al. | A new model based on multi-aspect images and complex-valued neural network for synthetic aperture radar automatic target recognition | |
Pankaja et al. | A hybrid approach combining CUR matrix decomposition and weighted kernel sparse representation for plant leaf recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||