CN110287354A - A semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network - Google Patents

A semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network

Info

Publication number
CN110287354A
CN110287354A
Authority
CN
China
Prior art keywords
remote sensing
high-resolution
neural network
remote sensing images
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910406998.8A
Other languages
Chinese (zh)
Inventor
卢孝强 (Lu Xiaoqiang)
屈博 (Qu Bo)
刘康 (Liu Kang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Institute of Optics and Precision Mechanics of CAS
Original Assignee
Xi'an Institute of Optics and Precision Mechanics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Institute of Optics and Precision Mechanics of CAS
Priority to CN201910406998.8A
Publication of CN110287354A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images

Abstract

The invention discloses a semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network, which mainly addresses the problem that current analysis of high-resolution remote sensing images does not interpret the images at the level of high-level semantics. The implementation steps are: 1) construct a high-resolution remote sensing image-text description database; 2) extract the visual features of all images in the database with a pre-trained convolutional neural network; 3) build a vocabulary from the words in all text description sentences of the database; 4) train the deep multi-modal neural network; 5) input a high-resolution remote sensing image and generate its text description with the trained network.

Description

A semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network
Technical field
The invention belongs to the technical field of information processing, and in particular relates to an image understanding technique that can be used for disaster monitoring, military reconnaissance, geographic surveying, and similar applications.
Background technique
With the development of China's aerospace technology, more and more high-resolution satellites are being launched into space, remote sensing images are becoming easier to acquire, their resolution is continuously improving, and the effective information they contain is increasingly rich. High-spatial-resolution remote sensing images will therefore play a major role in road planning, military reconnaissance, and related applications. To make rational use of these data by fully exploiting both the visual information and the semantic information of the images, semantic understanding of high-resolution remote sensing images is a very important research direction.
However, research on high-spatial-resolution (hereinafter, high-resolution) remote sensing images currently concentrates on the following four areas:
(1) Target detection: automatically detecting targets of interest in a high-resolution image, such as aircraft or oil storage tanks;
(2) Image classification: assigning each pixel of an image to a category by analyzing the texture and spatial information of the various land-cover types in the image;
(3) Image segmentation: partitioning a high-resolution image into semantically continuous regions (regions of identical properties that represent different land-cover classes);
(4) Scene classification: identifying the scene contained in each high-resolution image, such as an airport or a harbor, and assigning a scene category label to each image.
These lines of work can often only detect whether an image contains a certain target, or obtain a class label for each pixel or for the whole image; they cannot describe in detail the attributes and characteristics of the targets in an image or the relationships between them, and therefore do not truly understand the targets at the semantic level. Moreover, the visual information of an image only roughly reflects its main content, whereas the corresponding text information can describe the image in far finer detail.
Current work on image semantic understanding that combines visual and textual information concentrates mainly on the natural image field and can be roughly divided into the following classes:
The first is the image-text embedding approach. It first extracts an image feature vector with a pre-trained convolutional neural network model, then maps the text description of the image into the same feature space as the image features with a pre-trained language model, learns the relationship between image and text by computing the similarity of the two feature vectors, and finally generates a verbal description for a new test image through the learned mapping. For details see R. Kiros, R. Salakhutdinov, and R. S. Zemel, "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models," arXiv preprint arXiv:1411.2539, 2014.
The second is the target-detection approach. It first trains a detector for each target and for the words related to describing it, applies the trained detectors to an image to obtain a set of words, composes these words into several candidate sentences with a language model, and finally ranks the sentences by their similarity score to the image, taking the top-ranked sentence as the result. For details see H. Fang, S. Gupta, F. Iandola, et al., "From Captions to Visual Concepts and Back," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1473-1482, 2015.
The third is the deep-learning approach. In the training stage it extracts image features with a deep convolutional neural network (Convolutional Neural Network, CNN) and feeds the image features together with the text information into a recurrent neural network (Recurrent Neural Network, RNN) to train the network. At test time, a test image is fed into the trained deep neural network to generate a verbal description of the image. For details see O. Vinyals, A. Toshev, S. Bengio, et al., "Show and Tell: A Neural Image Caption Generator," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156-3164, 2015.
Although these methods have achieved good results in natural image understanding, in the high-resolution remote sensing field the semantic understanding of high-resolution remote sensing images is still essentially blank: current methods do not exploit the high-level semantic information of the images and therefore cannot truly understand them. How to combine the visual information and the text information of high-resolution remote sensing images so as to understand the images at the semantic level is thus a significant problem.
Summary of the invention
The object of the invention is to address the deficiencies of the existing methods above by proposing a semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network. Built on a multi-modal neural network model, the method considers both the visual and the semantic information of remote sensing images and attends to the correlations between the attributes of targets and the targets themselves, so as to understand remote sensing images at the level of high-level semantics.
The technical principle of the invention is as follows:
(1) First, annotate the high-resolution remote sensing image database with text: label each image with 5 sentences describing its content, forming a new high-resolution remote sensing image-text annotation (image-captions) database;
(2) Then extract remote sensing image features with a convolutional neural network (Convolutional Neural Network, CNN);
(3) Feed the extracted image features together with the text corresponding to each image into a recurrent neural network (Recurrent Neural Network, RNN) and train the network parameters, obtaining our deep multi-modal neural network model;
(4) Generate the text description of a high-resolution remote sensing image with the trained deep multi-modal neural network.
The specific technical solution of the invention is as follows:
The invention provides a semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network, comprising the following steps:
1) Construct a high-resolution remote sensing image-text description database;
The database contains a number of high-resolution remote sensing images and, for each image, several text description sentences;
2) Extract the visual features of all images in the database with a pre-trained convolutional neural network;
The computation is:
b0 = CNN(I);
where I is an image in the high-resolution remote sensing image-text description database and b0 is its visual feature;
3) Build a vocabulary from the words in all text description sentences of the database;
Each word in the vocabulary is represented by a vector, and START and END vectors are added to mark the beginning and the end of a sentence;
4) Train the deep multi-modal neural network; every time step of the network has an input layer, a hidden layer, and an output layer;
4.1) At time t = 1, feed the visual feature b0 from step 2) and the START vector from step 3) into the hidden layer of the deep multi-modal neural network at t = 1, obtaining its hidden-layer output h1:
h1 = g(λ1w1 + b0);
where w1 is the input START vector;
g is a nonlinear function;
λ1, λ2 are network weight parameters to be trained;
4.2) Then feed the hidden-layer output h1 at t = 1 into the output layer of the network at t = 1 and compute the probability distribution over all words at t = 1 with the Softmax function:
p(w2) = softmax(λ3h1);
where λ3 is a network weight parameter to be trained;
4.3) Choose the word w2 with the highest probability at t = 1 as the predicted word at t = 1;
4.4) At each time t > 1, feed the word vector of the word predicted at the previous time step, together with the previous hidden-layer output, into the hidden layer of the network at the current time step, obtaining the current hidden-layer output:
ht = g(λ1wt + λ2ht-1);
where ht is the hidden-layer output at the current time step;
wt is the input word vector at the current time step;
4.5) Feed the current hidden-layer output into the output layer at the current time step and compute the probability distribution over all words at the current time with the Softmax function:
p(wt+1) = softmax(λ3ht);
4.6) Choose the word with the highest probability at the current time step as the current predicted word;
4.7) Repeat steps 4.4) to 4.6) until the predicted word vector is the END vector;
4.8) Sum the losses over all training image-text pairs to obtain the overall loss function of the deep multi-modal neural network, against which the optimal network is trained;
5) Input a high-resolution remote sensing image and generate its text description with the trained deep multi-modal neural network.
Further, in step 4.1), the nonlinear function g is an RNN or LSTM cell, and the image feature b0 is extracted by AlexNet, VGGNet, or GoogLeNet.
Further, the nonlinear function g is an LSTM cell (of the standard form sketched below) and the image feature b0 is extracted by VGGNet.
Further, the content of the text description sentences in step 1) covers the targets, the class label of each pixel or of the whole image, and the attributes and characteristics of the targets and the relationships between them.
Further, the image visual feature in step 2) is the 4096-dimensional vector output by the last fully connected layer of the convolutional neural network.
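When g is an LSTM cell rather than a plain RNN update, the hidden state is produced through input, forget, and output gates together with a cell state c that carries long-range information. A standard-form NumPy sketch of one LSTM step (the weight shapes and gate ordering are conventional assumptions, not taken from the patent):

import numpy as np

def lstm_cell(x, h_prev, c_prev, W, U, b):
    # One LSTM step. W maps the input x and U the previous hidden state
    # h_prev to four stacked gate pre-activations of length 4*d each.
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    i = sigmoid(z[:d])           # input gate
    f = sigmoid(z[d:2 * d])      # forget gate
    o = sigmoid(z[2 * d:3 * d])  # output gate
    g_new = np.tanh(z[3 * d:])   # candidate cell update
    c = f * c_prev + i * g_new   # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c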
The beneficial effects of the invention are:
Compared with conventional methods, the invention fully considers the correlations between the attributes of targets and the targets themselves in high-resolution remote sensing images, and exploits both the visual and the semantic information of the images, thereby understanding remote sensing images at the level of high-level semantics. It can be used in disaster monitoring, military reconnaissance, geographic surveying, and similar applications.
Detailed description of the invention
Fig. 1 is the overall flowchart of the invention;
Fig. 2 illustrates the construction of the high-resolution remote sensing image-text description database;
Fig. 3 shows the structure of the deep multi-modal neural network;
Fig. 4 shows text generation results obtained by the invention on high-resolution remote sensing images.
Specific embodiment
With reference to the accompanying drawings, the implementation steps of the semantic understanding method provided by the invention and the comparative experimental verification are further described below.
Referring to Fig. 1, the steps of the invention are as follows:
Step 1): Construct the high-resolution remote sensing image-text description (image-captions) database;
The database contains a number of high-resolution remote sensing images and, for each image, several text description sentences;
In this embodiment, the UCM-captions and Sydney-captions databases are built on the basis of the existing high-resolution remote sensing databases, the Sydney database and the UCM database, and serve as the image-captions database;
The Sydney-captions database contains 613 high-resolution remote sensing images with a resolution of 0.3 m/pixel, each with 5 text descriptions, 3065 texts in total. The UCM-captions database contains 2100 high-resolution remote sensing images with a resolution of 0.5 m/pixel, each with 5 text descriptions, 10500 texts in total (see Fig. 2). The establishment of these two image-text databases lays the foundation for training the semantic understanding model for high-resolution remote sensing images and for solving the semantic understanding problem.
It should be understood that although every image in the image-captions database of this embodiment has 5 text descriptions, the number is not limited to 5; it suffices that the descriptions cover the targets, the class label of each pixel or of the whole image, and the attributes and characteristics of the targets and the relationships between them.
Step 2): Extract the visual features of all images in the database with a pre-trained convolutional neural network; here, the visual feature of an image is the 4096-dimensional vector output by the last fully connected layer of the network;
The computation is:
b0 = CNN(I);
where I is an image in the database and b0 is its visual feature;
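The simulation section below reports that the best results use VGG-19 features. As a minimal sketch of this step, assuming PyTorch/torchvision as the implementation environment (the patent itself only mentions MATLAB and PyCharm, so the library, function, and file names here are illustrative), the 4096-dimensional output of the last fully connected layer can be extracted as follows; all code sketches in this document use Python:

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pre-trained VGG-19 with the final classification layer removed, so the
# network outputs the 4096-dimensional activation of the last fully
# connected layer, matching the visual feature b0 of this step.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])
vgg.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_visual_feature(image_path):
    # b0 = CNN(I): returns the 4096-dimensional feature vector of one image.
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        b0 = vgg(image)
    return b0.squeeze(0)  # shape: (4096,)

Running extract_visual_feature over every database image yields the features b0 = CNN(I) consumed by the later steps.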
Step 3): Build a vocabulary from the words in all texts of the database;
Each word in the vocabulary is represented by a vector, and START and END vectors are added to mark the beginning and the end of a sentence;
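A minimal sketch of the vocabulary construction (tokenization details are assumptions; the patent only requires that every word, plus the START and END markers, be represented by a vector):

def build_vocabulary(all_captions):
    # Assign an index to every word appearing in the text descriptions;
    # the special START and END tokens mark sentence boundaries.
    vocab = {"START": 0, "END": 1}
    for caption in all_captions:
        for word in caption.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

Each index then selects a word vector, e.g. a row of a trainable embedding matrix.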
Step 4): Referring to Fig. 3, train the deep multi-modal neural network; every time step of the network has an input layer, a hidden layer, and an output layer;
Step 4.1): At time t = 1, feed the visual feature b0 from step 2) and the START vector from step 3) into the hidden layer of the network at t = 1, obtaining its hidden-layer output h1:
h1 = g(λ1w1 + b0);
where w1 is the input START vector;
g is a nonlinear function;
λ1, λ2 are network weight parameters to be trained;
The visual feature b0 is extracted by AlexNet, VGGNet, or GoogLeNet;
Step 4.2): Then feed the hidden-layer output h1 at t = 1 into the output layer of the network at t = 1 and compute the probability distribution over all words at t = 1 with the Softmax function:
p(w2) = softmax(λ3h1);
where λ3 is a network weight parameter to be trained;
Step 4.3): Choose the word w2 with the highest probability at t = 1 as the predicted word at t = 1;
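Steps 4.1) to 4.3) can be written compactly. The NumPy sketch below treats g as a simple tanh nonlinearity (the patent also allows an LSTM cell, sketched earlier) and uses lam1 and lam3 for the trainable weights λ1 and λ3; all shapes are assumptions. Note that, per the formula above, the visual feature b0 enters the hidden layer directly at t = 1, so the hidden size equals the feature dimension here:

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def first_step(b0, w_start, lam1, lam3, g=np.tanh):
    # t = 1: h1 = g(lam1 @ w1 + b0), then p(w2) = softmax(lam3 @ h1).
    h1 = g(lam1 @ w_start + b0)
    p_w2 = softmax(lam3 @ h1)
    return h1, int(np.argmax(p_w2))  # hidden state and predicted word index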
Step 4.4): At each time t > 1, feed the word vector of the word predicted at the previous time step, together with the previous hidden-layer output, into the hidden layer of the network at the current time step, obtaining the current hidden-layer output:
ht = g(λ1wt + λ2ht-1);
where ht is the hidden-layer output at the current time step;
wt is the input word vector at the current time step;
Step 4.5): Feed the current hidden-layer output into the output layer at the current time step and compute the probability distribution over all words at the current time with the Softmax function:
p(wt+1) = softmax(λ3ht);
Step 4.6): Choose the word with the highest probability at the current time step as the current predicted word;
Step 4.7): Repeat steps 4.4) to 4.6) until the predicted word vector is the END vector; this decoding loop is sketched below;
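A sketch of the greedy decoding loop, reusing softmax and first_step from the previous block and assuming an embeddings array indexed by word id, an inv_vocab index-to-word map, and a max_len safety cap (all illustrative names):

def generate_caption(b0, embeddings, lam1, lam2, lam3, vocab, inv_vocab,
                     g=np.tanh, max_len=20):
    # Greedy decoding per steps 4.1) to 4.7): start from START and the
    # visual feature, then feed each predicted word back in until END.
    h, idx = first_step(b0, embeddings[vocab["START"]], lam1, lam3, g)
    words = []
    while inv_vocab[idx] != "END" and len(words) < max_len:
        words.append(inv_vocab[idx])
        h = g(lam1 @ embeddings[idx] + lam2 @ h)  # ht = g(lam1 wt + lam2 ht-1)
        idx = int(np.argmax(softmax(lam3 @ h)))   # p(wt+1) = softmax(lam3 ht)
    return " ".join(words)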
Step 4.8): Sum the losses over all training image-text pairs to obtain the overall loss function of the deep multi-modal neural network, against which the optimal network is trained;
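The patent does not spell out the loss; a choice consistent with the softmax outputs above (an assumption following standard captioning practice) is the summed negative log-likelihood under teacher forcing, i.e. the reference words rather than the predictions are fed in at training time:

def caption_loss(word_probs, reference_indices):
    # word_probs[t] is the softmax distribution produced at step t when the
    # reference caption is fed in; the loss penalises the probability
    # assigned to the ground-truth next word.
    return -sum(np.log(p[t] + 1e-12)
                for p, t in zip(word_probs, reference_indices))

The overall objective of step 4.8) sums this loss over all (image, caption) training pairs, and lam1, lam2, lam3 together with the word embeddings are optimised against it, e.g. by stochastic gradient descent.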
Step 5): Input a high-resolution remote sensing image and generate its text description with the trained deep multi-modal neural network.
The following simulation test further illustrates the effect of the method.
1. Simulation conditions
The simulation tests of this embodiment were carried out with MATLAB and PyCharm on a Linux system with an Intel(R) Xeon E5-2697 CPU at 2.60 GHz and 128 GB of memory.
The experimental data are built on the UCM database provided by the U.S. Geological Survey (USGS) and the Sydney database released by the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing of Wuhan University; five text descriptions were manually annotated for every remote sensing image, yielding the final UCM-captions and Sydney-captions databases used as experimental data.
2. Simulation content
The semantic understanding of remote sensing images with the method of the invention proceeds as follows:
First, the text generation results are evaluated with three scores: BLEU-n, METEOR, and CIDEr.
The BLEU-n score is an accuracy-like index: the number of n-grams (sequences of n consecutive words) of the generated text that also appear in the reference text, divided by the total number of n-grams in the generated text (B-1, B-2, B-3, B-4 abbreviate BLEU-n);
The METEOR score computes both the precision and the recall of the n-grams in the generated text and takes their harmonic mean as the final score;
The CIDEr score additionally weighs the importance of each term in the generated text on top of precision and recall. The three scores emphasize different aspects, and together they reflect the quality of the generated text well.
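For intuition, the modified n-gram precision at the core of BLEU-n can be sketched as follows (a simplified single-reference version; the full metric clips against multiple references and adds a brevity penalty and a geometric mean over n = 1..4):

from collections import Counter

def bleu_n_precision(candidate, reference, n):
    # Clipped matches of candidate n-grams against the reference, divided
    # by the total number of n-grams in the candidate.
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.split())
    ref = ngrams(reference.split())
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / max(sum(cand.values()), 1)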
Then, final captions were generated on the UCM-captions database following the steps of this embodiment and assessed with the BLEU-n, METEOR, and CIDEr indices.
Table 1 shows the text generation results on the UCM-captions database.
Next, the same experimental procedure was run on the Sydney-captions database; the results are shown in Table 2:
Table 2 shows the text generation results on the Sydney-captions database.
The results in Tables 1 and 2 show that the method of the invention achieves relatively good results and that the quality of the generated text is also comparatively high. From the indices we can further observe that the stronger the feature extraction ability of the CNN, the better the generated text; that LSTM generally outperforms RNN; and that the combination VGG-19 + LSTM achieves the best results.
Partial visualization results are shown in Fig. 4. The text generated by our method is largely reasonable and can even state the number of targets in an image (e.g., storage tanks).

Claims (5)

1. A semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network, characterized by comprising the following steps:
1) constructing a high-resolution remote sensing image-text description database;
the database containing a number of high-resolution remote sensing images and, for each image, several text description sentences;
2) extracting the visual features of all images in the database with a pre-trained convolutional neural network;
the computation being:
b0 = CNN(I);
where I is an image in the database and b0 is its visual feature;
3) building a vocabulary from the words in all text description sentences of the database;
each word in the vocabulary being represented by a vector, with START and END vectors added to mark the beginning and the end of a sentence;
4) training the deep multi-modal neural network, every time step of which has an input layer, a hidden layer, and an output layer;
4.1) at time t = 1, feeding the visual feature b0 from step 2) and the START vector from step 3) into the hidden layer of the network at t = 1, obtaining its hidden-layer output h1:
h1 = g(λ1w1 + b0);
where w1 is the input START vector;
g is a nonlinear function;
λ1, λ2 are network weight parameters to be trained;
4.2) then feeding the hidden-layer output h1 at t = 1 into the output layer of the network at t = 1 and computing the probability distribution over all words at t = 1 with the Softmax function:
p(w2) = softmax(λ3h1);
where λ3 is a network weight parameter to be trained;
4.3) choosing the word w2 with the highest probability at t = 1 as the predicted word at t = 1;
4.4) at each time t > 1, feeding the word vector of the word predicted at the previous time step, together with the previous hidden-layer output, into the hidden layer of the network at the current time step, obtaining the current hidden-layer output:
ht = g(λ1wt + λ2ht-1);
where ht is the hidden-layer output at the current time step;
wt is the input word vector at the current time step;
4.5) feeding the current hidden-layer output into the output layer at the current time step and computing the probability distribution over all words at the current time with the Softmax function:
p(wt+1) = softmax(λ3ht);
4.6) choosing the word with the highest probability at the current time step as the current predicted word;
4.7) repeating steps 4.4) to 4.6) until the predicted word vector is the END vector;
4.8) summing the losses over all training image-text pairs to obtain the overall loss function of the deep multi-modal neural network, against which the optimal network is trained;
5) inputting a high-resolution remote sensing image and generating its text description with the trained deep multi-modal neural network.
2. The semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network according to claim 1, characterized in that: in step 4.1), the nonlinear function g is an RNN or LSTM cell, and the visual feature b0 of the image is extracted by AlexNet, VGGNet, or GoogLeNet.
3. The semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network according to claim 1, characterized in that: the nonlinear function g is an LSTM cell and the visual feature b0 of the image is extracted by VGGNet.
4. The semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network according to claim 1, characterized in that: the content of the text description sentences in step 1) covers the targets, the class label of each pixel or of the whole image, and the attributes and characteristics of the targets and the relationships between them.
5. The semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network according to claim 1, characterized in that: the visual feature of an image in step 2) is the 4096-dimensional vector output by the last fully connected layer of the convolutional neural network.
CN201910406998.8A 2019-05-16 2019-05-16 A semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network Pending CN110287354A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910406998.8A CN110287354A (en) 2019-05-16 2019-05-16 A semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910406998.8A CN110287354A (en) 2019-05-16 2019-05-16 A semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network

Publications (1)

Publication Number Publication Date
CN110287354A true CN110287354A (en) 2019-09-27

Family

ID=68002116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910406998.8A Pending CN110287354A (en) 2019-05-16 2019-05-16 A semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network

Country Status (1)

Country Link
CN (1) CN110287354A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089541A1 (en) * 2016-09-27 2018-03-29 Facebook, Inc. Training Image-Recognition Systems Using a Joint Embedding Model on Online Social Networks
CN106650756A (en) * 2016-12-28 2017-05-10 广东顺德中山大学卡内基梅隆大学国际联合研究院 Image text description method based on knowledge transfer multi-modal recurrent neural network
CN107391609A (en) * 2017-07-01 2017-11-24 南京理工大学 A kind of Image Description Methods of two-way multi-modal Recursive Networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bo Qu et al., "Deep semantic understanding of high resolution remote sensing image," 2016 International Conference on Computer, Information and Telecommunication Systems (CITS) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929640A (en) * 2019-11-20 2020-03-27 西安电子科技大学 Wide remote sensing description generation method based on target detection
CN110929640B (en) * 2019-11-20 2023-04-07 西安电子科技大学 Wide remote sensing description generation method based on target detection
CN110991284A (en) * 2019-11-22 2020-04-10 北京航空航天大学 Optical remote sensing image statement description generation method based on scene pre-classification
CN110991284B (en) * 2019-11-22 2022-10-18 北京航空航天大学 Optical remote sensing image statement description generation method based on scene pre-classification
CN111445018A (en) * 2020-03-27 2020-07-24 国网甘肃省电力公司电力科学研究院 Ultraviolet imaging real-time information processing method based on accelerated convolutional neural network algorithm
CN111445018B (en) * 2020-03-27 2023-11-14 国网甘肃省电力公司电力科学研究院 Ultraviolet imaging real-time information processing method based on accelerating convolutional neural network algorithm
CN111582241A (en) * 2020-06-01 2020-08-25 腾讯科技(深圳)有限公司 Video subtitle recognition method, device, equipment and storage medium
CN112949732A (en) * 2021-03-12 2021-06-11 中国人民解放军海军航空大学 Semantic annotation method and system based on self-adaptive multi-mode remote sensing image fusion
CN112949732B (en) * 2021-03-12 2022-04-22 中国人民解放军海军航空大学 Semantic annotation method and system based on self-adaptive multi-mode remote sensing image fusion
CN113298151A (en) * 2021-05-26 2021-08-24 中国电子科技集团公司第五十四研究所 Remote sensing image semantic description method based on multi-level feature fusion
CN113989297A (en) * 2021-09-23 2022-01-28 杭州电子科技大学 Method for segmenting tumor region by multi-modal eyelid tumor data fusion

Similar Documents

Publication Publication Date Title
CN110287354A (en) A semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network
CN111476294B (en) Zero sample image identification method and system based on generation countermeasure network
Cheng et al. Perturbation-seeking generative adversarial networks: A defense framework for remote sensing image scene classification
Yuan et al. Exploring a fine-grained multiscale method for cross-modal remote sensing image retrieval
Qiu et al. Geometric back-projection network for point cloud classification
Li et al. Superpixel-based reweighted low-rank and total variation sparse unmixing for hyperspectral remote sensing imagery
CN106909924B (en) Remote sensing image rapid retrieval method based on depth significance
Zhang et al. A GANs-based deep learning framework for automatic subsurface object recognition from ground penetrating radar data
CN108960330A (en) Remote sensing images semanteme generation method based on fast area convolutional neural networks
Wu et al. Scene attention mechanism for remote sensing image caption generation
CN112579816B (en) Remote sensing image retrieval method and device, electronic equipment and storage medium
CN105989336A (en) Scene recognition method based on deconvolution deep network learning with weight
Xu et al. Txt2Img-MHN: Remote sensing image generation from text using modern Hopfield networks
Bragilevsky et al. Deep learning for Amazon satellite image analysis
Wang et al. Boosting lightweight CNNs through network pruning and knowledge distillation for SAR target recognition
Xiu et al. 3D semantic segmentation for high-resolution aerial survey derived point clouds using deep learning
CN112182275A (en) Trademark approximate retrieval system and method based on multi-dimensional feature fusion
CN109766752A (en) A kind of object matching and localization method and system, computer based on deep learning
CN105046286B (en) L is generated and combined based on automatic view1,2The supervision multiple view feature selection approach of norm minimum
Tu et al. Detection of damaged rooftop areas from high-resolution aerial images based on visual bag-of-words model
CN114332288A (en) Method for generating text generation image of confrontation network based on phrase driving and network
Chen et al. Class-aware domain adaptation for coastal land cover mapping using optical remote sensing imagery
CN112766381B (en) Attribute-guided SAR image generation method under limited sample
CN108985385A (en) Based on the quick Weakly supervised object detection method for generating confrontation study
Tan et al. Review of Zero-Shot Remote Sensing Image Scene Classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190927)