CN110287354A - Semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network - Google Patents
Semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network Download PDF Info
- Publication number
- CN110287354A CN110287354A CN201910406998.8A CN201910406998A CN110287354A CN 110287354 A CN110287354 A CN 110287354A CN 201910406998 A CN201910406998 A CN 201910406998A CN 110287354 A CN110287354 A CN 110287354A
- Authority
- CN
- China
- Prior art keywords
- remote sensing
- high score
- neural network
- sensing images
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
Abstract
The invention discloses a semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network, which mainly addresses the problem that current analyses of high-resolution remote sensing images do not interpret the images at the level of high-level semantics. The implementation steps are: 1) construct a high-resolution remote sensing image-text description database; 2) extract the visual features of all images in the database with a pre-trained convolutional neural network; 3) create a vocabulary from the words in all text description sentences of the database; 4) train the deep multi-modal neural network; 5) input a high-resolution remote sensing image and generate its corresponding text description with the trained deep multi-modal neural network.
Description
Technical field
The invention belongs to the field of information processing, and in particular relates to an image understanding technique that can be used for disaster monitoring, military reconnaissance, geographical-conditions surveying, and similar applications.
Background technique
With the development of China's aerospace technology, more and more high-resolution satellites are being launched into space: remote sensing images are becoming easier to acquire, their resolution is continuously improving, and the effective information they contain is increasingly rich. High-spatial-resolution remote sensing images will therefore play a major role in road extraction, military reconnaissance, and related applications. To make rational use of these data and fully exploit both the visual and the semantic information in the images, the semantic understanding of high-resolution remote sensing images is a very important research direction.
At present, however, research on high-spatial-resolution (hereinafter "high-resolution") remote sensing images concentrates mainly on the following four tasks:
(1) Target detection: automatically detecting targets of interest in a high-resolution image, such as aircraft or oil storage tanks;
(2) Image classification: assigning each pixel of an image to a category by analyzing the texture and spatial information of the various ground-object types in the image;
(3) Image segmentation: dividing a high-resolution image into semantically continuous regions (regions with identical properties that indicate different object classes);
(4) Scene classification: identifying the scene contained in each high-resolution image, such as an airport or a harbour, and assigning each image a scene-category label.
These lines of work can usually only detect whether an image contains a certain target, or obtain a class label for each pixel or for the whole image; they cannot describe in detail the attributes of the targets in the image, their features, or the relationships between targets, and thus do not understand the targets at the semantic level. Moreover, the visual information of an image only roughly reflects its main content, whereas accompanying text information can describe the image in much finer detail.
Current work that combines visual and textual information for image semantic understanding focuses mainly on natural images and falls roughly into the following classes:
The first is based on image-text embedding. This approach first extracts an image feature vector with a pre-trained convolutional neural network model, then maps the image's corresponding text description into the same feature space as the image features with a pre-trained language model, finds the relationship between image and text by computing the similarity of the two feature vectors, and finally uses the learned mapping to generate a verbal description for a new test image. For details see the reference R. Kiros, R. Salakhutdinov, and R. S. Zemel, "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models," arXiv preprint arXiv:1411.2539, 2014.
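A minimal sketch of the ranking step such embedding methods rely on (the matrices, dimensions, and names below are illustrative stand-ins for the learned encoders, not the cited model):

```python
import numpy as np

# Toy projection matrices standing in for the learned image and text encoders;
# all dimensions here are illustrative assumptions.
rng = np.random.default_rng(0)
W_img = rng.normal(size=(8, 4096))   # maps a CNN image feature to the shared space
W_txt = rng.normal(size=(8, 300))    # maps a sentence feature to the shared space

def cosine_similarity(a, b):
    # Similarity of two vectors in the shared embedding space.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

img_feat = rng.normal(size=4096)     # e.g. the 4096-d output of a CNN's last fc layer
txt_feat = rng.normal(size=300)      # e.g. an averaged word embedding of a caption

score = cosine_similarity(W_img @ img_feat, W_txt @ txt_feat)
assert -1.0 <= score <= 1.0          # candidate captions would be ranked by this score
```

In a trained system, candidate descriptions for a new image are ranked by this similarity in the shared space.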
The second is based on target detection. This approach first trains a detector for each target and for each word related to the target descriptions, then runs these trained detectors on the image to obtain a set of words, composes those words into candidate sentences with a language model, and finally ranks the candidate sentences by their similarity score with the image, selecting the top-ranked sentence as the final result. For details see the reference H. Fang, S. Gupta, F. Iandola, et al., "From Captions to Visual Concepts and Back," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pages 1473-1482, 2015.
The third is based on deep learning. In the training stage, a deep convolutional neural network (Convolutional Neural Network, CNN) first extracts image features, and the image features and the text information are then fed together into a recurrent neural network (Recurrent Neural Network, RNN) to train the network. Finally, a test image is fed into the trained deep neural network, which generates a verbal description of the image. For details see the reference O. Vinyals, A. Toshev, S. Bengio, et al., "Show and Tell: A Neural Image Caption Generator," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pages 3156-3164, 2015.
Although these methods have achieved good results on natural image understanding, in the high-resolution remote sensing field the semantic understanding of such images remains a blank: current methods do not exploit the high-level semantic information of the images and cannot truly understand them. How to combine the visual information and the text information of high-resolution remote sensing images, so as to understand the images at the semantic level, is therefore a significant problem.
Summary of the invention
The object of the invention is to address the deficiency of the existing methods described above by proposing a semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network. Built on a multi-modal neural network model, the method considers both the visual and the semantic information of remote sensing images and attends to the correlations between target attributes and targets in an image, so as to understand remote sensing images at the level of high-level semantics.
The technical principle for realizing the object of the invention is as follows:
(1) First, annotate a high-resolution remote sensing image database with text: label every image with 5 sentences describing its content, forming a completely new high-resolution remote sensing image-text annotation (image-captions) database;
(2) Then extract the remote sensing image features with a convolutional neural network (Convolutional Neural Network, CNN);
(3) Feed the extracted image features, together with the text information corresponding to each image, into a recurrent neural network (Recurrent Neural Network, RNN) and train the network parameters to obtain our deep multi-modal neural network model;
(4) Use the trained deep multi-modal neural network to generate the text description corresponding to a high-resolution remote sensing image.
The specific technical solution for realizing the object of the invention is as follows:
The invention provides a semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network, comprising the following steps:
1) Construct a high-resolution remote sensing image-text description database;
The database contains a number of high-resolution remote sensing images and, for each image, several text description sentences;
2) Extract the visual features of all images in the database with a pre-trained convolutional neural network, computed as:
b0 = CNN(I);
where I is an image in the database and b0 is its visual feature;
3) Create a vocabulary from the words in all text description sentences of the database;
Each word in the vocabulary is represented by a vector, and START and END vectors are added to mark the beginning and end of a sentence;
4) Train the deep multi-modal neural network; at every time step the network has one input layer, one hidden layer, and one output layer;
4.1) At time t=1, feed the image visual feature b0 from step 2) and the START vector from step 3) into the hidden layer of the network, obtaining the hidden-layer output h1 at time t=1:
h1 = g(λ1 w1 + b0);
where w1 is the input START vector, g is a nonlinear function, and λ1, λ2 are network weight parameters to be trained;
4.2) Feed the hidden-layer output h1 at time t=1 into the output layer, then compute the probability distribution over all words at time t=1 with the Softmax function:
p(w2) = softmax(λ3 h1);
where λ3 is a network weight parameter to be trained;
4.3) Choose the word w2 with the highest probability at time t=1 as the predicted word at t=1;
4.4) At each time t>1, feed the word vector of the word predicted at the previous time step, together with the previous hidden-layer output, into the hidden layer, obtaining the current hidden-layer output:
ht = g(λ1 wt + λ2 ht-1);
where ht is the current hidden-layer output and wt is the input word vector at the current time;
4.5) Feed the current hidden-layer output into the output layer and compute the probability distribution over all words at the current time with the Softmax function:
p(wt+1) = softmax(λ3 ht);
4.6) Choose the word with the highest probability at the current time as the predicted word at the current time;
4.7) Repeat steps 4.4) to 4.6) until the predicted word vector is the END vector;
4.8) Sum over all training image-text pairs to obtain the overall loss function of the deep multi-modal neural network and train to the optimum;
5) Input a high-resolution remote sensing image and generate its corresponding text description with the trained deep multi-modal neural network.
Further, in step 4.1) above, the nonlinear function g is an RNN or LSTM, and the image feature b0 is extracted by AlexNet, VGGNet, or GoogLeNet.
Further, the nonlinear function g is an LSTM and the image feature b0 is extracted by VGGNet.
Further, the content of the text description sentences in step 1) above includes the targets, the class label of each pixel or of the whole image, the attributes of the targets, their features, and the relationships between targets.
Further, the image visual feature in step 2) above is the 4096-dimensional vector output by the last fully connected layer of the convolutional neural network.
The beneficial effects of the invention are:
Compared with conventional methods, the invention fully considers the correlations between target attributes and targets in high-resolution remote sensing images and exploits both their visual and their semantic information, thereby understanding remote sensing images at the level of high-level semantics. It can be used in disaster monitoring, military reconnaissance, geographical-conditions surveying, and similar applications.
Brief description of the drawings
Fig. 1 is the overall flow chart of the invention;
Fig. 2 illustrates the construction of the high-resolution remote sensing image-text description database;
Fig. 3 shows the structure of the deep multi-modal neural network;
Fig. 4 shows text generation results for high-resolution remote sensing images obtained with the invention.
Specific embodiment
With reference to the accompanying drawings, the implementation steps of the semantic understanding method provided by the invention and the verification of the comparative tests are further described.
Referring to Fig. 1, the steps of the invention are as follows:
Step 1): construct the high-resolution remote sensing image-text description (image-captions) database;
The database contains a number of high-resolution remote sensing images and, for each image, several text description sentences;
In this embodiment, the UCM-captions database and the Sydney-captions database are constructed on the basis of the existing high-resolution remote sensing databases, the Sydney database and the UCM database, and serve as the high-resolution remote sensing image-text description (image-captions) database;
The Sydney-captions database contains 613 high-resolution remote sensing images at 0.3 m/pixel, each with 5 corresponding text descriptions, for a total of 3065 texts; the UCM-captions database contains 2100 high-resolution remote sensing images at 0.5 m/pixel, each with 5 corresponding text descriptions, for a total of 10500 texts (see Fig. 2). The establishment of these two image-text databases lays the foundation for training the subsequent semantic understanding model and for solving the semantic understanding problem.
It should be understood that although in this embodiment every image in the image-captions database has 5 text descriptions, the number is not limited to 5; it suffices that the content of the descriptions covers the targets, the class label of each pixel or of the whole image, the attributes of the targets, their features, and the relationships between targets.
Step 2): extract the visual features of all images in the image-captions database with a pre-trained convolutional neural network; the visual feature of an image is the 4096-dimensional vector output by the last fully connected layer of the network, computed as:
b0 = CNN(I);
where I is an image in the database and b0 is its visual feature;
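As a minimal sketch of this step, the call b0 = CNN(I) can be mimicked with a random projection; the function name and toy image size are assumptions, and a real implementation would instead forward the image through a pre-trained AlexNet/VGGNet/GoogLeNet and read out the last fully connected layer:

```python
import numpy as np

def cnn_features(image: np.ndarray, seed: int = 0) -> np.ndarray:
    """Placeholder for b0 = CNN(I): flattens the image and projects it to a
    4096-d vector, the size of the last fully connected layer used in the
    patent. A real implementation would forward the image through a
    pre-trained CNN instead of this random projection."""
    x = image.reshape(-1)
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(4096, x.size)) / np.sqrt(x.size)
    return W @ x

image = np.ones((16, 16, 3))   # a (heavily downscaled) remote sensing tile
b0 = cnn_features(image)
assert b0.shape == (4096,)     # one fixed-length visual feature per image
```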
Step 3): create a vocabulary from the words in all texts of the image-captions database;
Each word in the vocabulary is represented by a vector, and START and END vectors are added to mark the beginning and end of a sentence;
Step 4): referring to Fig. 3, train the deep multi-modal neural network; at every time step the network has one input layer, one hidden layer, and one output layer;
Step 4.1): at time t=1, feed the image visual feature b0 from step 2) and the START vector from step 3) into the hidden layer of the network, obtaining the hidden-layer output h1 at time t=1:
h1 = g(λ1 w1 + b0);
where w1 is the input START vector, g is a nonlinear function, and λ1, λ2 are network weight parameters to be trained;
The visual feature b0 of the image is extracted by AlexNet, VGGNet, or GoogLeNet;
Step 4.2): feed the hidden-layer output h1 at time t=1 into the output layer, then compute the probability distribution over all words at time t=1 with the Softmax function:
p(w2) = softmax(λ3 h1);
where λ3 is a network weight parameter to be trained;
Step 4.3): choose the word w2 with the highest probability at time t=1 as the predicted word at t=1;
Step 4.4): at each time t>1, feed the word vector of the word predicted at the previous time step, together with the previous hidden-layer output, into the hidden layer, obtaining the current hidden-layer output:
ht = g(λ1 wt + λ2 ht-1);
where ht is the current hidden-layer output and wt is the input word vector at the current time;
Step 4.5): feed the current hidden-layer output into the output layer and compute the probability distribution over all words at the current time with the Softmax function:
p(wt+1) = softmax(λ3 ht);
Step 4.6): choose the word with the highest probability at the current time as the predicted word at the current time;
Step 4.7): repeat steps 4.4) to 4.6) until the predicted word vector is the END vector;
Step 4.8): sum over all training image-text pairs to obtain the overall loss function of the deep multi-modal neural network and train to the optimum;
Step 5): input a high-resolution remote sensing image and generate its corresponding text description with the trained deep multi-modal neural network.
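The recurrence of steps 4.1)-4.7) and the generation of step 5) can be sketched as a greedy decoding loop. The parameters below are untrained toy values, tanh stands in for the nonlinearity g (the simple-RNN case; the patent's preferred variant is an LSTM cell), and the image feature b0 is assumed to be already projected to the hidden size:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def generate_caption(b0, word_vecs, vocab, l1, l2, l3, max_len=20):
    """Greedy decoding following steps 4.1)-4.7):
       h1 = g(l1 w1 + b0),   ht = g(l1 wt + l2 h_{t-1}),
       p(w_{t+1}) = softmax(l3 ht),
    taking the most probable word at each step until END (or max_len)."""
    g = np.tanh                              # nonlinearity g
    h = g(l1 @ word_vecs["<START>"] + b0)    # t = 1: the image feature enters here
    sentence = []
    for _ in range(max_len):
        word = vocab[int(np.argmax(softmax(l3 @ h)))]
        if word == "<END>":
            break
        sentence.append(word)
        h = g(l1 @ word_vecs[word] + l2 @ h)  # t > 1 recurrence
    return sentence

# Toy, untrained parameters just to exercise the recurrence.
rng = np.random.default_rng(1)
vocab = ["<START>", "<END>", "an", "airport", "with", "several", "planes"]
dim, hid = 8, 16
word_vecs = {w: rng.normal(size=dim) for w in vocab}
l1 = rng.normal(size=(hid, dim))
l2 = rng.normal(size=(hid, hid))
l3 = rng.normal(size=(len(vocab), hid))
b0 = rng.normal(size=hid)                    # image feature in hidden size

caption = generate_caption(b0, word_vecs, vocab, l1, l2, l3)
assert all(w in vocab for w in caption) and len(caption) <= 20
```

With trained weights λ1, λ2, λ3 and a real CNN feature, the same loop produces the final text description of step 5).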
The effect of the method is further described with the following simulation tests.
1. Simulation conditions
The simulation tests in this embodiment were carried out with MATLAB and PyCharm on a Linux operating system with an Intel(R) Xeon E5-2697 CPU at 2.60 GHz and 128 GB of memory.
The experimental data were obtained by manually annotating five text descriptions for each remote sensing image, on the basis of the UCM database provided by the U.S. Geological Survey (USGS) and the Sydney database released by the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, yielding the final UCM-captions database and Sydney-captions database as the experimental databases.
2. Simulation content
The semantic understanding of remote sensing images with the method of the invention proceeds as follows.
First, the text generation results are evaluated with three scores: BLEU-n, METEOR, and CIDEr.
The BLEU-n score counts how many of the n-grams (sequences of n consecutive words) of the generated text also occur in the reference text, divided by the total number of n-grams in the generated text; it is an index similar to precision (B-1, B-2, B-3, B-4 abbreviate BLEU-n for n = 1, ..., 4);
The METEOR score computes both the precision and the recall of the n-grams in the generated text and takes their harmonic mean as the final score;
The CIDEr score additionally weights each word of the generated text by its importance, on top of precision and recall. The three scores emphasize different aspects and, taken together, reflect the quality of the generated text well.
Then, the final generated texts are obtained on the UCM-captions database with the steps of this embodiment and assessed with the BLEU-n, METEOR, and CIDEr indices.
Table 1: text generation results on the UCM-captions database
Next, the same experimental procedure is run on the Sydney-captions database; the experimental results are shown in Table 2:
Table 2: text generation results on the Sydney-captions database
The results in Tables 1 and 2 show that the method of the invention achieves relatively good performance and that the quality of the generated text is comparatively high. From the obtained indices we can observe that: the stronger the feature-extraction ability of the CNN, the better the generated text; LSTM generally outperforms RNN; and the combination of VGG-19 and LSTM achieves the best results.
Part of the visualized results are shown in Fig. 4. The texts generated by our method are largely reasonable, and the method can even tell the number of targets in an image (e.g. storage tanks).
Claims (5)
1. A semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network, characterized by comprising the following steps:
1) constructing a high-resolution remote sensing image-text description database;
the database containing a number of high-resolution remote sensing images and, for each image, several text description sentences;
2) extracting the visual features of all images in the database with a pre-trained convolutional neural network, computed as:
b0 = CNN(I);
where I is an image in the database and b0 is its visual feature;
3) creating a vocabulary from the words in all text description sentences of the database;
each word in the vocabulary being represented by a vector, with START and END vectors added to mark the beginning and end of a sentence;
4) training the deep multi-modal neural network; at every time step the network having one input layer, one hidden layer, and one output layer;
4.1) at time t=1, feeding the image visual feature b0 from step 2) and the START vector from step 3) into the hidden layer of the network, obtaining the hidden-layer output h1 at time t=1:
h1 = g(λ1 w1 + b0);
where w1 is the input START vector, g is a nonlinear function, and λ1, λ2 are network weight parameters to be trained;
4.2) feeding the hidden-layer output h1 at time t=1 into the output layer, then computing the probability distribution over all words at time t=1 with the Softmax function:
p(w2) = softmax(λ3 h1);
where λ3 is a network weight parameter to be trained;
4.3) choosing the word w2 with the highest probability at time t=1 as the predicted word at t=1;
4.4) at each time t>1, feeding the word vector of the word predicted at the previous time step, together with the previous hidden-layer output, into the hidden layer, obtaining the current hidden-layer output:
ht = g(λ1 wt + λ2 ht-1);
where ht is the current hidden-layer output and wt is the input word vector at the current time;
4.5) feeding the current hidden-layer output into the output layer and computing the probability distribution over all words at the current time with the Softmax function:
p(wt+1) = softmax(λ3 ht);
4.6) choosing the word with the highest probability at the current time as the predicted word at the current time;
4.7) repeating steps 4.4) to 4.6) until the predicted word vector is the END vector;
4.8) summing over all training image-text pairs to obtain the overall loss function of the deep multi-modal neural network;
5) inputting a high-resolution remote sensing image and generating its corresponding text description with the trained deep multi-modal neural network.
2. The semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network according to claim 1, characterized in that: in step 4.1), the nonlinear function g is an RNN or LSTM, and the visual feature b0 of the image is extracted by AlexNet, VGGNet, or GoogLeNet.
3. The semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network according to claim 1, characterized in that: the nonlinear function g is an LSTM, and the visual feature b0 of the image is extracted by VGGNet.
4. The semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network according to claim 1, characterized in that: the content of the text description sentences in step 1) includes the targets, the class label of each pixel or of the whole image, the attributes of the targets, their features, and the relationships between targets.
5. The semantic understanding method for high-resolution remote sensing images based on a multi-modal neural network according to claim 1, characterized in that: the visual feature of an image in step 2) is the 4096-dimensional vector output by the last fully connected layer of the convolutional neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910406998.8A CN110287354A (en) | 2019-05-16 | 2019-05-16 | A kind of high score remote sensing images semantic understanding method based on multi-modal neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110287354A true CN110287354A (en) | 2019-09-27 |
Family
ID=68002116
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910406998.8A Pending CN110287354A (en) | 2019-05-16 | 2019-05-16 | A kind of high score remote sensing images semantic understanding method based on multi-modal neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287354A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180089541A1 (en) * | 2016-09-27 | 2018-03-29 | Facebook, Inc. | Training Image-Recognition Systems Using a Joint Embedding Model on Online Social Networks |
CN106650756A (en) * | 2016-12-28 | 2017-05-10 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Image text description method based on knowledge transfer multi-modal recurrent neural network |
CN107391609A (en) * | 2017-07-01 | 2017-11-24 | 南京理工大学 | Image description method based on a bidirectional multi-modal recursive network |
Non-Patent Citations (1)
Title |
---|
BO QU et al.: "Deep semantic understanding of high resolution remote sensing image", 2016 International Conference on Computer, Information and Telecommunication Systems (CITS) * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929640A (en) * | 2019-11-20 | 2020-03-27 | 西安电子科技大学 | Wide remote sensing description generation method based on target detection |
CN110929640B (en) * | 2019-11-20 | 2023-04-07 | 西安电子科技大学 | Wide remote sensing description generation method based on target detection |
CN110991284A (en) * | 2019-11-22 | 2020-04-10 | 北京航空航天大学 | Optical remote sensing image statement description generation method based on scene pre-classification |
CN110991284B (en) * | 2019-11-22 | 2022-10-18 | 北京航空航天大学 | Optical remote sensing image statement description generation method based on scene pre-classification |
CN111445018A (en) * | 2020-03-27 | 2020-07-24 | 国网甘肃省电力公司电力科学研究院 | Ultraviolet imaging real-time information processing method based on accelerated convolutional neural network algorithm |
CN111445018B (en) * | 2020-03-27 | 2023-11-14 | 国网甘肃省电力公司电力科学研究院 | Ultraviolet imaging real-time information processing method based on accelerating convolutional neural network algorithm |
CN111582241A (en) * | 2020-06-01 | 2020-08-25 | 腾讯科技(深圳)有限公司 | Video subtitle recognition method, device, equipment and storage medium |
CN112949732A (en) * | 2021-03-12 | 2021-06-11 | 中国人民解放军海军航空大学 | Semantic annotation method and system based on self-adaptive multi-mode remote sensing image fusion |
CN112949732B (en) * | 2021-03-12 | 2022-04-22 | 中国人民解放军海军航空大学 | Semantic annotation method and system based on self-adaptive multi-mode remote sensing image fusion |
CN113298151A (en) * | 2021-05-26 | 2021-08-24 | 中国电子科技集团公司第五十四研究所 | Remote sensing image semantic description method based on multi-level feature fusion |
CN113989297A (en) * | 2021-09-23 | 2022-01-28 | 杭州电子科技大学 | Method for segmenting tumor region by multi-modal eyelid tumor data fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110287354A (en) | High-resolution remote sensing image semantic understanding method based on a multi-modal neural network | |
CN111476294B (en) | Zero sample image identification method and system based on generation countermeasure network | |
Cheng et al. | Perturbation-seeking generative adversarial networks: A defense framework for remote sensing image scene classification | |
Yuan et al. | Exploring a fine-grained multiscale method for cross-modal remote sensing image retrieval | |
Qiu et al. | Geometric back-projection network for point cloud classification | |
Li et al. | Superpixel-based reweighted low-rank and total variation sparse unmixing for hyperspectral remote sensing imagery | |
CN106909924B (en) | Remote sensing image rapid retrieval method based on depth significance | |
Zhang et al. | A GANs-based deep learning framework for automatic subsurface object recognition from ground penetrating radar data | |
CN108960330A (en) | Remote sensing image semantic generation method based on fast region convolutional neural networks | |
Wu et al. | Scene attention mechanism for remote sensing image caption generation | |
CN112579816B (en) | Remote sensing image retrieval method and device, electronic equipment and storage medium | |
CN105989336A (en) | Scene recognition method based on deconvolution deep network learning with weight | |
Xu et al. | Txt2Img-MHN: Remote sensing image generation from text using modern Hopfield networks | |
Bragilevsky et al. | Deep learning for Amazon satellite image analysis | |
Wang et al. | Boosting lightweight CNNs through network pruning and knowledge distillation for SAR target recognition | |
Xiu et al. | 3D semantic segmentation for high-resolution aerial survey derived point clouds using deep learning | |
CN112182275A (en) | Trademark approximate retrieval system and method based on multi-dimensional feature fusion | |
CN109766752A (en) | Object matching and localization method and system based on deep learning, and computer | |
CN105046286B (en) | Supervised multi-view feature selection method based on automatic view generation and combined L1,2-norm minimization | |
Tu et al. | Detection of damaged rooftop areas from high-resolution aerial images based on visual bag-of-words model | |
CN114332288A (en) | Method for generating text generation image of confrontation network based on phrase driving and network | |
Chen et al. | Class-aware domain adaptation for coastal land cover mapping using optical remote sensing imagery | |
CN112766381B (en) | Attribute-guided SAR image generation method under limited sample | |
CN108985385A (en) | Based on the quick Weakly supervised object detection method for generating confrontation study | |
Tan et al. | Review of Zero-Shot Remote Sensing Image Scene Classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190927 |