CN115620150A - Multi-modal image ground building identification method and device based on twin Transformer - Google Patents
Multi-modal image ground building identification method and device based on twin Transformer
- Publication number
- CN115620150A (application CN202211545426.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- neural network
- twin
- twin neural
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/176—Urban or other man-made structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a twin Transformer-based multi-modal image ground building identification method and device, belonging to the technical field of ground building identification. The multi-modal image ground building identification method comprises the following steps: establishing a multi-twin neural network with N Transformer structures, wherein the multi-twin neural network is a pseudo-twin neural network; acquiring N target images in different modalities; and inputting the target images into the multi-twin neural network, which outputs a recognition result. The invention achieves accurate recognition of multi-platform, multi-modal ground building images.
Description
Technical Field
The invention belongs to the technical field of ground building identification, and particularly relates to a twin Transformer-based multi-modal image ground building identification method and device.
Background
With the continuous advance of urbanization, modern urban buildings occupy an ever larger share of land, the types of urban buildings grow richer, and buildings interconnect with one another: residential areas with different internal layouts, business districts with office buildings of varying heights, low houses, and industrial parks with wide footprints. These varied building scenes create different difficulties for ground building search.
In terms of reconnaissance image sources, visible light images, infrared images, and SAR radar images are currently the main sources for ground building target reconnaissance, and the corresponding image information can be captured by remote sensing satellites, unmanned aerial vehicles, and similar equipment. A visible light image mainly records the color and texture information of the target; it has high resolution, rich detail, and light-dark contrast, describing the target concretely and close to what the human eye sees, but visible-light imaging is strongly affected by illumination and weather conditions. An infrared image captures the thermal radiation of the target, with strong penetration and strong contour capture, but generally lower resolution and poorer texture. A SAR image is a radar image: it works in all weather, day and night, unaffected by meteorological conditions, offers high imaging resolution and a wide swath, records phase, amplitude, and intensity information, and can yield a clear high-resolution grayscale image after focusing and related processing.
An all-round sensing system for ground buildings comprises space-based, air-based, shore-based, and sea-based platforms, which sense information about ground buildings, the environment, and geography through various sensors. However, targets captured by these platforms vary greatly in angle and size, which makes ground building identification very difficult.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a twin Transformer-based multi-modal image ground building identification method and device.
The purpose of the invention is realized by the following technical scheme:
according to the first aspect of the invention, the twin Transformer-based multi-modal image ground building identification method comprises the following steps:
establishing a multi-twin neural network with N Transformer structures, wherein the multi-twin neural network is a pseudo-twin neural network;
acquiring N target images in different modalities;
inputting the target images into the multi-twin neural network, and outputting a recognition result from the multi-twin neural network;
the multi-twin neural network comprises a plurality of neural network units, the neural network units comprise an image preprocessing network, a position and image data coding network, an encoder network and a full connection layer, and the encoder network comprises L encoders connected in series;
the image preprocessing network is used for converting an input target image into a normalized image feature map;
the position and image data encoding network is used for converting the image feature map into a feature vector containing position and image data;
the encoder network is used for completing the extraction of the feature vector;
and the full connection layer is used for completing the mapping of the characteristic vector output by the encoder network to the target class and outputting the class probability of the target.
Further, the target image is an infrared image, a visible light image, a SAR radar image, a multispectral image or a laser radar image.
Further, establishing the multi-twin neural network with N Transformer structures comprises:
acquiring a plurality of source images to form a data set, and labeling the data set to form training data of a multi-twin neural network;
establishing a joint loss function of the multi-twin neural network;
and training the multi-twin neural network by using the joint loss function to obtain parameters of the multi-twin neural network.
According to a second aspect of the present invention, a twin Transformer-based multi-modal image ground building recognition apparatus comprises:
the model building module is used for building a multi-twin neural network with N Transformer structures, and the multi-twin neural network is a pseudo-twin neural network;
the image acquisition module is used for acquiring N target images in different modalities;
and the target recognition module is used for inputting the target image into the multi-twin neural network to obtain a recognition result output by the multi-twin neural network.
The invention has the beneficial effects that:
(1) The method uses the Transformer attention mechanism to extract globally effective information in a scene and to focus attention on local feature points, then uses a multi-twin neural network to perform feature extraction and similarity calculation on target images of multiple modalities and multiple viewing angles. This synthesizes the different information sources of the same target scene, completes an integral modeling of the target scene, and achieves accurate recognition of multi-platform, multi-modal ground building images;
(2) The method adopts the more typical pseudo-twin neural network and, by designing the loss function of the pseudo-twin neural network, constructs a neural network model with a consistent representation across data of different modalities, thereby solving the problem of matching targets between images of different modalities.
Drawings
FIG. 1 is a flow diagram of one embodiment of the multi-modal image ground building identification method in accordance with the present invention;
FIG. 2 is a schematic diagram of a pseudo-twin neural network;
FIG. 3 is a schematic diagram of the Transformer encoder-decoder architecture;
FIG. 4 is a schematic diagram of a multi-twin neural network training process;
FIG. 5 is a schematic diagram of an image pre-processing network;
FIG. 6 is a schematic diagram of a plurality of modules obtained by dividing an image feature map by a location and image data encoding network;
FIG. 7 is a schematic diagram of an encoder;
FIG. 8 is a schematic diagram of a process of inputting a target image of a multi-twin neural network;
FIG. 9 is a block diagram of the multi-modal image ground building recognition apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Referring to fig. 1 to 4, the present invention provides a twin Transformer-based multi-modal image ground building recognition method and apparatus.
In a first aspect of the present invention, a twin Transformer-based multi-modal image ground building recognition method is provided; as shown in fig. 1, the method includes steps S100 to S300, described in detail below.
S100, establishing a multi-twin neural network with N Transformer structures, wherein the multi-twin neural network is a pseudo-twin neural network.
As shown in fig. 2, the branches of a pseudo-twin neural network apply different neural network structures, or structures with unshared parameters, to the input images. This embodiment adopts a pseudo-twin neural network and, by designing its loss function, constructs a neural network model with a consistent representation across data of different modalities, thereby solving the problem of matching targets between images of different modalities.
In this embodiment, distance expressions such as the Euclidean distance, the cosine distance, and the exponential distance are compared, and the distance measure that minimizes intra-class distances while maximizing inter-class distances is selected as the distance measure of the multi-twin neural network. The twin network maps multiple inputs through deep neural networks into a new vector space; the goal is met as long as the distances between the resulting vectors can be judged, with smaller distances for the same class and larger distances for different classes.
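The distance comparison described above can be sketched in a few lines of numpy (an illustrative example, not code from the patent; the helper names `euclidean_distance` and `cosine_distance` are assumptions):

```python
import numpy as np

def euclidean_distance(a, b):
    # L2 distance between two feature vectors
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0, 1.0])    # anchor feature
b = np.array([1.0, 0.0, 1.0])    # same-class feature: distances should be small
c = np.array([-1.0, 2.0, -1.0])  # different-class feature: distances should be large

assert euclidean_distance(a, b) < euclidean_distance(a, c)
assert cosine_distance(a, b) < cosine_distance(a, c)
```

Either measure satisfies the stated criterion on this toy data; the patent selects whichever measure best separates classes empirically.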
When the Transformer is used as an encoder-decoder, it is based entirely on the attention mechanism, without any convolutional or recurrent layers; its overall structure is shown in fig. 3. The embedded representations of the input (source) sequence and the output (target) sequence, plus position codes, are fed to the encoder and decoder, respectively. Unlike a convolution, which models only the relations between neighboring pixels, the Transformer is a global operation that can model the relations between all pixels; it has stronger modeling capability, extracts the global features of a scene better, and highlights the relation between local parts and the whole. In this embodiment, a Transformer is used instead of convolution to achieve better scene feature extraction for subsequent tasks, and the Transformer is combined with the twin network to form a multi-twin neural network with N Transformer structures, establishing a consistent representation model of the extended target under cross-modal, multi-view, and multi-scale conditions.
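The global attention operation contrasted with convolution above can be illustrated with a minimal scaled dot-product attention in numpy (a generic sketch of standard Transformer attention, not the patent's exact encoder):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # every token (e.g. every image patch) attends to every other token,
    # so the receptive field is global rather than a local neighbourhood
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (tokens, tokens) affinities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

tokens = np.random.default_rng(0).normal(size=(6, 8))  # 6 patches, dim 8
out, w = attention(tokens, tokens, tokens)
assert out.shape == (6, 8)
assert np.allclose(w.sum(axis=1), 1.0)
```

Because the affinity matrix covers all token pairs, each output mixes information from the entire scene, which is the property the embodiment relies on for global feature extraction.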
In some embodiments, the multi-twin neural network includes a plurality of neural network units, each comprising an image preprocessing network, a position and image data encoding network, an encoder network, and a fully connected layer; the encoder network includes L encoders in series, as shown in fig. 4.
The image preprocessing network is used for converting an input target image into a normalized image feature map.
Specifically, the image preprocessing network converts input images of different resolutions and channel counts into image feature maps of a common size. Suppose the input image of the image preprocessing network in the i-th neural network unit has size H_i × W_i × C_i; the network then outputs a normalized image feature map of size (M·P) × (K·P) × C. The structure of the image preprocessing network is shown in fig. 5. The input image is first interpolated to an image of size (M·P) × (K·P) × C_i, and a 1 × 1 convolution is then applied across the channels to obtain an image feature map of size (M·P) × (K·P) × C.
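The preprocessing step above (interpolation followed by a 1 × 1 channel convolution) can be sketched in numpy as follows; the function name `preprocess`, the nearest-neighbour interpolation, and the random weights are illustrative assumptions:

```python
import numpy as np

def preprocess(image, M, K, P, W1x1):
    """Resize an H_i x W_i x C_i image to (M*P) x (K*P) x C_i with
    nearest-neighbour interpolation, then mix channels with a 1x1
    convolution (a per-pixel linear map) to get C output channels."""
    Hi, Wi, Ci = image.shape
    H, W = M * P, K * P
    rows = np.arange(H) * Hi // H        # nearest-neighbour row indices
    cols = np.arange(W) * Wi // W        # nearest-neighbour column indices
    resized = image[rows][:, cols]       # (M*P, K*P, C_i)
    return np.einsum('hwc,cd->hwd', resized, W1x1)  # (M*P, K*P, C)

rng = np.random.default_rng(1)
img = rng.normal(size=(37, 53, 3))       # arbitrary resolution, C_i = 3
W1x1 = rng.normal(size=(3, 16))          # channel mixing to C = 16
feat = preprocess(img, M=4, K=4, P=8, W1x1=W1x1)
assert feat.shape == (32, 32, 16)
```

A 1 × 1 convolution is simply a shared linear map over the channel axis at every pixel, which is why `einsum` suffices here; the patent's network would learn `W1x1` during training.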
The location and image data encoding network is used to convert the image feature map into feature vectors containing location and image data.
Specifically, the input of the position and image data encoding network is the output of the image preprocessing network, i.e. a normalized image feature map of size (M·P) × (K·P) × C; its output is M·K feature vectors containing position and image data. As shown in fig. 6, the position and image data encoding network divides the normalized (M·P) × (K·P) × C image feature map into M·K modules of size P × P, each containing P·P·C values.
Each three-dimensional module of size P × P × C is expanded to form a feature vector Z_t of size (P·P·C) × 1. Suppose the module lies at position (m, n) in the feature map (0 < m < M + 1, 0 < n < K + 1, with m and n integers); the position code X_pos of the module is defined as:

X_pos = (n·M + m)/(M·K)

Combining the (P·P·C) × 1 module vector with the position code of the module gives the feature vector Z_p:

Z_p = [X_pos; Z_t]

The size of Z_p is (P·P·C + 1) × 1. Z_p cannot be input into the encoder directly and requires normalization, performed as follows:

Z_po,i = sigmoid(BN(Z_p,i · W_p,i + B_i))

where 0 ≤ i < M·K and i is an integer; W_p,i is a (P·P·C + 1) × (P·P·C + 1) matrix and B_i is a (P·P·C + 1) × 1 matrix, both learnable network parameters. The sigmoid function serves as the nonlinearity: it normalizes the module output to the range (0, 1) while giving the module stronger expressive power. The sigmoid function is defined as:

sigmoid(x) = 1/(1 + e^(−x))
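The patch partition, flattening, and position coding described above can be sketched as follows (an illustrative numpy version; it uses 0-based grid indices (m, n), and the learnable BN/W_p/B normalization step is only indicated in a comment):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_patches(feature_map, P):
    """Split an (M*P) x (K*P) x C map into M*K modules of P x P x C,
    flatten each to a (P*P*C)-vector Z_t, and prepend the scalar
    position code X_pos = (n*M + m) / (M*K) for the module at (m, n)."""
    H, W, C = feature_map.shape
    M, K = H // P, W // P
    vectors = []
    for m in range(M):          # 0-based grid indices, an assumption here
        for n in range(K):
            block = feature_map[m*P:(m+1)*P, n*P:(n+1)*P, :]
            z_t = block.reshape(-1)
            x_pos = (n * M + m) / (M * K)
            vectors.append(np.concatenate(([x_pos], z_t)))  # Z_p = [X_pos; Z_t]
    return np.stack(vectors)    # (M*K, P*P*C + 1)

fm = np.random.default_rng(2).normal(size=(8, 8, 2))  # M = K = 2 with P = 4
Z = encode_patches(fm, P=4)
assert Z.shape == (4, 4 * 4 * 2 + 1)
# the learnable step Z_po = sigmoid(BN(Z_p * W_p + B)) would follow here;
# sigmoid squashes every entry into (0, 1):
assert np.all((sigmoid(Z) > 0.0) & (sigmoid(Z) < 1.0))
```

Each row of `Z` is one module's (P·P·C + 1) × 1 vector, ready for the per-vector normalization before the encoder.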
the encoder network is used to perform feature vector extraction.
Specifically, the encoder network consists of L encoders connected in series; the structure of each encoder is shown in fig. 7. The input to the l-th encoder (0 ≤ l ≤ L − 1, with l an integer) is M·K feature vectors Z_po^l of size (P·P·C + 1) × 1, and the output of the encoder is M·K feature vectors Z_po^(l+1) of size (P·P·C + 1) × 1.
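A shape-level sketch of chaining L encoders in series is given below; the real encoder of fig. 7 contains attention and feed-forward sublayers, so the single residual nonlinear map used here is only a stand-in that preserves the (M·K) × (P·P·C + 1) shape required for serial composition:

```python
import numpy as np

def encoder_layer(Z, W):
    # stand-in for one encoder of fig. 7: a residual nonlinear map that
    # keeps the (M*K, P*P*C+1) shape so encoders can be chained in series
    return Z + np.tanh(Z @ W)

rng = np.random.default_rng(4)
MK, D, L = 4, 33, 3                      # M*K vectors of length P*P*C + 1; L encoders
Z = rng.normal(size=(MK, D))             # Z_po^0
for W in [rng.normal(size=(D, D)) * 0.01 for _ in range(L)]:
    Z = encoder_layer(Z, W)              # Z_po^l -> Z_po^(l+1)
assert Z.shape == (MK, D)                # shape preserved through the whole chain
```

Because every encoder maps (M·K, D) to (M·K, D), the depth L is a free design choice, exactly as in the series connection described above.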
And the full connection layer is used for finishing the mapping of the characteristic vector output by the encoder network to the target class and outputting the class probability of the target.
Specifically, suppose the number of target classes is T. The M·K feature vectors of size (P·P·C + 1) × 1 are accumulated to obtain a feature vector Z_M, which is also of size (P·P·C + 1) × 1. The output Z_C of the fully connected layer is related to Z_M as follows:

Z_C = softmax(Z_M · W_M + B_M)

where Z_C is a T × 1 vector whose elements represent the probability, in the range (0, 1), that the input heterogeneous image belongs to each class; W_M and B_M are the weights and biases of the fully connected layer, both learnable parameters. The softmax function is defined as:

softmax(x)_j = e^(x_j) / Σ_k e^(x_k)
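The fully connected classification head Z_C = softmax(Z_M · W_M + B_M) can be sketched as follows (the dimensions and random weights are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()

def classify(Z_M, W_M, B_M):
    # Z_C = softmax(Z_M * W_M + B_M): maps the accumulated feature
    # vector to T class probabilities that sum to 1
    return softmax(Z_M @ W_M + B_M)

rng = np.random.default_rng(3)
D, T = 33, 5                  # feature length P*P*C + 1 (33 here), T classes
Z_C = classify(rng.normal(size=D), rng.normal(size=(D, T)), rng.normal(size=T))
assert Z_C.shape == (T,)
assert np.isclose(Z_C.sum(), 1.0)
assert np.all((Z_C >= 0.0) & (Z_C <= 1.0))
```

The softmax guarantees the output is a valid probability distribution over the T classes, matching the (0, 1) range stated above.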
in some embodiments, establishing a multi-twin neural network having N transform structures comprises:
s110, obtaining N source images to form a data set, and labeling the data set to form training data of a multi-twin neural network;
s120, establishing a joint loss function (L) of the multi-twin neural network, which is defined as:
L = (1/(2H)) · Σ_{h=1}^{H} [ Y·(D_h)² + (1 − Y)·max(m − D_h, 0)² ]

where Y indicates whether the sample labels match: Y = 1 means the labels of the N samples match, and Y = 0 means they do not; m is a set threshold, a hyperparameter of the network chosen empirically; H is the number of samples in a single training pass; and D_h is the distance between the feature-layer outputs of the different twin subnetworks for the h-th training sample, defined as the Euclidean distance between those outputs:

D_h = ||f_1(x_1^h) − f_2(x_2^h)||_2

S130, training the multi-twin neural network with the joint loss function to obtain the multi-twin neural network parameters W_1*, W_2*, ..., W_N*.
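Assuming the joint loss is the standard contrastive loss suggested by the definitions above (match indicator Y, margin m, batch size H, pair distance D_h) — the original formula image is not reproduced in the text — a numpy sketch is:

```python
import numpy as np

def contrastive_loss(d, y, m):
    """Contrastive loss over one batch of H pairs: d[h] is the feature
    distance for pair h, y[h] is 1 when the labels match and 0 otherwise,
    and m is the margin hyperparameter. Matching pairs are pulled together;
    mismatched pairs are pushed at least m apart."""
    H = len(d)
    return float(np.sum(y * d**2 + (1 - y) * np.maximum(m - d, 0.0)**2) / (2 * H))

d = np.array([0.1, 0.2, 1.5, 2.5])   # pair distances D_h
y = np.array([1, 1, 0, 0])           # the first two pairs share a label
loss = contrastive_loss(d, y, m=2.0)
assert loss > 0.0
# a mismatched pair already farther apart than the margin costs nothing:
assert contrastive_loss(np.array([3.0]), np.array([0]), m=2.0) == 0.0
```

Minimizing this loss drives same-class distances toward zero and pushes different-class distances beyond the margin, which is exactly the distance behaviour the multi-twin network is trained for.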
S200, acquiring N target images in different modes.
Generally, the target image is an infrared image, a visible light image, a SAR radar image, a multispectral image, or a lidar image; specifically, the N target images may combine one or more of these modalities.
Generally, the number of the target images is two or more. In this embodiment, there is no requirement for the shooting angle of the target image or the like.
S300, inputting the target image into the multi-twin neural network, and outputting a recognition result by the multi-twin neural network.
Specifically, the multi-twin neural network processes the input target images as follows: using the parameters W_1*, W_2*, ..., W_N* obtained during training, the input image from the i-th source passes through the i-th neural network unit, and the classification result of the target is obtained at the output Y of the neural network unit, as shown in fig. 8.
The method uses the Transformer attention mechanism to extract globally effective information in a scene and to focus attention on local feature points, then uses the multi-twin neural network to perform feature extraction and similarity calculation on target images of multiple modalities and multiple viewing angles, synthesizing the different information sources of the same target scene. This completes a consistent feature representation of an extended target (a ground building or a large offshore ship) under cross-modal, large viewing-angle, and scale transformations, and achieves accurate recognition of multi-platform, multi-modal ground building images.
In this embodiment, target images of different types are input into different networks; these networks have different structures and possibly different parameters. By defining the networks' objective functions, identical feature vectors for the ground building target are obtained from the different image types, so that a common feature representation of the target is established across the input images of different modalities, finally achieving target identification. For example, an infrared image and a visible light image are input into two independent networks with different structures and possibly different parameters; by defining the objective functions of the two networks, identical feature vectors for the ground building target are obtained from the visible light image and the infrared image.
A second aspect of the present invention provides a twin Transformer-based multi-modal image ground building recognition apparatus. As shown in fig. 9, the apparatus includes a model construction module, an image acquisition module, and a target recognition module.
The model building module is used for building a multi-twin neural network with N Transformer structures, the multi-twin neural network being a pseudo-twin neural network. In this embodiment, the model building module may be configured to perform step S100 shown in fig. 1, and reference may be made to the description of step S100 for a detailed description of the model building module.
And the image acquisition module is used for acquiring N target images in different modes. In this embodiment, the image acquiring module may be configured to perform step S200 shown in fig. 1, and reference may be made to the description of step S200 for a detailed description of the image acquiring module.
And the target recognition module is used for inputting the target image into the multi-twin neural network to obtain a recognition result output by the multi-twin neural network. In this embodiment, the object recognition module may be configured to execute step S300 shown in fig. 1, and reference may be made to the description of step S300 for a detailed description of the object recognition module.
The foregoing describes preferred embodiments of this invention. The invention is not limited to the precise forms disclosed herein; other combinations, modifications, and environments falling within the scope of the inventive concept, whether described above or apparent to those skilled in the relevant art, remain possible. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention fall within the protection of the appended claims.
Claims (4)
1. A multi-modal image ground building identification method based on a twin Transformer, characterized by comprising the following steps:
establishing a multi-twin neural network with N Transformer structures, wherein the multi-twin neural network is a pseudo-twin neural network;
acquiring N target images in different modalities;
inputting the target images into the multi-twin neural network, and outputting a recognition result from the multi-twin neural network;
the multi-twin neural network comprises a plurality of neural network units, the neural network units comprise an image preprocessing network, a position and image data coding network, an encoder network and a full connection layer, and the encoder network comprises L encoders connected in series;
the image preprocessing network is used for converting an input target image into a normalized image feature map;
the position and image data encoding network is used for converting the image feature map into a feature vector containing position and image data;
the encoder network is used for completing the extraction of the feature vector;
and the full connection layer is used for finishing the mapping of the characteristic vector output by the encoder network to the target class and outputting the class probability of the target.
2. The twin Transformer-based multi-modal image ground building identification method according to claim 1, wherein the target image is an infrared image, a visible light image, a SAR radar image, a multispectral image, or a lidar image.
3. The twin Transformer-based multi-modal image ground building identification method according to claim 1, wherein establishing the multi-twin neural network with N Transformer structures comprises:
obtaining a plurality of source images to form a data set, and labeling the data set to form training data of a multi-twin neural network;
establishing a joint loss function of the multi-twin neural network;
and training the multi-twin neural network by using the joint loss function to obtain parameters of the multi-twin neural network.
4. A multi-modal image ground building recognition device based on a twin Transformer, characterized by comprising:
the model building module is used for building a multi-twin neural network with N Transformer structures, and the multi-twin neural network is a pseudo-twin neural network;
the image acquisition module is used for acquiring N target images in different modalities;
and the target recognition module is used for inputting the target image into the multi-twin neural network to obtain a recognition result output by the multi-twin neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211545426.6A CN115620150B (en) | 2022-12-05 | 2022-12-05 | Multi-mode image ground building identification method and device based on twin transformers |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211545426.6A CN115620150B (en) | 2022-12-05 | 2022-12-05 | Multi-mode image ground building identification method and device based on twin transformers |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115620150A true CN115620150A (en) | 2023-01-17 |
CN115620150B CN115620150B (en) | 2023-08-04 |
Family
ID=84879822
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211545426.6A Active CN115620150B (en) | 2022-12-05 | 2022-12-05 | Multi-mode image ground building identification method and device based on twin transformers |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115620150B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115861822A (en) * | 2023-02-07 | 2023-03-28 | 海豚乐智科技(成都)有限责任公司 | Target local point and global structured matching method and device |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107122733A (en) * | 2017-04-25 | 2017-09-01 | 西安电子科技大学 | Hyperspectral image classification method based on NSCT and SAE |
CN109492666A (en) * | 2018-09-30 | 2019-03-19 | 北京百卓网络技术有限公司 | Image recognition model training method, device and storage medium |
CN110728330A (en) * | 2019-10-23 | 2020-01-24 | 腾讯科技(深圳)有限公司 | Object identification method, device, equipment and storage medium based on artificial intelligence |
CN111368920A (en) * | 2020-03-05 | 2020-07-03 | 中南大学 | Quantum twin neural network-based binary classification method and face recognition method thereof |
CN111461255A (en) * | 2020-04-20 | 2020-07-28 | 武汉大学 | Siamese network image identification method and system based on interval distribution |
CN112215085A (en) * | 2020-09-17 | 2021-01-12 | 云南电网有限责任公司昆明供电局 | Power transmission corridor foreign matter detection method and system based on twin network |
CN112749326A (en) * | 2019-11-15 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Information processing method, information processing device, computer equipment and storage medium |
CN112861988A (en) * | 2021-03-04 | 2021-05-28 | 西南科技大学 | Feature matching method based on attention-seeking neural network |
CN114418956A (en) * | 2021-12-24 | 2022-04-29 | 国网陕西省电力公司电力科学研究院 | Method and system for detecting change of key electrical equipment of transformer substation |
US20220172378A1 (en) * | 2019-04-03 | 2022-06-02 | Nec Corporation | Image processing apparatus, image processing method and non-transitory computer readable medium |
CN114581485A (en) * | 2022-03-02 | 2022-06-03 | 上海瀚所信息技术有限公司 | Target tracking method based on language modeling pattern twin network |
CN115170575A (en) * | 2022-09-09 | 2022-10-11 | 阿里巴巴(中国)有限公司 | Method and equipment for remote sensing image change detection and model training |
CN115272719A (en) * | 2022-07-27 | 2022-11-01 | 上海工程技术大学 | Cross-view-angle scene matching method for unmanned aerial vehicle image and satellite image |
CN115424331A (en) * | 2022-09-19 | 2022-12-02 | 四川轻化工大学 | Human face relative relationship feature extraction and verification method based on global and local attention mechanism |
CN115424155A (en) * | 2022-11-04 | 2022-12-02 | 浙江大华技术股份有限公司 | Illegal construction detection method, illegal construction detection device and computer storage medium |
- 2022-12-05 CN CN202211545426.6A patent/CN115620150B/en active Active
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107122733A (en) * | 2017-04-25 | 2017-09-01 | 西安电子科技大学 | Hyperspectral image classification method based on NSCT and SAE |
CN109492666A (en) * | 2018-09-30 | 2019-03-19 | 北京百卓网络技术有限公司 | Image recognition model training method, device and storage medium |
US20220172378A1 (en) * | 2019-04-03 | 2022-06-02 | Nec Corporation | Image processing apparatus, image processing method and non-transitory computer readable medium |
CN110728330A (en) * | 2019-10-23 | 2020-01-24 | 腾讯科技(深圳)有限公司 | Object identification method, device, equipment and storage medium based on artificial intelligence |
CN112749326A (en) * | 2019-11-15 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Information processing method, information processing device, computer equipment and storage medium |
CN111368920A (en) * | 2020-03-05 | 2020-07-03 | 中南大学 | Quantum twin neural network-based binary classification method and face recognition method thereof |
CN111461255A (en) * | 2020-04-20 | 2020-07-28 | 武汉大学 | Siamese network image identification method and system based on interval distribution |
CN112215085A (en) * | 2020-09-17 | 2021-01-12 | 云南电网有限责任公司昆明供电局 | Power transmission corridor foreign matter detection method and system based on twin network |
CN112861988A (en) * | 2021-03-04 | 2021-05-28 | 西南科技大学 | Feature matching method based on attention-seeking neural network |
CN114418956A (en) * | 2021-12-24 | 2022-04-29 | 国网陕西省电力公司电力科学研究院 | Method and system for detecting change of key electrical equipment of transformer substation |
CN114581485A (en) * | 2022-03-02 | 2022-06-03 | 上海瀚所信息技术有限公司 | Target tracking method based on language modeling pattern twin network |
CN115272719A (en) * | 2022-07-27 | 2022-11-01 | 上海工程技术大学 | Cross-view-angle scene matching method for unmanned aerial vehicle image and satellite image |
CN115170575A (en) * | 2022-09-09 | 2022-10-11 | 阿里巴巴(中国)有限公司 | Method and equipment for remote sensing image change detection and model training |
CN115424331A (en) * | 2022-09-19 | 2022-12-02 | 四川轻化工大学 | Human face relative relationship feature extraction and verification method based on global and local attention mechanism |
CN115424155A (en) * | 2022-11-04 | 2022-12-02 | 浙江大华技术股份有限公司 | Illegal construction detection method, illegal construction detection device and computer storage medium |
Non-Patent Citations (2)
Title |
---|
Wang Moyang: "Change Detection of Multi-Source Remote Sensing Images Based on Convolutional Neural Networks", China Master's Theses Full-text Database, Basic Sciences, pages 1 - 74 *
Zhao Cheng: "Research on Change Detection Algorithms for High-Resolution Remote Sensing Images", China Master's Theses Full-text Database, Engineering Science & Technology II, pages 1 - 59 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115861822A (en) * | 2023-02-07 | 2023-03-28 | 海豚乐智科技(成都)有限责任公司 | Target local point and global structured matching method and device |
CN115861822B (en) * | 2023-02-07 | 2023-05-12 | 海豚乐智科技(成都)有限责任公司 | Target local point and global structured matching method and device |
Also Published As
Publication number | Publication date |
---|---|
CN115620150B (en) | 2023-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110070025B (en) | Monocular image-based three-dimensional target detection system and method | |
CN113673425A (en) | Multi-view target detection method and system based on Transformer | |
CN112949407B (en) | Remote sensing image building vectorization method based on deep learning and point set optimization | |
CN115359372A (en) | Unmanned aerial vehicle video moving object detection method based on optical flow network | |
CN115861619A (en) | Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network | |
CN115861591B (en) | Unmanned aerial vehicle positioning method based on transformer key texture coding matching | |
CN115359474A (en) | Lightweight three-dimensional target detection method, device and medium suitable for mobile terminal | |
CN112861700A (en) | DeepLabv3+ based lane line network identification model establishment and vehicle speed detection method | |
CN111008979A (en) | Robust night image semantic segmentation method | |
CN115620150A (en) | Multi-modal image ground building identification method and device based on twin transform | |
CN115032648A (en) | Three-dimensional target identification and positioning method based on laser radar dense point cloud | |
US11609332B2 (en) | Method and apparatus for generating image using LiDAR | |
CN116958420A (en) | High-precision modeling method for three-dimensional face of digital human teacher | |
CN112668662B (en) | Outdoor mountain forest environment target detection method based on improved YOLOv3 network | |
CN115393404A (en) | Double-light image registration method, device and equipment and storage medium | |
CN114418913B (en) | ISAR and infrared image pixel level fusion method based on wavelet transformation | |
CN114119615A (en) | Radar segmentation method fusing space attention and self-attention transformation network | |
CN112233079A (en) | Method and system for fusing images of multiple sensors | |
Chenguang et al. | Application of Improved YOLO V5s Model for Regional Poverty Assessment Using Remote Sensing Image Target Detection | |
CN116503737B (en) | Ship detection method and device based on space optical image | |
CN116563716B (en) | GIS data processing system for forest carbon sink data acquisition | |
CN117115566B (en) | Urban functional area identification method and system by utilizing full-season remote sensing images | |
WO2023241372A1 (en) | Camera intrinsic parameter calibration method and related device | |
Yao et al. | Semantic segmentation of remote sensing image based on U-NET | |
Kang | 3D Objects Detection and Recognition from Colour and LiDAR Data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||