CN115620150A - Multi-modal image ground building identification method and device based on twin Transformer - Google Patents

Multi-modal image ground building identification method and device based on twin Transformer

Info

Publication number
CN115620150A
CN115620150A
Authority
CN
China
Prior art keywords
image
neural network
twin
twin neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211545426.6A
Other languages
Chinese (zh)
Other versions
CN115620150B (en)
Inventor
蒙顺开
瞿锐恒
李叶雨
Current Assignee
Dolphin Lezhi Technology Chengdu Co ltd
Original Assignee
Dolphin Lezhi Technology Chengdu Co ltd
Priority date
Filing date
Publication date
Application filed by Dolphin Lezhi Technology Chengdu Co ltd
Priority to CN202211545426.6A
Publication of CN115620150A
Application granted
Publication of CN115620150B
Active legal status
Anticipated expiration legal status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/176Urban or other man-made structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a twin Transformer-based multi-modal image ground building identification method and device, belonging to the technical field of ground building identification. The multi-modal image ground building identification method comprises the following steps: establishing a multi-twin neural network with N Transformer structures, wherein the multi-twin neural network is a pseudo-twin neural network; acquiring N target images in different modalities; and inputting the target images into the multi-twin neural network, which outputs a recognition result. The invention achieves accurate recognition of multi-platform, multi-modal ground building images.

Description

Multi-modal image ground building identification method and device based on twin Transformer
Technical Field
The invention belongs to the technical field of ground building identification, and particularly relates to a twin Transformer-based multi-modal image ground building identification method and device.
Background
With the continuous advance of urbanization, modern urban buildings occupy an ever larger share of land, the types of urban buildings grow richer, and buildings are increasingly interconnected; residential areas with different internal layouts, business districts with office buildings of varying heights, low-rise houses, and industrial parks covering wide areas all present different difficulties for ground building search.
From the perspective of reconnaissance image sources, visible light images, infrared images, SAR radar images and the like are currently the main sources for ground building target reconnaissance, and equipment such as remote sensing satellites or unmanned aerial vehicles can capture this image information. A visible light image mainly records the color and texture of the target; it has higher resolution, richer detail and light-dark contrast, and describes the target concretely, close to what the human eye sees, but visible light imaging is strongly affected by illumination and weather conditions. An infrared image captures the thermal radiation of the target, with strong penetrating power and strong contour capture, but generally lower resolution and poorer texture. A SAR image is a radar image: it works in all weather and at all times of day, is unaffected by meteorological conditions, offers high imaging resolution and a wide swath, records phase, amplitude and intensity information, and with focusing processing yields a clear, high-resolution grayscale image.
The omnidirectional sensing platforms for ground buildings include space-based, air-based, shore-based and sea-based platforms, which sense information about ground buildings, the environment and geography through various sensors. However, the targets photographed by these platforms vary greatly in angle and size, which makes ground building identification very difficult.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a twin Transformer-based multi-modal image ground building identification method and device.
The purpose of the invention is realized by the following technical scheme:
according to the first aspect of the invention, the twin Transformer-based multi-modal image ground building identification method comprises the following steps:
establishing a multi-twin neural network with N Transformer structures, wherein the multi-twin neural network is a pseudo-twin neural network;
acquiring N target images in different modalities;
inputting the target image into the multi-twin neural network, and outputting a recognition result by the multi-twin neural network;
the multi-twin neural network comprises a plurality of neural network units, each neural network unit comprising an image preprocessing network, a position and image data encoding network, an encoder network and a fully connected layer, wherein the encoder network comprises L encoders connected in series;
the image preprocessing network is used for converting an input target image into a normalized image feature map;
the position and image data encoding network is used for converting the image feature map into feature vectors containing position and image data;
the encoder network is used for extracting the feature vectors;
and the fully connected layer is used for mapping the feature vector output by the encoder network to the target class and outputting the class probability of the target.
Further, the target image is an infrared image, a visible light image, a SAR radar image, a multispectral image or a laser radar image.
Further, establishing a multi-twin neural network with N Transformer structures comprises:
acquiring a plurality of source images to form a data set, and labeling the data set to form training data of a multi-twin neural network;
establishing a joint loss function of the multi-twin neural network;
and training the multi-twin neural network by using the joint loss function to obtain parameters of the multi-twin neural network.
According to a second aspect of the present invention, a twin Transformer-based multi-modal image ground building recognition apparatus comprises:
the model building module is used for building a multi-twin neural network with N Transformer structures, wherein the multi-twin neural network is a pseudo-twin neural network;
the image acquisition module is used for acquiring N target images in different modes;
and the target recognition module is used for inputting the target image into the multi-twin neural network to obtain a recognition result output by the multi-twin neural network.
The invention has the beneficial effects that:
(1) The method utilizes the Transformer attention mechanism to extract globally effective information in a scene and to focus attention on local feature points, then uses a multi-twin neural network to perform feature extraction and similarity calculation on target images of multiple modalities and multiple viewing angles, realizing the associative synthesis of different information sources of the same target scene; this completes the holistic modeling of the target scene and achieves accurate identification of multi-platform, multi-modal ground building images;
(2) The method adopts the more typical pseudo-twin neural network and, by designing the loss function of the pseudo-twin network, constructs a neural network model with a consistent representation across data of different modalities, thereby solving the problem of matching targets across images of different modalities.
Drawings
FIG. 1 is a flow diagram of one embodiment of a method for multi-modal image ground structure identification in accordance with the present invention;
FIG. 2 is a schematic diagram of a pseudo-twin neural network;
FIG. 3 is a diagram of the Transformer as an encoder-decoder architecture;
FIG. 4 is a schematic diagram of a multi-twin neural network training process;
FIG. 5 is a schematic diagram of an image pre-processing network;
FIG. 6 is a schematic diagram of a plurality of modules obtained by dividing an image feature map by a location and image data encoding network;
FIG. 7 is a schematic diagram of an encoder;
FIG. 8 is a schematic diagram of a process of inputting a target image of a multi-twin neural network;
fig. 9 is a block diagram of the multi-modal image ground structure recognition apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments. It should be apparent that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention fall within the scope of the present invention.
Referring to fig. 1 to 4, the present invention provides a twin Transformer-based multi-modal image ground building recognition method and apparatus:
in a first aspect of the present invention, a twin Transformer-based multimodal image ground structure recognition method is provided, as shown in fig. 1, the multimodal image ground structure recognition method includes steps S100 to S300, which are described in detail below.
S100, establishing a multi-twin neural network with N Transformer structures, wherein the multi-twin neural network is a pseudo-twin neural network.
As shown in fig. 2, the neural network structures employed in the input image branches of a pseudo-twin neural network are different, or their parameters are not shared. This embodiment adopts a pseudo-twin neural network and, by designing the loss function of the pseudo-twin network, constructs a neural network model with a consistent representation across data of different modalities, thereby solving the problem of matching targets across images of different modalities.
In this embodiment, distance measures such as the Euclidean distance, the cosine distance and the exponential distance were compared, and the measure that minimizes within-class distance while maximizing between-class distance was selected as the distance measure of the multi-twin neural network. The twin network maps its several inputs through deep neural networks into a new vector space; the goal is achieved as long as the distances between the resulting vectors can be compared, with same-class vectors closer together and different-class vectors farther apart.
When the Transformer is used as an encoder-decoder, it is based entirely on the attention mechanism, without any convolutional or recurrent neural network layers; its overall structure is shown in fig. 3. The embedded representations of the input (source) sequence and the output (target) sequence, plus position coding, are input to the encoder and decoder, respectively. Unlike a convolution, which only models relations between neighboring pixels, the Transformer is a global operation that can model relations between all pixels; it has stronger modeling capability, extracts the global features of a scene better, and highlights the relation between the local parts and the whole. This embodiment therefore uses a Transformer instead of convolution, achieving better scene feature extraction for the subsequent task. The Transformer and the twin network are combined into a multi-twin neural network with N Transformer structures, establishing a consistent representation model of the extended target under cross-modal, multi-view and varying-scale conditions.
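As an illustration, the sketch below shows one way such a pseudo-twin wrapper around N independent Transformer branches could look. This is a minimal PyTorch sketch under assumed names (MultiTwinNetwork, branches); the patent does not prescribe an implementation.

```python
import torch
import torch.nn as nn

class MultiTwinNetwork(nn.Module):
    """Pseudo-twin network: one independent branch per modality.

    Hypothetical sketch: each branch is a Transformer-based encoder for
    one modality. Parameters are deliberately NOT shared, which is what
    distinguishes a pseudo-twin from a true twin (Siamese) network.
    """
    def __init__(self, branches: list[nn.Module]):
        super().__init__()
        self.branches = nn.ModuleList(branches)

    def forward(self, images: list[torch.Tensor]) -> list[torch.Tensor]:
        # The i-th modality goes through the i-th branch; the joint loss
        # later pulls matching embeddings together across branches.
        return [branch(img) for branch, img in zip(self.branches, images)]
```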
In some embodiments, the multi-twin neural network includes a plurality of neural network units, each including an image preprocessing network, a position and image data encoding network, an encoder network and a fully connected layer; the encoder network includes L encoders in series, as shown in fig. 4.
The image preprocessing network is used for converting an input target image into a normalized image feature map.
Specifically, the image preprocessing network converts inputs of different resolutions and channel counts into a common image feature map. Suppose the input image of the image preprocessing network in the i-th neural network unit has size H_i × W_i × C_i; the network then outputs a normalized image feature map of size (M×P) × (K×P) × C. The structure of the image preprocessing network is shown in fig. 5: the input image is first interpolated to an image of size (M×P) × (K×P) × C_i, and a 1 × 1 convolution over the channels then yields the (M×P) × (K×P) × C image feature map.
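A minimal sketch of such a preprocessing network, assuming a PyTorch implementation; bilinear interpolation and the class and argument names are assumptions, as the patent only specifies the interpolation-then-1×1-convolution structure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImagePreprocessing(nn.Module):
    """Map an H_i x W_i x C_i input to a normalized (M*P) x (K*P) x C map."""
    def __init__(self, c_in: int, c_out: int, m: int, k: int, p: int):
        super().__init__()
        self.size = (m * p, k * p)
        # 1x1 convolution maps the C_i input channels to the common C channels.
        self.proj = nn.Conv2d(c_in, c_out, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C_i, H_i, W_i) -> interpolate to (B, C_i, M*P, K*P)
        x = F.interpolate(x, size=self.size, mode="bilinear", align_corners=False)
        return self.proj(x)  # (B, C, M*P, K*P)
```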
The location and image data encoding network is used to convert the image feature map into feature vectors containing location and image data.
Specifically, the input of the position and image data encoding network is the output of the image preprocessing network, i.e. the normalized image feature map of size (M×P) × (K×P) × C. Its output is (M×K) feature vectors containing position and image data. As shown in fig. 6, the network divides the (M×P) × (K×P) × C normalized feature map into (M×K) blocks of size P × P, each block containing P × P × C values.
Expanding the three-dimensional P × P × C feature block yields a feature vector Z_t of size (P×P×C) × 1. Assume the position of the feature block in the feature map is (m, n), where 0 < m < M+1, 0 < n < K+1, and m, n are integers; the position code X_pos of the block is defined as:

X_pos = (n × M + m) / (M × K)

Combining the (P×P×C) × 1 block vector with the position code of the block gives the feature vector Z_p:

Z_p = [X_pos; Z_t]

The size of Z_p is (P×P×C + 1) × 1. Z_p cannot be input into the encoder module directly and needs normalization, performed as follows:

Z_po,i = sigmoid(BN(Z_p,i × W_p,i + B_i))

where 0 ≤ i < (M×K) and i is an integer, W_p,i is a (P×P×C + 1) × (P×P×C + 1) matrix, and B_i is a (P×P×C + 1) × 1 matrix; both W_p,i and B_i are learnable network parameters. The sigmoid function serves as the nonlinearity: it normalizes the module's output into the range (0, 1) while adding expressive power. It is defined as:

sigmoid(x) = 1 / (1 + e^(−x))
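The encoding step might be sketched as follows (hypothetical PyTorch code; the row-major patch ordering used for the position code is one possible linearization of (m, n), and the BatchNorm placement follows the formula above):

```python
import torch
import torch.nn as nn

class PositionImageEncoding(nn.Module):
    """Turn the (M*P) x (K*P) x C map into M*K vectors of size P*P*C + 1."""
    def __init__(self, m: int, k: int, p: int, c: int):
        super().__init__()
        self.m, self.k, self.p = m, k, p
        d = p * p * c + 1
        # One learnable matrix W_p,i and bias B_i per block position i.
        self.w = nn.Parameter(torch.randn(m * k, d, d) * d ** -0.5)
        self.b = nn.Parameter(torch.zeros(m * k, d))
        self.bn = nn.BatchNorm1d(m * k)  # normalizes each block over the batch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, M*P, K*P) -> (B, M*K, P*P*C) flattened patch vectors Z_t
        bsz = x.shape[0]
        patches = x.unfold(2, self.p, self.p).unfold(3, self.p, self.p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(bsz, self.m * self.k, -1)
        # Scalar position code X_pos = index / (M*K), prepended to each Z_t.
        pos = torch.arange(self.m * self.k, device=x.device, dtype=x.dtype)
        pos = (pos / (self.m * self.k)).expand(bsz, -1).unsqueeze(-1)
        z_p = torch.cat([pos, patches], dim=-1)              # (B, M*K, d)
        # Per-block affine map, then BatchNorm and sigmoid: Z_po = sigmoid(BN(.))
        z = torch.einsum("bnd,nde->bne", z_p, self.w) + self.b
        return torch.sigmoid(self.bn(z))                     # values in (0, 1)
```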
the encoder network is used to perform feature vector extraction.
Specifically, the encoder network is composed of L encoders connected in series, and the structure of each encoder is shown in fig. 7. The input to the l-th encoder (0 ≤ l ≤ L−1, l an integer) is (M×K) feature vectors Z_po^l of size (P×P×C + 1) × 1, and its output is (M×K) feature vectors Z_po^(l+1) of the same size.
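A sketch of the encoder network using the stock PyTorch Transformer encoder follows; treating each of the (M×K) vectors as a token of dimension d = P×P×C + 1 is the natural reading, but the internals of each encoder beyond fig. 7 are an assumption here:

```python
import torch
import torch.nn as nn

class EncoderNetwork(nn.Module):
    """L Transformer encoders in series over the M*K token vectors."""
    def __init__(self, d: int, num_layers: int, num_heads: int = 1):
        super().__init__()
        # num_heads must divide d; with d = P*P*C + 1 possibly odd, 1 is safe.
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=num_heads,
                                           batch_first=True)
        self.encoders = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (B, M*K, d); each layer l maps Z_po^l to Z_po^(l+1).
        return self.encoders(z)
```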
The fully connected layer maps the feature vector output by the encoder network to the target class and outputs the class probability of the target.
Specifically, assuming the number of target classes is T, the (M×K) feature vectors of size (P×P×C + 1) × 1 are accumulated to obtain a feature vector Z_M, which is itself a (P×P×C + 1) × 1 vector. The output Z_C of the fully connected layer relates to Z_M as follows:

Z_C = softmax(Z_M × W_M + B_M)

where Z_C is a T × 1 vector whose elements represent the probability, in the range (0, 1), that the input heterogeneous image belongs to each category. W_M and B_M, the weights and biases of the fully connected layer, are learnable parameters. The softmax function is defined as:

softmax(x_j) = e^(x_j) / Σ_k e^(x_k)
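A corresponding sketch of the classification head (hypothetical names; summation is used here for the "accumulation" of the (M×K) vectors, which the patent does not define further):

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Accumulate the M*K token vectors into Z_M, then map to T classes."""
    def __init__(self, d: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(d, num_classes)  # holds W_M and B_M

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        z_m = z.sum(dim=1)                   # (B, M*K, d) -> (B, d)
        # Z_C = softmax(Z_M * W_M + B_M): per-class probabilities in (0, 1).
        return torch.softmax(self.fc(z_m), dim=-1)
```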
in some embodiments, establishing a multi-twin neural network having N transform structures comprises:
s110, obtaining N source images to form a data set, and labeling the data set to form training data of a multi-twin neural network;
s120, establishing a joint loss function (L) of the multi-twin neural network, which is defined as:
Figure 419288DEST_PATH_IMAGE004
wherein Y represents whether the plurality of sample labels match, Y =1, representing a label match for N samples; y =0, representing two tags not matching; m is a set threshold value, belongs to a super parameter of the network and is obtained according to experience; h is the number of samples of single training;
Figure 605550DEST_PATH_IMAGE005
and outputting the distance between different twin network feature layer in the h training sample.
Figure 797497DEST_PATH_IMAGE006
Is defined as follows:
Figure DEST_PATH_IMAGE007
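A sketch of this joint loss over the N branch outputs follows. Accumulating the contrastive term over all branch pairs is an assumption; the patent only states that the distance is taken between the feature-layer outputs of different twin branches:

```python
import torch

def joint_contrastive_loss(feats: list[torch.Tensor],
                           y: torch.Tensor, margin: float) -> torch.Tensor:
    """Contrastive joint loss: matched samples (Y=1) are pulled together,
    unmatched samples (Y=0) are pushed beyond the margin m.

    feats: list of N tensors of shape (H, d), one per branch.
    y:     (H,) tensor with 1 where labels match and 0 where they do not.
    """
    y = y.float()
    loss = feats[0].new_zeros(())
    pairs = 0
    # Sum the pairwise contrastive terms over all branch pairs (i, j).
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            d_h = torch.norm(feats[i] - feats[j], dim=1)  # Euclidean D_h
            term = (y * d_h.pow(2)
                    + (1 - y) * torch.clamp(margin - d_h, min=0).pow(2))
            loss = loss + term.mean() / 2
            pairs += 1
    return loss / max(pairs, 1)
```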
s130, training the multi-twin neural network by using the joint loss function to obtain a parameter W of the multi-twin neural network 1 *,W 2 *,...,W N *。
S200, acquiring N target images in different modalities.
Generally, the target image is an infrared image, a visible light image, a SAR radar image, a multispectral image or a lidar image; specifically, the target images may be of one or more of these types.
Generally, the number of the target images is two or more. In this embodiment, there is no requirement for the shooting angle of the target image or the like.
S300, inputting the target images into the multi-twin neural network, which outputs the recognition result.
Specifically, the multi-twin neural network processes the input target images as follows: using the parameters W_1*, W_2*, ..., W_N* obtained in training, the input image from the i-th source passes through the i-th neural network unit, and the classification result of the target is obtained at the output Y of that neural network unit, as shown in fig. 8.
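Putting the pieces together, inference might look like the following sketch, reusing the hypothetical classes from the sketches above:

```python
import torch

@torch.no_grad()
def recognize(model: "MultiTwinNetwork",
              heads: list["ClassificationHead"],
              images: list[torch.Tensor]) -> list[torch.Tensor]:
    """Run the N modality images through their branches (with the trained
    parameters W_1*, ..., W_N*) and return per-branch class probabilities."""
    feats = model(images)                      # one token set per branch
    return [head(f) for head, f in zip(heads, feats)]
```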
The method uses the Transformer attention mechanism to extract globally effective information in the scene and to focus attention on local feature points, then uses the multi-twin neural network to perform feature extraction and similarity calculation on target images of multiple modalities and multiple viewing angles, realizing the associative synthesis of different information sources of the same target scene. This yields a consistent target feature representation across modalities, large viewing-angle changes and scale changes for ground buildings or large offshore ships, and achieves accurate identification of multi-platform, multi-modal ground building images.
In this embodiment, target images of different types are input into different networks; these networks have different structures and possibly different parameters. By defining the objective functions of the networks, identical feature vectors for the ground building target are obtained from the different image types, so that a common feature representation of the target is established across the input images of different modalities, finally achieving target identification. For example, an infrared image and a visible light image are input into two independent networks with different structures and possibly different parameters; by defining the objective functions of the two networks, the visible light image and the infrared image yield the same feature vectors for the ground building target.
A second aspect of the present invention provides a twin Transformer-based multi-modal image ground building recognition apparatus. As shown in fig. 9, the apparatus includes a model construction module, an image acquisition module and a target recognition module.
The model building module is used for building a multi-twin neural network with N Transformer structures, the multi-twin neural network being a pseudo-twin neural network. In this embodiment, the model building module may be configured to perform step S100 shown in fig. 1; for a detailed description of the model building module, refer to the description of step S100.
The image acquisition module is used for acquiring N target images in different modalities. In this embodiment, the image acquisition module may be configured to perform step S200 shown in fig. 1; for a detailed description of the image acquisition module, refer to the description of step S200.
The target recognition module is used for inputting the target images into the multi-twin neural network to obtain the recognition result output by the multi-twin neural network. In this embodiment, the target recognition module may be configured to perform step S300 shown in fig. 1; for a detailed description of the target recognition module, refer to the description of step S300.
The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise forms disclosed herein, and that various other combinations, modifications and environments falling within the scope of the inventive concept disclosed herein, whether described above or apparent to those skilled in the relevant art, may be resorted to. Modifications and variations effected by those skilled in the art without departing from the spirit and scope of the invention fall within the protection of the appended claims.

Claims (4)

1. The multi-modal image ground building identification method based on the twin Transformer, characterized by comprising the following steps:
establishing a multi-twin neural network with N Transformer structures, wherein the multi-twin neural network is a pseudo-twin neural network;
acquiring N target images in different modalities;
inputting the target images into the multi-twin neural network, and outputting a recognition result by the multi-twin neural network;
wherein the multi-twin neural network comprises a plurality of neural network units, each neural network unit comprising an image preprocessing network, a position and image data encoding network, an encoder network and a fully connected layer, the encoder network comprising L encoders connected in series;
the image preprocessing network is used for converting an input target image into a normalized image feature map;
the position and image data encoding network is used for converting the image feature map into feature vectors containing position and image data;
the encoder network is used for extracting the feature vectors;
and the fully connected layer is used for mapping the feature vector output by the encoder network to the target class and outputting the class probability of the target.
2. The twin Transformer-based multi-modal image ground building identification method according to claim 1, wherein the target image is an infrared image, a visible light image, a SAR radar image, a multispectral image or a lidar image.
3. The twin Transformer-based multi-modal image ground building identification method according to claim 1, wherein establishing a multi-twin neural network with N Transformer structures comprises:
obtaining a plurality of source images to form a data set, and labeling the data set to form training data of a multi-twin neural network;
establishing a joint loss function of the multi-twin neural network;
and training the multi-twin neural network by using the joint loss function to obtain parameters of the multi-twin neural network.
4. A twin Transformer-based multi-modal image ground building recognition device, characterized by comprising:
the model building module, used for building a multi-twin neural network with N Transformer structures, wherein the multi-twin neural network is a pseudo-twin neural network;
the image acquisition module is used for acquiring N target images in different modalities;
and the target recognition module is used for inputting the target image into the multi-twin neural network to obtain a recognition result output by the multi-twin neural network.
CN202211545426.6A 2022-12-05 2022-12-05 Multi-mode image ground building identification method and device based on twin transformers Active CN115620150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211545426.6A CN115620150B (en) 2022-12-05 2022-12-05 Multi-mode image ground building identification method and device based on twin transformers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211545426.6A CN115620150B (en) 2022-12-05 2022-12-05 Multi-mode image ground building identification method and device based on twin transformers

Publications (2)

Publication Number Publication Date
CN115620150A true CN115620150A (en) 2023-01-17
CN115620150B CN115620150B (en) 2023-08-04

Family

ID=84879822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211545426.6A Active CN115620150B (en) 2022-12-05 2022-12-05 Multi-mode image ground building identification method and device based on twin transformers

Country Status (1)

Country Link
CN (1) CN115620150B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861822A (en) * 2023-02-07 2023-03-28 海豚乐智科技(成都)有限责任公司 Target local point and global structured matching method and device

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122733A (en) * 2017-04-25 2017-09-01 西安电子科技大学 Hyperspectral image classification method based on NSCT and SAE
CN109492666A (en) * 2018-09-30 2019-03-19 北京百卓网络技术有限公司 Image recognition model training method, device and storage medium
CN110728330A (en) * 2019-10-23 2020-01-24 腾讯科技(深圳)有限公司 Object identification method, device, equipment and storage medium based on artificial intelligence
CN111368920A (en) * 2020-03-05 2020-07-03 中南大学 Quantum twin neural network-based binary classification method and face recognition method thereof
CN111461255A (en) * 2020-04-20 2020-07-28 武汉大学 Siamese network image identification method and system based on interval distribution
CN112215085A (en) * 2020-09-17 2021-01-12 云南电网有限责任公司昆明供电局 Power transmission corridor foreign matter detection method and system based on twin network
CN112749326A (en) * 2019-11-15 2021-05-04 腾讯科技(深圳)有限公司 Information processing method, information processing device, computer equipment and storage medium
CN112861988A (en) * 2021-03-04 2021-05-28 西南科技大学 Feature matching method based on attention-seeking neural network
CN114418956A (en) * 2021-12-24 2022-04-29 国网陕西省电力公司电力科学研究院 Method and system for detecting change of key electrical equipment of transformer substation
US20220172378A1 (en) * 2019-04-03 2022-06-02 Nec Corporation Image processing apparatus, image processing method and non-transitory computer readable medium
CN114581485A (en) * 2022-03-02 2022-06-03 上海瀚所信息技术有限公司 Target tracking method based on language modeling pattern twin network
CN115170575A (en) * 2022-09-09 2022-10-11 阿里巴巴(中国)有限公司 Method and equipment for remote sensing image change detection and model training
CN115272719A (en) * 2022-07-27 2022-11-01 上海工程技术大学 Cross-view-angle scene matching method for unmanned aerial vehicle image and satellite image
CN115424331A (en) * 2022-09-19 2022-12-02 四川轻化工大学 Human face relative relationship feature extraction and verification method based on global and local attention mechanism
CN115424155A (en) * 2022-11-04 2022-12-02 浙江大华技术股份有限公司 Illegal construction detection method, illegal construction detection device and computer storage medium

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122733A (en) * 2017-04-25 2017-09-01 西安电子科技大学 Hyperspectral image classification method based on NSCT and SAE
CN109492666A (en) * 2018-09-30 2019-03-19 北京百卓网络技术有限公司 Image recognition model training method, device and storage medium
US20220172378A1 (en) * 2019-04-03 2022-06-02 Nec Corporation Image processing apparatus, image processing method and non-transitory computer readable medium
CN110728330A (en) * 2019-10-23 2020-01-24 腾讯科技(深圳)有限公司 Object identification method, device, equipment and storage medium based on artificial intelligence
CN112749326A (en) * 2019-11-15 2021-05-04 腾讯科技(深圳)有限公司 Information processing method, information processing device, computer equipment and storage medium
CN111368920A (en) * 2020-03-05 2020-07-03 中南大学 Quantum twin neural network-based binary classification method and face recognition method thereof
CN111461255A (en) * 2020-04-20 2020-07-28 武汉大学 Siamese network image identification method and system based on interval distribution
CN112215085A (en) * 2020-09-17 2021-01-12 云南电网有限责任公司昆明供电局 Power transmission corridor foreign matter detection method and system based on twin network
CN112861988A (en) * 2021-03-04 2021-05-28 西南科技大学 Feature matching method based on attention-seeking neural network
CN114418956A (en) * 2021-12-24 2022-04-29 国网陕西省电力公司电力科学研究院 Method and system for detecting change of key electrical equipment of transformer substation
CN114581485A (en) * 2022-03-02 2022-06-03 上海瀚所信息技术有限公司 Target tracking method based on language modeling pattern twin network
CN115272719A (en) * 2022-07-27 2022-11-01 上海工程技术大学 Cross-view-angle scene matching method for unmanned aerial vehicle image and satellite image
CN115170575A (en) * 2022-09-09 2022-10-11 阿里巴巴(中国)有限公司 Method and equipment for remote sensing image change detection and model training
CN115424331A (en) * 2022-09-19 2022-12-02 四川轻化工大学 Human face relative relationship feature extraction and verification method based on global and local attention mechanism
CN115424155A (en) * 2022-11-04 2022-12-02 浙江大华技术股份有限公司 Illegal construction detection method, illegal construction detection device and computer storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG MOYANG: "Multi-source remote sensing image change detection based on convolutional neural networks", China Master's Theses Full-text Database, Basic Sciences, pages 1-74 *
ZHAO CHENG: "Research on change detection algorithms for high-resolution remote sensing images", China Master's Theses Full-text Database, Engineering Science and Technology II, pages 1-59 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861822A (en) * 2023-02-07 2023-03-28 海豚乐智科技(成都)有限责任公司 Target local point and global structured matching method and device
CN115861822B (en) * 2023-02-07 2023-05-12 海豚乐智科技(成都)有限责任公司 Target local point and global structured matching method and device

Also Published As

Publication number Publication date
CN115620150B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN110070025B (en) Monocular image-based three-dimensional target detection system and method
CN113673425A (en) Multi-view target detection method and system based on Transformer
CN112949407B (en) Remote sensing image building vectorization method based on deep learning and point set optimization
CN115359372A (en) Unmanned aerial vehicle video moving object detection method based on optical flow network
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
CN115861591B (en) Unmanned aerial vehicle positioning method based on transformer key texture coding matching
CN115359474A (en) Lightweight three-dimensional target detection method, device and medium suitable for mobile terminal
CN112861700A (en) DeepLabv3+ based lane line network identification model establishment and vehicle speed detection method
CN111008979A (en) Robust night image semantic segmentation method
CN115620150A (en) Multi-modal image ground building identification method and device based on twin transform
CN115032648A (en) Three-dimensional target identification and positioning method based on laser radar dense point cloud
US11609332B2 (en) Method and apparatus for generating image using LiDAR
CN116958420A (en) High-precision modeling method for three-dimensional face of digital human teacher
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN115393404A (en) Double-light image registration method, device and equipment and storage medium
CN114418913B (en) ISAR and infrared image pixel level fusion method based on wavelet transformation
CN114119615A (en) Radar segmentation method fusing space attention and self-attention transformation network
CN112233079A (en) Method and system for fusing images of multiple sensors
Chenguang et al. Application of Improved YOLO V5s Model for Regional Poverty Assessment Using Remote Sensing Image Target Detection
CN116503737B (en) Ship detection method and device based on space optical image
CN116563716B (en) GIS data processing system for forest carbon sink data acquisition
CN117115566B (en) Urban functional area identification method and system by utilizing full-season remote sensing images
WO2023241372A1 (en) Camera intrinsic parameter calibration method and related device
Yao et al. Semantic segmentation of remote sensing image based on U-NET
Kang 3D Objects Detection and Recognition from Colour and LiDAR Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant