CN117710710A - Image matching method for depth semantic understanding - Google Patents

Image matching method for depth semantic understanding

Info

Publication number
CN117710710A
CN117710710A (application CN202410167137.XA)
Authority
CN
China
Prior art keywords
image
semantic
support
features
variant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410167137.XA
Other languages
Chinese (zh)
Inventor
王洪玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Shanxi Culture Co ltd
Original Assignee
Hunan Shanxi Culture Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Shanxi Culture Co ltd filed Critical Hunan Shanxi Culture Co ltd
Priority to CN202410167137.XA
Publication of CN117710710A
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image matching, and discloses an image matching method for depth semantic understanding, which comprises the following steps: generalizing the support images to obtain a variant support image set; extracting stable semantic features of the variant support images in the variant support image set by using an image depth semantic understanding model; extracting deep semantic features of the query image by using a deep semantic feature extraction network model; and performing similarity calculation and image matching between the deep semantic features of the query image and the stable semantic features of the variant support images. The method extracts the self-attention features and multi-scale local perception features of each variant support image as depth semantic features, fuses the extracted depth semantic features with the more stable scale features to obtain stable semantic features, computes feature similarity by combining the feature distribution differences and feature direction differences of the deep and stable semantic features, and thereby realizes image matching that incorporates image semantics.

Description

Image matching method for depth semantic understanding
Technical Field
The invention relates to the technical field of image matching, in particular to an image matching method for depth semantic understanding.
Background
Image matching plays an important role in tasks such as image search and identity recognition. A variety of methods and algorithms currently exist in the field of image matching, including feature extraction and description, deep-learning-based methods, and local feature matching. Traditional methods such as SIFT and SURF match by extracting key points and feature descriptors, whereas deep learning methods such as CNNs (convolutional neural networks) can automatically learn image representations for matching. In addition, methods based on graphical models, Bag of Visual Words, and the like are also widely used in image matching. The mainstream image metric learning network structures are supervised learning algorithms based on contrastive constraints, which require large amounts of annotated data during training at enormous cost. Furthermore, these algorithms focus only on the image itself, i.e., on pixel-level similarity, without considering image content, i.e., semantic similarity, in the similarity measure. In particular, when the content of an image is only slightly modified, such methods readily judge it to be a different image, since image content is not taken into account. To address this problem, the invention provides an image matching method for depth semantic understanding, which achieves accurate image matching by extracting image semantics for similarity judgment.
Disclosure of Invention
In view of this, the present invention provides an image matching method for deep semantic understanding, which aims at: 1) On the basis of keeping original support image pixel distribution, performing generalization processing on the support image by adopting a mask self-coding mode to obtain a variation support image set of support image pixel nonlinear transformation, expanding the number of support images which can be matched, realizing generalization segmentation processing of the support image, respectively extracting self-attention features and multiscale local perception features of the variation support image by using an image depth semantic understanding model as depth semantic features, acquiring more stable scale features by combining a mapping fusion mode, and performing fusion processing on the extracted depth semantic features to obtain stable semantic features of self-attention weights and self local perception features of the characterization variation support image in the variation support image set, thereby realizing semantic feature extraction of the variation support image; 2) And sequentially carrying out multi-scale convolution residual processing, pooling operation, depth separable convolution processing and context-combined semantic perception processing on the query image by utilizing the deep semantic feature extraction network model to obtain deep semantic features of the query image, carrying out similarity calculation on the deep semantic features of the query image and the stable semantic features of the variant support image by combining feature distribution differences and feature direction differences of the deep semantic features and the stable semantic features of the stable semantic features, and selecting a support image corresponding to the variant support image with similarity higher than a specified threshold as an image matching result to realize image matching processing combined with image semantics.
The image matching method for depth semantic understanding provided by the invention comprises the following steps:
S1: acquiring a query image and a plurality of support images, constructing a support image generalization model, and generalizing the support images to obtain a variant support image set, wherein the support image generalization model takes the support images as input and the variant support image set as output, and masked self-encoding is the main implementation of the support image generalization;
S2: constructing an image depth semantic understanding model and extracting stable semantic features of the variant support images in the variant support image set, wherein the image depth semantic understanding model takes a variant support image as input and fuses global self-attention features and multi-scale local perception features to obtain the stable semantic features of the variant support image;
S3: constructing a deep semantic feature extraction network model to extract deep semantic features of the query image, wherein the deep semantic feature extraction network model comprises a multi-dimensional extraction network module, a deep semantic feature extraction network module and a semantic perception network module;
S4: performing similarity calculation on the deep semantic features of the query image and the stable semantic features of the variant support images, and selecting the support image corresponding to any variant support image with similarity higher than a specified threshold as the image matching result.
As a further improvement of the present invention:
optionally, in the step S1, a query image and a plurality of support images are acquired, and a support image generalization model is constructed, including:
acquiring a query image I and a plurality of support images, wherein the support images are candidate images for image matching against the query image, and the acquired support images are represented in the following form:
$\{s_1, s_2, \ldots, s_N\}$
wherein: $s_n$ represents the acquired n-th support image;
the method comprises the steps of constructing a support image generalization model, and performing generalization processing on a support image by using the support image generalization model to obtain a variant support image set, wherein the support image generalization model comprises an input layer, an image generalization layer and an output layer, the input layer is used for inputting the support image, the image generalization layer is used for performing generalization mapping processing on the support image, and the output layer is used for outputting a generalization mapping processing result of the support image as a variant support image.
Optionally, in the step S1, the support image generalization model is used to generalize a support image to obtain a variant support image set, which includes:
generalizing a support image using the support image generalization model, wherein the generalization processing flow for the support image $s_n$ is as follows:
S11: the input layer receives the support image $s_n$ and transmits the support image $s_n$ to the image generalization layer;
S12: the image generalization layer performs M generalization mapping processes on the support image $s_n$:
$s_n^m = g(T_m \odot s_n), \quad m = 1, 2, \ldots, M$
wherein:
$s_n^m$ represents the m-th generalization mapping processing result of the support image $s_n$;
$g(\cdot)$ represents a nonlinear mapping function; in the embodiment of the invention, the selected nonlinear mapping function is the Sigmoid function;
$T_m$ represents the m-th generalization mapping processing template;
$\odot$ represents the Hadamard product operator;
S13: the output layer outputs the generalization mapping processing results of the support image as variant support images, forming the variant support image set of the support image $s_n$: $S_n = \{s_n^1, s_n^2, \ldots, s_n^M\}$
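For concreteness, a minimal NumPy sketch of the S11–S13 flow follows. The patent fixes the Hadamard-product masking and a Sigmoid nonlinearity, but not how the templates $T_m$ are generated, so the random binary masks, keep probability, and function name below are assumptions.

```python
import numpy as np

def generalize_support_image(s_n: np.ndarray, M: int = 8, keep_prob: float = 0.75,
                             seed: int = 0) -> list[np.ndarray]:
    """Produce M variant support images from one support image via masked
    self-encoding: Hadamard-product masking followed by a sigmoid nonlinearity.

    Assumptions (not fixed by the source): templates T_m are random binary
    masks, and pixel values are normalized to [0, 1]."""
    rng = np.random.default_rng(seed)
    variants = []
    for _ in range(M):
        T_m = (rng.random(s_n.shape) < keep_prob).astype(s_n.dtype)  # template T_m
        z = T_m * s_n                                # Hadamard product T_m ⊙ s_n
        variants.append(1.0 / (1.0 + np.exp(-z)))    # sigmoid nonlinear mapping g(·)
    return variants
```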
Optionally, constructing an image depth semantic understanding model in the step S2 includes:
the image depth semantic understanding model is constructed; the image depth semantic understanding model takes a variant support image as input and fuses global self-attention features and multi-scale local perception features to obtain the stable semantic features of the variant support image, wherein the image depth semantic understanding model comprises an input layer, a depth semantic feature extraction layer and a stable semantic feature construction layer, the input layer is used for receiving the variant support image, the depth semantic feature extraction layer is used for respectively extracting the self-attention features and the multi-scale local perception features of the variant support image as the depth semantic features, and the stable semantic feature construction layer is used for constructing the depth semantic features into the stable semantic features of the variant support image.
Optionally, in the step S2, extracting stable semantic features of the variant support image in the variant support image set by using an image depth semantic understanding model includes:
extracting stable semantic features of the variant support images in the variant support image set by using the image depth semantic understanding model, wherein the stable semantic feature extraction flow for the variant support image $s_n^m$ in the variant support image set $S_n$ is as follows:
S21: the input layer receives the variant support image $s_n^m$;
S22: the depth semantic feature extraction layer respectively extracts the self-attention feature $F_1$ and the multi-scale local perception features $F_2$ of the variant support image $s_n^m$ as the depth semantic features $F = (F_1, F_2)$:
$F_1 = \mathrm{softmax}\!\left(\frac{(W_Q s_n^m)(W_K s_n^m)^{\mathrm{T}}}{\sqrt{d}}\right) W_V s_n^m$
wherein:
T represents transposition;
$W_Q, W_K, W_V$ respectively represent convolution weight matrices in the depth semantic feature extraction layer;
d represents the dimension of $W_K s_n^m$;
$F_2 = (F_2^1, F_2^2, F_2^3)$ represents the local perception features at three scales, obtained by convolving the variant support image $s_n^m$ with convolution kernels of three different pixel sizes [kernel sizes not reproduced in source];
S23: the stable semantic feature construction layer constructs the depth semantic features $F$ into the stable semantic feature of the variant support image $s_n^m$ [formula not reproduced in source]:
wherein:
$h_n^m$ represents the stable semantic feature of the variant support image $s_n^m$;
$\mathrm{ReLU}(\cdot)$ represents the ReLU activation function;
$\exp(\cdot)$ represents the exponential function with the natural constant e as its base;
$\arg\max_u(\cdot)$ represents the parameter u that maximizes the bracketed expression, wherein u indexes the three local perception scales;
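The following PyTorch sketch illustrates S21–S23 under stated assumptions: the attention step follows the standard scaled dot-product form implied by the symbol definitions above, the three kernel sizes (1, 3, 5) are illustrative, and the exp/argmax scale-selection fusion is one plausible reading of the construction layer, whose exact formula is not reproduced in the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthSemanticUnderstanding(nn.Module):
    """Sketch of the image depth semantic understanding model (S21-S23).
    Kernel sizes and the fusion rule are assumptions; the patent names the
    ingredients (W_Q/W_K/W_V, ReLU, exp, argmax) but not the exact formula."""
    def __init__(self, channels: int = 3, dim: int = 64):
        super().__init__()
        # 1x1 convolutions play the role of the convolution weight matrices W_Q, W_K, W_V
        self.w_q = nn.Conv2d(channels, dim, 1)
        self.w_k = nn.Conv2d(channels, dim, 1)
        self.w_v = nn.Conv2d(channels, dim, 1)
        # local perception at three scales (kernel sizes assumed)
        self.locals = nn.ModuleList(
            nn.Conv2d(channels, dim, k, padding=k // 2) for k in (1, 3, 5))

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        b, _, h, w = s.shape
        q = self.w_q(s).flatten(2).transpose(1, 2)      # (b, hw, dim)
        k = self.w_k(s).flatten(2).transpose(1, 2)
        v = self.w_v(s).flatten(2).transpose(1, 2)
        att = F.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1) @ v
        f1 = att.mean(1)                                 # self-attention feature F1
        f2 = torch.stack([c(s).flatten(2).mean(-1) for c in self.locals], 1)  # (b, 3, dim)
        # assumed fusion: pick the scale most aligned with F1 (exp/argmax), then ReLU
        scores = torch.exp((f2 * f1.unsqueeze(1)).sum(-1))   # (b, 3)
        u_star = scores.argmax(dim=1)                        # most stable scale index
        f2_star = f2[torch.arange(b), u_star]
        return F.relu(f1 + f2_star)                      # stable semantic feature h
```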
Optionally, constructing a deep semantic feature extraction network model in the step S3, extracting deep semantic features of the query image, including:
constructing a deep semantic feature extraction network model, and extracting deep semantic features of a query image I by using the deep semantic feature extraction network model, wherein the deep semantic feature extraction network model comprises a multi-dimensional extraction network module, a deep semantic feature extraction network module and a semantic perception network module, the multi-dimensional extraction network module is used for receiving the query image and carrying out multi-scale convolution residual processing on the query image to generate a multi-scale feature map of the query image, the deep semantic feature extraction network module is used for converting the multi-scale feature map into deep semantic feature vectors, and the semantic perception network module is used for carrying out semantic perception processing combining context on the deep semantic feature vectors to generate deep semantic features of the query image;
The deep semantic feature extraction flow for the query image I, based on the deep semantic feature extraction network model, comprises the following steps:
S31: the multi-dimensional extraction network module receives the query image I and performs multi-scale convolution residual processing on it, wherein the multi-scale convolution residual processing formula is:
$R_u = \mathrm{Conv}_{k_u}(I) + I, \quad u = 1, 2, \ldots, U$
wherein:
$R_u$ represents the convolution residual processing result of the query image I at the u-th scale, and U represents the maximum convolution residual scale;
$\mathrm{Conv}_{k_u}(\cdot)$ represents a convolution operation with a convolution kernel of $k_u$ pixel size;
generating the multi-scale feature map of the query image I from the multi-scale convolution residual processing results:
$X = (X_1, X_2, \ldots, X_U), \quad X_u = \mathrm{MaxPool}(R_u) \oplus \mathrm{AvgPool}(R_u)$
wherein:
X represents the multi-scale feature map of the query image I;
$X_u$ represents the feature map of the query image I at the u-th scale;
$\mathrm{MaxPool}(\cdot)$ represents the maximum pooling operation, $\mathrm{AvgPool}(\cdot)$ represents the average pooling operation, and $\oplus$ represents the feature splicing (concatenation) operator;
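A sketch of S31 follows, assuming "convolution residual processing" means a convolution with an identity skip connection and that each scale's feature map splices max- and average-pooled results; the kernel sizes and the number of scales U are illustrative, not fixed by the source.

```python
import torch
import torch.nn as nn

class MultiDimExtraction(nn.Module):
    """Sketch of the multi-dimensional extraction network module (S31).
    'Convolution residual' is read as conv + identity skip; kernel sizes
    (3/5/7) and U=3 are assumptions."""
    def __init__(self, channels: int = 3, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes)
        self.max_pool = nn.MaxPool2d(2)
        self.avg_pool = nn.AvgPool2d(2)

    def forward(self, img: torch.Tensor) -> list[torch.Tensor]:
        maps = []
        for conv in self.branches:
            r_u = conv(img) + img                      # convolution residual at scale u
            x_u = torch.cat([self.max_pool(r_u), self.avg_pool(r_u)], dim=1)  # splice
            maps.append(x_u)                           # feature map X_u at scale u
        return maps                                    # multi-scale feature map X
```

Note that the channel splice doubles the channel count (e.g., 3 to 6), which the next module must accept.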
S32: the deep semantic feature extraction network module converts the multi-scale feature map X into a deep semantic feature vector:
$v = \Phi(X)$
wherein:
v represents the deep semantic feature vector corresponding to the multi-scale feature map X;
$\Phi(\cdot)$ represents six depthwise separable convolution processes and three maximum pooling operations applied to the feature map;
S33: the semantic perception network module performs context-combined semantic perception processing on the deep semantic feature vector v to generate the deep semantic features $f_I$ [formula not reproduced in source]:
wherein:
W represents the weight parameter matrix of the semantic perception network module;
$\mathrm{ReLU}(\cdot)$ represents the ReLU activation function;
$f_I$ represents the deep semantic features of the query image I.
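S32–S33 in sketch form: six depthwise separable convolutions interleaved with three max-pooling operations produce the deep semantic feature vector, followed by a ReLU over a weight matrix W. The channel widths and the simple global-average "context" term are assumptions, since the exact context combination is not reproduced in the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def depthwise_separable(cin: int, cout: int) -> nn.Sequential:
    """One depthwise-separable convolution: per-channel 3x3 then 1x1 pointwise."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, padding=1, groups=cin),  # depthwise
        nn.Conv2d(cin, cout, 1))                        # pointwise

class DeepSemanticHead(nn.Module):
    """Sketch of S32-S33: six depthwise-separable convolutions with three
    max-pooling operations, then a ReLU(W.) semantic perception step. The
    global-average 'context' term is an assumption."""
    def __init__(self, cin: int = 6, width: int = 32, out_dim: int = 128):
        super().__init__()
        blocks, c = [], cin
        for i in range(6):                              # six separable convolutions
            blocks.append(depthwise_separable(c, width))
            c = width
            if i % 2 == 1:                              # three max-pooling operations
                blocks.append(nn.MaxPool2d(2))
        self.backbone = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.w = nn.Linear(width + 1, out_dim)          # weight parameter matrix W

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.pool(self.backbone(x)).flatten(1)      # deep semantic feature vector v
        ctx = v.mean(dim=1, keepdim=True)               # crude context summary (assumed)
        return F.relu(self.w(torch.cat([v, ctx], dim=1)))  # deep semantic features f_I
```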
Optionally, in the step S4, similarity calculation is performed on deep semantic features of the query image and stable semantic features of the variant support image, and a support image corresponding to the variant support image with similarity higher than a specified threshold is selected as an image matching result, including:
performing similarity calculation between the deep semantic features of the query image and the stable semantic features of the variant support images, wherein the similarity between the deep semantic features $f_I$ and the stable semantic feature $h_n^m$ is computed by the following formula [not reproduced in source]:
wherein:
$\mathrm{sim}(f_I, h_n^m)$ represents the similarity between the deep semantic features $f_I$ and the stable semantic feature $h_n^m$;
$\|\cdot\|_1$ represents the L1 norm;
and the support image corresponding to any variant support image whose similarity is higher than the specified threshold is selected as the image matching result for the query image I. In the embodiment of the invention, the L1-norm term represents the feature distribution difference between the deep semantic features and the stable semantic features, and a second term represents their feature direction difference.
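Since the similarity formula itself is not reproduced in the source, the sketch below combines the two named ingredients — an L1-norm term for the feature distribution difference and a cosine term for the feature direction difference — in one plausible way, followed by the threshold selection of S4.

```python
import numpy as np

def similarity(f_query: np.ndarray, h_support: np.ndarray) -> float:
    """Assumed combination of the two ingredients named in the patent:
    feature distribution difference (L1 norm) and feature direction
    difference (cosine). The exact formula is not reproduced in the source."""
    l1 = np.abs(f_query - h_support).mean()                 # distribution difference
    cos = f_query @ h_support / (
        np.linalg.norm(f_query) * np.linalg.norm(h_support) + 1e-12)
    return float(np.exp(-l1) * max(cos, 0.0))               # one plausible fusion

def match(f_query, stable_feats, support_ids, threshold=0.5):
    """Return the support images whose variant features exceed the threshold."""
    hits = set()
    for h, sid in zip(stable_feats, support_ids):
        if similarity(f_query, h) > threshold:
            hits.add(sid)                                    # support image matched
    return hits
```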
In order to solve the above-described problems, the present invention provides an electronic apparatus including:
a memory storing at least one instruction;
a communication interface for implementing communication of the electronic device; and a processor executing the instructions stored in the memory to implement the above image matching method for depth semantic understanding.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one instruction that is executed by a processor in an electronic device to implement the above-mentioned image matching method of deep semantic understanding.
Compared with the prior art, the invention provides the image matching method for depth semantic understanding, and the technology has the following advantages:
Firstly, the scheme provides a support image generalization and semantic feature extraction method, which uses the support image generalization model to generalize a support image: the input layer receives the support image $s_n$ and transmits it to the image generalization layer; the image generalization layer performs M generalization mapping processes $s_n^m = g(T_m \odot s_n)$, wherein $s_n^m$ represents the m-th generalization mapping processing result of the support image $s_n$, $g(\cdot)$ represents a nonlinear mapping function, $T_m$ represents the m-th generalization mapping processing template, and $\odot$ represents the Hadamard product operator; the output layer outputs the generalization mapping processing results as variant support images, forming the variant support image set $S_n = \{s_n^1, \ldots, s_n^M\}$. An image depth semantic understanding model is constructed, which takes a variant support image as input and fuses global self-attention features and multi-scale local perception features into the stable semantic features of the variant support image; the model comprises an input layer, a depth semantic feature extraction layer and a stable semantic feature construction layer, which respectively receive the variant support image, extract its self-attention features and multi-scale local perception features as depth semantic features, and construct the depth semantic features into the stable semantic features of the variant support image. On the basis of preserving the original support image pixel distribution, the scheme thus applies masked self-encoding to generalize the support images into a variant support image set of nonlinear pixel transformations, expanding the number of matchable support images; it then extracts the self-attention features and multi-scale local perception features of each variant support image as depth semantic features, obtains more stable scale features by mapping fusion, and fuses the extracted depth semantic features into stable semantic features characterizing the self-attention weights and local perception features of the variant support images, thereby realizing semantic feature extraction for the variant support images.
Meanwhile, the scheme provides a deep semantic feature extraction mode and a feature similarity measurement mode for the query image. A deep semantic feature extraction network model is constructed and used to extract the deep semantic features of the query image I; the model comprises a multi-dimensional extraction network module, which receives the query image and performs multi-scale convolution residual processing to generate its multi-scale feature map, a deep semantic feature extraction network module, which converts the multi-scale feature map into a deep semantic feature vector, and a semantic perception network module, which performs context-combined semantic perception processing on the deep semantic feature vector to generate the deep semantic features of the query image. The similarity between the deep semantic features $f_I$ of the query image and the stable semantic features $h_n^m$ of the variant support images is then computed by combining an L1-norm term, characterizing the feature distribution difference, with a feature direction difference term, and the support image corresponding to any variant support image whose similarity exceeds the specified threshold is selected as the image matching result, realizing image matching processing that incorporates image semantics.
Drawings
FIG. 1 is a schematic flow chart of an image matching method with deep semantic understanding according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device for implementing an image matching method for deep semantic understanding according to an embodiment of the present invention.
In the figure: 1 an electronic device, 10 a processor, 11 a memory, 12 a program, 13 a communication interface.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the application provides an image matching method for depth semantic understanding. The execution subject of the image matching method of deep semantic understanding includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided by the embodiment of the application. In other words, the image matching method of deep semantic understanding may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The service end includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Example 1
S1: acquiring a query image and a plurality of support images, constructing a support image generalization model, and generalizing the support images to obtain a variant support image set, wherein the support image generalization model takes the support images as input and the variant support image set as output.
In the step S1, a query image and a plurality of support images are acquired, and a support image generalization model is constructed, and the method comprises the following steps:
acquiring a query image I and a plurality of support images, wherein the support images are candidate images for image matching against the query image, and the acquired support images are represented in the following form:
$\{s_1, s_2, \ldots, s_N\}$
wherein: $s_n$ represents the acquired n-th support image;
the method comprises the steps of constructing a support image generalization model, and performing generalization processing on a support image by using the support image generalization model to obtain a variant support image set, wherein the support image generalization model comprises an input layer, an image generalization layer and an output layer, the input layer is used for inputting the support image, the image generalization layer is used for performing generalization mapping processing on the support image, and the output layer is used for outputting a generalization mapping processing result of the support image as a variant support image.
In the step S1, a support image generalization model is utilized to generalize a support image to obtain a variant support image set, and the method comprises the following steps:
generalizing a support image using the support image generalization model, wherein the generalization processing flow for the support image $s_n$ is as follows:
S11: the input layer receives the support image $s_n$ and transmits the support image $s_n$ to the image generalization layer;
S12: the image generalization layer performs M generalization mapping processes on the support image $s_n$:
$s_n^m = g(T_m \odot s_n), \quad m = 1, 2, \ldots, M$
wherein:
$s_n^m$ represents the m-th generalization mapping processing result of the support image $s_n$;
$g(\cdot)$ represents a nonlinear mapping function; in the embodiment of the invention, the selected nonlinear mapping function is the Sigmoid function;
$T_m$ represents the m-th generalization mapping processing template;
$\odot$ represents the Hadamard product operator;
S13: the output layer outputs the generalization mapping processing results of the support image as variant support images, forming the variant support image set of the support image $s_n$: $S_n = \{s_n^1, s_n^2, \ldots, s_n^M\}$
S2: constructing an image depth semantic understanding model, extracting stable semantic features of a variation support image in a variation support image set, wherein the image depth semantic understanding model takes the variation support image as input, and fuses global self-attention features and multi-scale local perception features to obtain the stable semantic features of the variation support image.
The step S2 of constructing an image depth semantic understanding model comprises the following steps:
the image depth semantic understanding model is constructed; the image depth semantic understanding model takes a variant support image as input and fuses global self-attention features and multi-scale local perception features to obtain the stable semantic features of the variant support image, wherein the image depth semantic understanding model comprises an input layer, a depth semantic feature extraction layer and a stable semantic feature construction layer, the input layer is used for receiving the variant support image, the depth semantic feature extraction layer is used for respectively extracting the self-attention features and the multi-scale local perception features of the variant support image as the depth semantic features, and the stable semantic feature construction layer is used for constructing the depth semantic features into the stable semantic features of the variant support image.
In the step S2, extracting stable semantic features of the variant support image in the variant support image set by using the image depth semantic understanding model includes:
extracting stable semantic features of the variant support images in the variant support image set by using the image depth semantic understanding model, wherein the stable semantic feature extraction flow for the variant support image $s_n^m$ in the variant support image set $S_n$ is as follows:
S21: the input layer receives the variant support image $s_n^m$;
S22: the depth semantic feature extraction layer respectively extracts the self-attention feature $F_1$ and the multi-scale local perception features $F_2$ of the variant support image $s_n^m$ as the depth semantic features $F = (F_1, F_2)$:
$F_1 = \mathrm{softmax}\!\left(\frac{(W_Q s_n^m)(W_K s_n^m)^{\mathrm{T}}}{\sqrt{d}}\right) W_V s_n^m$
wherein:
T represents transposition;
$W_Q, W_K, W_V$ respectively represent convolution weight matrices in the depth semantic feature extraction layer;
d represents the dimension of $W_K s_n^m$;
$F_2 = (F_2^1, F_2^2, F_2^3)$ represents the local perception features at three scales, obtained by convolving the variant support image $s_n^m$ with convolution kernels of three different pixel sizes [kernel sizes not reproduced in source];
S23: the stable semantic feature construction layer constructs the depth semantic features $F$ into the stable semantic feature of the variant support image $s_n^m$ [formula not reproduced in source]:
wherein:
$h_n^m$ represents the stable semantic feature of the variant support image $s_n^m$;
$\mathrm{ReLU}(\cdot)$ represents the ReLU activation function;
$\exp(\cdot)$ represents the exponential function with the natural constant e as its base;
$\arg\max_u(\cdot)$ represents the parameter u that maximizes the bracketed expression, wherein u indexes the three local perception scales;
S3: constructing a deep semantic feature extraction network model to extract deep semantic features of the query image, wherein the deep semantic feature extraction network model comprises a multidimensional extraction network module, a deep semantic feature extraction network module and a semantic perception network module.
In the step S3, constructing a deep semantic feature extraction network model and extracting deep semantic features of the query image comprises the following steps:
constructing a deep semantic feature extraction network model, and extracting deep semantic features of a query image I by using the deep semantic feature extraction network model, wherein the deep semantic feature extraction network model comprises a multi-dimensional extraction network module, a deep semantic feature extraction network module and a semantic perception network module, the multi-dimensional extraction network module is used for receiving the query image and carrying out multi-scale convolution residual processing on the query image to generate a multi-scale feature map of the query image, the deep semantic feature extraction network module is used for converting the multi-scale feature map into deep semantic feature vectors, and the semantic perception network module is used for carrying out semantic perception processing combining context on the deep semantic feature vectors to generate deep semantic features of the query image;
The deep semantic feature extraction flow for the query image I, based on the deep semantic feature extraction network model, comprises the following steps:
S31: the multi-dimensional extraction network module receives the query image I and performs multi-scale convolution residual processing on it, wherein the multi-scale convolution residual processing formula is:
$R_u = \mathrm{Conv}_{k_u}(I) + I, \quad u = 1, 2, \ldots, U$
wherein:
$R_u$ represents the convolution residual processing result of the query image I at the u-th scale, and U represents the maximum convolution residual scale;
$\mathrm{Conv}_{k_u}(\cdot)$ represents a convolution operation with a convolution kernel of $k_u$ pixel size; the kernel sizes used in the embodiment of the present invention are not reproduced in the source;
generating the multi-scale feature map of the query image I from the multi-scale convolution residual processing results:
$X = (X_1, X_2, \ldots, X_U), \quad X_u = \mathrm{MaxPool}(R_u) \oplus \mathrm{AvgPool}(R_u)$
wherein:
X represents the multi-scale feature map of the query image I;
$X_u$ represents the feature map of the query image I at the u-th scale;
$\mathrm{MaxPool}(\cdot)$ represents the maximum pooling operation, $\mathrm{AvgPool}(\cdot)$ represents the average pooling operation, and $\oplus$ represents the feature splicing (concatenation) operator;
S32: the deep semantic feature extraction network module converts the multi-scale feature map X into a deep semantic feature vector:
$v = \Phi(X)$
wherein:
v represents the deep semantic feature vector corresponding to the multi-scale feature map X;
$\Phi(\cdot)$ represents six depthwise separable convolution processes and three maximum pooling operations applied to the feature map;
S33: the semantic perception network module performs context-combined semantic perception processing on the deep semantic feature vector v to generate the deep semantic features $f_I$ [formula not reproduced in source]:
wherein:
W represents the weight parameter matrix of the semantic perception network module;
$\mathrm{ReLU}(\cdot)$ represents the ReLU activation function;
$f_I$ represents the deep semantic features of the query image I.
S4: and carrying out similarity calculation on the deep semantic features of the query image and the stable semantic features of the variant support images, and selecting the support image corresponding to the variant support image with the similarity higher than a specified threshold value as an image matching result.
In the step S4, similarity calculation is performed on deep semantic features of the query image and stable semantic features of the variant support image, and a support image corresponding to the variant support image with similarity higher than a specified threshold is selected as an image matching result, including:
performing similarity calculation between the deep semantic features of the query image and the stable semantic features of the variant support images, wherein the similarity between the deep semantic features $f_I$ and the stable semantic feature $h_n^m$ is computed by the following formula [not reproduced in source]:
wherein:
$\mathrm{sim}(f_I, h_n^m)$ represents the similarity between the deep semantic features $f_I$ and the stable semantic feature $h_n^m$;
$\|\cdot\|_1$ represents the L1 norm, characterizing the feature distribution difference;
the support image corresponding to any variant support image whose similarity is higher than the specified threshold is selected as the image matching result for the query image I.
Example 2
Fig. 2 is a schematic structural diagram of an electronic device for implementing an image matching method for deep semantic understanding according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication interface 13 and a bus, and may further comprise a computer program, such as program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, mobile hard disks, multimedia cards, card memories (e.g., SD or DX memory), magnetic memories, magnetic disks, optical disks, and the like. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash Card provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as the code of the program 12, but also for temporarily storing data that has been output or is to be output.
The processor 10 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device: it connects the parts of the entire electronic device using various interfaces and lines, executes programs or modules stored in the memory 11 (e.g., the program 12 for realizing image matching for depth semantic understanding), and invokes data stored in the memory 11 to perform the various functions of the electronic device 1 and process data.
The communication interface 13 may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used to establish a communication connection between the electronic device 1 and other electronic devices and to enable connection communication between internal components of the electronic device.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 2 shows only an electronic device with components, it being understood by a person skilled in the art that the structure shown in fig. 2 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only, and do not limit the scope of the patent application to this configuration.
The program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
acquiring a query image and a plurality of support images, constructing a support image generalization model, and generalizing the support images to obtain a variant support image set;
constructing an image depth semantic understanding model, and extracting stable semantic features of the variant support images in the variant support image set;
constructing a deep semantic feature extraction network model, and extracting deep semantic features of the query image;
performing similarity calculation on the deep semantic features of the query image and the stable semantic features of the variant support images, and selecting the support image corresponding to any variant support image with similarity higher than a specified threshold as the image matching result.
Specifically, the specific implementation method of the above instruction by the processor 10 may refer to descriptions of related steps in the corresponding embodiments of fig. 1 to 2, which are not repeated herein.
It should be noted that, the foregoing reference numerals of the embodiments of the present invention are merely for describing the embodiments, and do not represent the advantages and disadvantages of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.

Claims (7)

1. An image matching method for deep semantic understanding, the method comprising:
S1: acquiring a query image and a plurality of support images, constructing a support image generalization model, and generalizing the support images to obtain a variant support image set, wherein the support image generalization model takes the support images as input and the variant support image set as output;
S2: constructing an image depth semantic understanding model and extracting stable semantic features of the variant support images in the variant support image set, wherein the image depth semantic understanding model takes a variant support image as input and fuses global self-attention features and multi-scale local perception features to obtain the stable semantic features of the variant support image;
S3: constructing a deep semantic feature extraction network model to extract deep semantic features of the query image, wherein the deep semantic feature extraction network model comprises a multi-dimensional extraction network module, a deep semantic feature extraction network module and a semantic perception network module;
S4: performing similarity calculation on the deep semantic features of the query image and the stable semantic features of the variant support images, and selecting the support image corresponding to any variant support image with similarity higher than a specified threshold as the image matching result.
2. The image matching method for deep semantic understanding according to claim 1, wherein the step S1 of acquiring the query image and the plurality of support images and constructing the support image generalization model comprises the steps of:
acquiring a query image I and a plurality of support images, wherein the support images are candidate images for image matching against the query image, and the acquired support images are represented in the following form:
$\{s_1, s_2, \ldots, s_N\}$
wherein: $s_n$ represents the acquired n-th support image;
the method comprises the steps of constructing a support image generalization model, and performing generalization processing on a support image by using the support image generalization model to obtain a variant support image set, wherein the support image generalization model comprises an input layer, an image generalization layer and an output layer, the input layer is used for inputting the support image, the image generalization layer is used for performing generalization mapping processing on the support image, and the output layer is used for outputting a generalization mapping processing result of the support image as a variant support image.
3. The image matching method for deep semantic understanding according to claim 2, wherein in the step S1, a support image is subjected to generalization processing by using a support image generalization model to obtain a variant support image set, and the method comprises the following steps:
generalizing a support image using the support image generalization model, wherein the generalization processing flow for the support image $s_n$ is as follows:
S11: the input layer receives the support image $s_n$ and transmits the support image $s_n$ to the image generalization layer;
S12: the image generalization layer performs M generalization mapping processes on the support image $s_n$:
$s_n^m = g(T_m \odot s_n), \quad m = 1, 2, \ldots, M$
wherein:
$s_n^m$ represents the m-th generalization mapping processing result of the support image $s_n$;
$g(\cdot)$ represents a nonlinear mapping function;
$T_m$ represents the m-th generalization mapping processing template;
$\odot$ represents the Hadamard product operator;
S13: the output layer outputs the generalization mapping processing results of the support image as variant support images, forming the variant support image set of the support image $s_n$: $S_n = \{s_n^1, s_n^2, \ldots, s_n^M\}$.
4. The image matching method for depth semantic understanding according to claim 1, wherein the constructing an image depth semantic understanding model in the step S2 comprises:
the image depth semantic understanding model is constructed; the image depth semantic understanding model takes a variant support image as input and fuses global self-attention features and multi-scale local perception features to obtain the stable semantic features of the variant support image, wherein the image depth semantic understanding model comprises an input layer, a depth semantic feature extraction layer and a stable semantic feature construction layer, the input layer is used for receiving the variant support image, the depth semantic feature extraction layer is used for respectively extracting the self-attention features and the multi-scale local perception features of the variant support image as the depth semantic features, and the stable semantic feature construction layer is used for constructing the depth semantic features into the stable semantic features of the variant support image.
5. The image matching method for depth semantic understanding according to claim 4, wherein the extracting stable semantic features of the variant support image in the variant support image set by using the image depth semantic understanding model in the step S2 comprises:
extracting stable semantic features of the variant support images in the variant support image set by using the image depth semantic understanding model, wherein the stable semantic feature extraction flow for the variant support image $s_n^m$ in the variant support image set $S_n$ is as follows:
S21: the input layer receives the variant support image $s_n^m$;
S22: the depth semantic feature extraction layer respectively extracts the self-attention feature $F_1$ and the multi-scale local perception features $F_2$ of the variant support image $s_n^m$ as the depth semantic features $F = (F_1, F_2)$:
$F_1 = \mathrm{softmax}\!\left(\frac{(W_Q s_n^m)(W_K s_n^m)^{\mathrm{T}}}{\sqrt{d}}\right) W_V s_n^m$
wherein:
T represents transposition;
$W_Q, W_K, W_V$ respectively represent convolution weight matrices in the depth semantic feature extraction layer;
d represents the dimension of $W_K s_n^m$;
$F_2 = (F_2^1, F_2^2, F_2^3)$ represents the local perception features at three scales, obtained by convolving the variant support image $s_n^m$ with convolution kernels of three different pixel sizes [kernel sizes not reproduced in source];
S23: the stable semantic feature construction layer constructs the depth semantic features $F$ into the stable semantic feature of the variant support image $s_n^m$ [formula not reproduced in source]:
wherein:
$h_n^m$ represents the stable semantic feature of the variant support image $s_n^m$;
$\mathrm{ReLU}(\cdot)$ represents the ReLU activation function;
$\exp(\cdot)$ represents the exponential function with the natural constant e as its base;
$\arg\max_u(\cdot)$ represents the parameter u that maximizes the bracketed expression, wherein u indexes the three local perception scales.
6. The image matching method for deep semantic understanding according to claim 1, wherein the constructing a deep semantic feature extraction network model in step S3 extracts deep semantic features of the query image, includes:
constructing a deep semantic feature extraction network model, and extracting deep semantic features of a query image I by using the deep semantic feature extraction network model, wherein the deep semantic feature extraction network model comprises a multi-dimensional extraction network module, a deep semantic feature extraction network module and a semantic perception network module, the multi-dimensional extraction network module is used for receiving the query image and carrying out multi-scale convolution residual processing on the query image to generate a multi-scale feature map of the query image, the deep semantic feature extraction network module is used for converting the multi-scale feature map into deep semantic feature vectors, and the semantic perception network module is used for carrying out semantic perception processing combining context on the deep semantic feature vectors to generate deep semantic features of the query image;
The deep semantic feature extraction flow for the query image I, based on the deep semantic feature extraction network model, comprises the following steps:
S31: the multi-dimensional extraction network module receives the query image I and performs multi-scale convolution residual processing on it, wherein the multi-scale convolution residual processing formula is:
$R_u = \mathrm{Conv}_{k_u}(I) + I, \quad u = 1, 2, \ldots, U$
wherein:
$R_u$ represents the convolution residual processing result of the query image I at the u-th scale, and U represents the maximum convolution residual scale;
$\mathrm{Conv}_{k_u}(\cdot)$ represents a convolution operation with a convolution kernel of $k_u$ pixel size;
generating the multi-scale feature map of the query image I from the multi-scale convolution residual processing results:
$X = (X_1, X_2, \ldots, X_U), \quad X_u = \mathrm{MaxPool}(R_u) \oplus \mathrm{AvgPool}(R_u)$
wherein:
X represents the multi-scale feature map of the query image I;
$X_u$ represents the feature map of the query image I at the u-th scale;
$\mathrm{MaxPool}(\cdot)$ represents the maximum pooling operation, $\mathrm{AvgPool}(\cdot)$ represents the average pooling operation, and $\oplus$ represents the feature splicing (concatenation) operator;
S32: the deep semantic feature extraction network module converts the multi-scale feature map X into a deep semantic feature vector:
$v = \Phi(X)$
wherein:
v represents the deep semantic feature vector corresponding to the multi-scale feature map X;
$\Phi(\cdot)$ represents six depthwise separable convolution processes and three maximum pooling operations applied to the feature map;
S33: the semantic perception network module performs context-combined semantic perception processing on the deep semantic feature vector v to generate the deep semantic features $f_I$ [formula not reproduced in source]:
wherein:
W represents the weight parameter matrix of the semantic perception network module;
$\mathrm{ReLU}(\cdot)$ represents the ReLU activation function;
$f_I$ represents the deep semantic features of the query image I.
7. The image matching method for deep semantic understanding according to claim 1, wherein in the step S4, similarity calculation is performed on deep semantic features of the query image and stable semantic features of the variant support image, and a support image corresponding to the variant support image with similarity higher than a specified threshold is selected as an image matching result, which comprises:
performing similarity calculation between the deep semantic features of the query image and the stable semantic features of the variant support images, wherein the similarity between the deep semantic features $f_I$ and the stable semantic feature $h_n^m$ is computed by the following formula [not reproduced in source]:
wherein:
$\mathrm{sim}(f_I, h_n^m)$ represents the similarity between the deep semantic features $f_I$ and the stable semantic feature $h_n^m$;
$\|\cdot\|_1$ represents the L1 norm, characterizing the feature distribution difference;
the support image corresponding to any variant support image whose similarity is higher than the specified threshold is selected as the image matching result for the query image I.
CN202410167137.XA 2024-02-06 2024-02-06 Image matching method for depth semantic understanding Pending CN117710710A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410167137.XA CN117710710A (en) 2024-02-06 2024-02-06 Image matching method for depth semantic understanding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410167137.XA CN117710710A (en) 2024-02-06 2024-02-06 Image matching method for depth semantic understanding

Publications (1)

Publication Number Publication Date
CN117710710A 2024-03-15

Family

ID=90157487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410167137.XA Pending CN117710710A (en) 2024-02-06 2024-02-06 Image matching method for depth semantic understanding

Country Status (1)

Country Link
CN (1) CN117710710A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190355169A1 (en) * 2018-05-18 2019-11-21 Samsung Electronics Co., Ltd. Semantic mapping for low-power augmented reality using dynamic vision sensor
CN111291212A (en) * 2020-01-24 2020-06-16 复旦大学 Zero sample sketch image retrieval method and system based on graph convolution neural network
CN114820294A (en) * 2022-05-23 2022-07-29 感知阶跃(深圳)数字科技有限公司 All-dimensional virtual fitting method, system and medium based on cyclic three-level transformation
CN117524427A (en) * 2024-01-05 2024-02-06 莱凯医疗器械(北京)有限公司 Intelligent medical image analysis method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination