WO2023142602A1 - Image processing method, device, and computer-readable storage medium - Google Patents

Image processing method, device, and computer-readable storage medium (图像处理方法、装置和计算机可读存储介质)

Info

Publication number
WO2023142602A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, feature, common, processed, view
Application number
PCT/CN2022/131464
Other languages
English (en)
French (fr)
Inventor
陈颖
徐尚
黄迪和
刘建林
刘永
汪铖杰
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to US18/333,091 (published as US20230326173A1)
Publication of WO2023142602A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4038Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Definitions

  • the present application relates to the technical field of the Internet, and in particular to an image processing method, device and computer-readable storage medium.
  • The inventors of the present application found that existing image processing methods process an image by performing single-point, step-by-step matching of the feature points in the image; the processing of feature points is therefore slow, which in turn makes the rate and efficiency of image processing low.
  • an image processing method, device, and computer-readable storage medium are provided.
  • An embodiment of the present application provides an image processing method executed by a computer device, including: acquiring an image pair to be processed, and performing image feature extraction on the images to be processed in the image pair to be processed to obtain image features of the images to be processed; extracting, from the image features, a correlation feature of the image pair to be processed, where the correlation feature is used to characterize the mutual information between the images to be processed in the image pair to be processed; identifying, according to the correlation feature, a common-view image of the common-view area in each image to be processed, and calculating a scale difference between the common-view images; adjusting the size of the common-view images based on the scale difference to obtain adjusted common-view images; and extracting at least one common-view feature point from each adjusted common-view image, and processing the image pair to be processed based on the common-view feature points.
  • an image processing device including:
  • an acquisition unit configured to acquire an image pair to be processed, and perform image feature extraction on the image to be processed in the image pair to be processed, to obtain image features of the image to be processed;
  • An extracting unit configured to extract, from the image features, a correlation feature of the image pair to be processed, where the correlation feature is used to characterize the mutual information between the images to be processed in the image pair to be processed;
  • An identification unit configured to identify a common-view image of a common-view area in the image to be processed according to the associated feature, and calculate a scale difference between the common-view images
  • An adjustment unit configured to adjust the size of the common-view image based on the scale difference, to obtain an adjusted common-view image
  • a processing unit configured to extract at least one common-view feature point from each adjusted common-view image, and process the image pair to be processed based on the common-view feature point.
  • the present application also provides a computer device.
  • the computer device includes a memory and a processor, the memory stores computer-readable instructions, and the processor implements the steps described in the above image processing method when executing the computer-readable instructions.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium has computer-readable instructions stored thereon, and when the computer-readable instructions are executed by a processor, the steps described in the above-mentioned image processing method are realized.
  • the present application also provides a computer program product.
  • the computer program product includes computer-readable instructions, and when the computer-readable instructions are executed by a processor, the steps described in the above-mentioned image processing method are implemented.
  • FIG. 1 is a schematic diagram of an implementation scene of an image processing method provided in an embodiment of the present application.
  • FIG. 2 is a schematic flow diagram of an image processing method provided in an embodiment of the present application.
  • FIG. 3a is a schematic diagram of multi-scale feature extraction of an image processing method provided by an embodiment of the present application.
  • FIG. 3b is a schematic flow chart of an image processing method provided in an embodiment of the present application.
  • FIG. 4a is a schematic structural diagram of an image processing model of an image processing method provided in an embodiment of the present application.
  • FIG. 4b is a schematic diagram of the focus center coordinates and relative center point offset of an image processing method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an overall flow of an image processing method provided in an embodiment of the present application.
  • FIG. 6 is another schematic flowchart of an image processing method provided in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Embodiments of the present application provide an image processing method, device, and computer-readable storage medium.
  • the image processing apparatus may be integrated in computer equipment, and the computer equipment may be a server, or a terminal or other equipment.
  • The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) acceleration, big data, and artificial intelligence platforms.
  • Terminals may include, but are not limited to, mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, and aircraft. Terminals and servers can be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
  • FIG. 1 is a schematic diagram of an implementation scene of the image processing method provided by an embodiment of the present application. The computer device can be a server or a terminal. The computer device can obtain the image pair to be processed and perform image feature extraction on the images to be processed in the image pair to be processed to obtain the image features of the images to be processed; extract the associated feature of the image pair to be processed from the image features; identify, according to the associated feature, the common-view image of the common-view area in each image to be processed, and calculate the scale difference between the common-view images; adjust the size of the common-view images based on the scale difference to obtain adjusted common-view images; and extract at least one common-view feature point from each adjusted common-view image and process the image pair to be processed based on the common-view feature points.
  • the embodiments of the present application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, assisted driving, and the like.
  • The schematic diagram of the implementation environment scene of the image processing method shown in Figure 1 is just an example; it is described in the embodiments of this application only to more clearly illustrate the technical solutions of the embodiments, and does not constitute a limitation on the technical solutions provided by the embodiments of this application.
  • Those skilled in the art know that with the evolution of image processing and the emergence of new business scenarios, the technical solutions provided in this application are also applicable to similar technical problems.
  • Common view area: on multiple images in which the same scene or the same target object is imaged under different shooting conditions, the image area where that scene or target object is located. The target object can be biological or non-biological; a biological object refers to an independent living organism, for example, a natural person, an animal, or a plant, while non-living objects refer to various objects such as vehicles, buildings, tables, or chairs. Different shooting conditions can be, for example, different perspectives, different distances, or different times.
  • a plurality means at least two. For example, when a binocular camera is used to shoot a cat from left and right perspectives to obtain image A and image B, the area where the cat is located in image A and image B can be a common viewing area.
  • If image A and image B are obtained by shooting a certain road scene at different time points, then the image area where the road scene is located in image A and image B may be a common view area.
  • the area shape of the common view area can be various shapes as required, for example, it can be a rectangle, a square or a circle.
  • Feature point: in image processing, a feature point refers to a point where the gray value of the image changes drastically, or a point with large curvature on an image edge (i.e., the intersection of two edges). Image feature points play a very important role in feature-point-based image matching algorithms; they reflect the essential characteristics of the image, can identify the target object in the image, and image matching can be completed by matching feature points.
  • Feature matching: obtaining the pixel-level or sub-pixel-level correspondence between images of the same object imaged at two different viewing angles.
  • Scale: describes the imaging size of an object on the camera plane. The smaller the scale, the smaller the image of the object on the camera plane; the larger the scale, the larger the image of the object on the camera plane.
  • FIG. 2 is a schematic flowchart of an image processing method provided in an embodiment of the present application.
  • The image processing method is executed by a computer device, and the computer device may be a server or a terminal.
  • the image processing method includes:
  • the image pair to be processed may be a whole composed of multiple images to be processed, for example, may be a whole composed of two images to be processed.
  • the image to be processed in the image pair to be processed may be an image with a common view area, that is, two images of the same scene or the same object taken at different angles of view, different distances, or at different times.
  • the image feature may be feature information characterizing the image to be processed.
  • the image pair to be processed can be obtained from a memory connected to the image processing device, or it can be obtained from other data storage terminals. It can also be obtained from the memory of the physical terminal, or from a virtual storage space such as a data set or a corpus, which is not limited here.
  • image feature extraction can be performed on the image to be processed in the image pair to be processed.
  • Image feature extraction on the images to be processed in the image pair to be processed can be performed in a variety of ways. For example, feature mapping can be performed on the image to be processed to obtain a feature map corresponding to the image to be processed; dimension reduction processing is performed on the feature map corresponding to the image to be processed to obtain a dimension-reduced feature map; multi-scale feature extraction is performed on the dimension-reduced feature map to obtain the scale image features corresponding to each scale of the image to be processed; and the scale image features corresponding to each scale of the image to be processed are fused to obtain the image features of the image to be processed.
  • The feature map can be the feature information representing the image to be processed in each channel. The data exists in three-dimensional form and can be viewed as a number of two-dimensional maps superimposed together, each of which can be called a feature map.
  • the feature map after dimensionality reduction may be a feature map obtained after dimensionality reduction of the image to be processed
  • the scale image features may be image features corresponding to each scale obtained after multi-scale feature extraction of the image to be processed.
  • A convolution kernel can be used to perform convolution processing on the image to be processed, mapping the image to be processed to the feature map layer to obtain the feature map corresponding to the image to be processed.
  • dimensionality reduction processing can be performed on the feature map corresponding to the image to be processed.
  • Figure 3a is a schematic diagram of multi-scale feature extraction of an image processing method provided by an embodiment of the present application. Assuming the dimension of the feature map corresponding to the image to be processed is w × h × 1024, where w represents the width corresponding to the image to be processed, h represents the length corresponding to the image to be processed, and 1024 represents the number of channels of the feature map, the feature map corresponding to the image to be processed can be convolved to reduce the number of channels from 1024 to 256, so that the dimension of the dimension-reduced feature map is w × h × 256.
  • multi-scale feature extraction can be performed on the feature map after dimension reduction.
  • For example, convolution kernels of different sizes can be used to convolve the dimension-reduced feature map to obtain scale image features of multiple scales, that is, the scale image features corresponding to each scale of the image to be processed. Denoting the convolution kernel size (kernel size) by k and the convolution step size (stride) by s, the dimension-reduced feature map can be convolved with a kernel size of 4 × 4 and a stride of 2 × 2 to obtain scale image features whose dimension is w/2 × h/2 × 256, and convolved with a kernel size of 8 × 8 and a stride of 2 × 2 to obtain scale image features whose dimension is w/2 × h/2 × 128. The scale image features corresponding to these three scales can then be spliced to obtain multi-scale image features with a spatial dimension of w/2 × h/2.
  • The scale image features corresponding to each scale of the image to be processed can then be fused. For example, the scale image features corresponding to each scale can be fused at the channel level to obtain the image feature of the image to be processed, whose dimension is w/2 × h/2 × 256.
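  • As a non-limiting sketch of the multi-scale feature extraction and channel-level fusion described above, the following PyTorch-style code reduces a w × h × 1024 feature map to 256 channels, convolves it with kernels of several sizes at stride 2, splices the resulting scale image features and fuses them back to 256 channels; the 2 × 2 branch and the per-branch channel split are illustrative assumptions, since the embodiment only specifies the 4 × 4 / 256-channel and 8 × 8 / 128-channel branches.

```python
import torch
import torch.nn as nn

class MultiScaleFeatureExtractor(nn.Module):
    """Sketch of the multi-scale feature extraction and channel-level fusion step."""
    def __init__(self, in_channels=1024, mid_channels=256):
        super().__init__()
        # dimension reduction: 1024 -> 256 channels
        self.reduce = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        # scale branches, each halving the spatial resolution (4x4 and 8x8 follow the text)
        self.branch_4x4 = nn.Conv2d(mid_channels, 256, kernel_size=4, stride=2, padding=1)
        self.branch_8x8 = nn.Conv2d(mid_channels, 128, kernel_size=8, stride=2, padding=3)
        self.branch_2x2 = nn.Conv2d(mid_channels, 128, kernel_size=2, stride=2)  # assumed third branch
        # channel-level fusion back to 256 channels
        self.fuse = nn.Conv2d(256 + 128 + 128, mid_channels, kernel_size=1)

    def forward(self, feature_map):
        x = self.reduce(feature_map)              # w x h x 256
        scales = [self.branch_4x4(x), self.branch_8x8(x), self.branch_2x2(x)]
        multi_scale = torch.cat(scales, dim=1)    # splice along channels: w/2 x h/2 x 512
        return self.fuse(multi_scale)             # fused image feature: w/2 x h/2 x 256
```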
  • FIG. 3b is a schematic flowchart of an image processing method provided in an embodiment of the present application, wherein the steps indicated by the solid arrows represent the steps belonging to the model training and application phase, and the dotted lines The steps indicated by the arrows only belong to the steps in the model training stage.
  • the image pair to be processed includes the images Ia and Ib to be processed, and the length is H and the width is W (ie H ⁇ W).
  • The images to be processed are down-sampled by the residual network (ResNet50), using the shared third layer structure of ResNet50 (ResNet50-Layer3), whose output has 1024 channels; after dimension reduction the corresponding feature map has a dimension of W/16 × H/16 × 256, so that the dimension-reduced feature maps corresponding to the images Ia and Ib to be processed can be input to the multi-scale feature extraction module (Multi-Scale Feature Extractor) for multi-scale feature extraction and fusion, obtaining image features with a dimension of W/32 × H/32 × 256 for the images Ia and Ib to be processed.
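  • A minimal sketch of this backbone stage is shown below, assuming the torchvision ResNet-50 is truncated after layer3 (spatial stride 16, 1024 output channels) and shared between the two images before the multi-scale extractor sketched above; the weights argument and the helper name extract_image_features are illustrative assumptions.

```python
import torch
import torchvision

# shared backbone truncated after layer3 (output: 1024 x H/16 x W/16)
resnet = torchvision.models.resnet50(weights=None)
backbone = torch.nn.Sequential(
    resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
    resnet.layer1, resnet.layer2, resnet.layer3,
)
extractor = MultiScaleFeatureExtractor()          # sketch defined after Figure 3a above

def extract_image_features(ia, ib):
    """ia, ib: (1, 3, H, W) image tensors; returns (1, 256, H/32, W/32) features for each."""
    return extractor(backbone(ia)), extractor(backbone(ib))
```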
  • The association feature can be used to characterize the mutual information between the images to be processed in the image pair to be processed; the mutual information can be information that characterizes the association relationship between the images to be processed.
  • The associated feature can be a feature map, and the dimension of the feature map can be, for example, 256, which can be expressed as F ∈ R^(h×w×256).
  • Specifically, the image features can be flattened to obtain the flat image features of the image to be processed; feature extraction is performed on the flat image features to obtain the initial attention feature corresponding to the image to be processed, and cross feature extraction is performed on the initial attention feature to obtain the associated feature of each image to be processed in the image pair to be processed.
  • The flat image feature can be a feature obtained after flattening the image feature corresponding to the image to be processed, and the initial attention feature can be understood as a feature used to characterize the association of each feature in the image feature corresponding to the image to be processed with the other features.
  • For example, a flatten layer can be used to flatten the image feature whose dimension is w/2 × h/2 × 256, obtaining the one-dimensional flat image feature corresponding to the image to be processed.
  • The flat image feature can include multiple sub-flat image features. Feature extraction can be performed on the flat image feature to obtain the initial association feature corresponding to each sub-flat image feature; based on the initial association features, the initial association weight corresponding to each sub-flat image feature in the flat image feature is determined; and the sub-flat image features in the flat image feature are fused according to the initial association weights to obtain the initial attention feature corresponding to the image to be processed.
  • the sub-flat image feature may be at least one of the flat image features, for example, the flat image feature may be divided into multiple regions, and the corresponding feature of each region is a sub-flat image feature.
  • The feature extraction of the flat image feature is the process of feature-mapping the sub-flat image features in the flat image feature; the mapped features are the initial associated features corresponding to the sub-flat image features, which can be the feature information used to determine the association relationship between a sub-flat image feature and the other sub-flat image features.
  • the initial correlation weight may represent the importance of each sub-flat image feature in the flat image feature.
  • For example, each flat image feature can be converted into three space vectors, namely a query vector (Query, Q for short), a key vector (Key, K for short) and a value vector (Value, V for short). The specific conversion can be understood as the fusion of each flat image feature with three-dimensional conversion parameters, and the query vector, key vector and value vector are used as the initial associated features corresponding to each flat image feature.
  • The initial association weight corresponding to each sub-flat image feature in the flat image feature can then be determined based on the initial associated features, and this can be done in many ways. For example, an attention network can take the dot product of the query vector corresponding to each sub-flat image feature with the key vectors of the other sub-flat image features to obtain the attention score (Score) corresponding to each sub-flat image feature, and then calculate the initial association weight corresponding to each sub-flat image feature based on its attention score. The sub-flat image features in the flat image feature can then be fused according to the initial association weights: for example, each sub-flat image feature may be weighted by its initial association weight, the weighted sub-flat image features accumulated, and the initial attention feature corresponding to the image to be processed obtained from the accumulation result.
  • For example, assume the image pair to be processed includes image A to be processed and image B to be processed, the flat image feature corresponding to image A includes four sub-flat image features, namely G, B, C and D, and the initial association weights corresponding to these sub-flat image features are g, b, c and d respectively. Each sub-flat image feature in the flat image feature can be weighted by its initial association weight, giving Gg, Bb, Cc and Dd; the weighted sub-flat image features are then accumulated, and the accumulated result Gg+Bb+Cc+Dd is obtained as the initial attention feature corresponding to image A to be processed.
  • the flattened image features can be input into an encoding module (Transformer Encoder) to obtain initial attention features corresponding to the image to be processed.
  • FIG. 4a is a schematic structural diagram of an image processing model of an image processing method provided in an embodiment of the present application. Assume the image pair to be processed includes the images Ia and Ib to be processed, and take obtaining the initial attention feature corresponding to image Ia as an example: the flat image feature corresponding to image Ia can be input to the self-attention sub-module of the Transformer Encoder module on the left side of the figure to obtain the initial attention feature corresponding to the image to be processed. Specifically, the flat image feature corresponding to image Ia can be converted into the three space vectors K, Q and V and input to the self-attention sub-module; in this sub-module, the multi-head attention unit (Multi-head Attention) performs feature extraction on the flat image feature to obtain the initial correlation weight corresponding to each sub-flat image feature, and weights and merges the sub-flat image features according to the initial correlation weights to obtain the output of the multi-head attention unit. The output of the multi-head attention unit and the flat image feature are then merged through the merging unit (Concat), the merged result is normalized through the normalization unit (Layer Normalization), the feedforward network subunit (Feed Forward) in the feedforward network and residual connection unit (FeedForward&Add) applies full connection processing to the normalized result, and the result of the full connection processing and the merged result are processed by residual connection through the residual connection subunit (Add) in that unit, obtaining the initial attention feature corresponding to the image Ia to be processed.
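  • The self-attention sub-module described above can be sketched as a standard multi-head attention block over the flattened image features, as in the following illustrative PyTorch code; the layer sizes (256 dimensions, 8 heads) and the exact ordering of the normalization and residual units are assumptions rather than the embodiment's exact structure.

```python
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    """Sketch of the encoder self-attention sub-module over flat image features."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, flat_feat):
        # flat_feat: (batch, h*w, 256) flat image features; Q, K and V all come from flat_feat
        attended, _ = self.attn(flat_feat, flat_feat, flat_feat)
        x = self.norm1(flat_feat + attended)      # merge (residual) + normalization
        return self.norm2(x + self.ffn(x))        # feed-forward + residual connection

# flatten a (1, 256, h, w) image feature into (1, h*w, 256) flat image features:
# flat_a = feat_a.flatten(2).transpose(1, 2); init_attn_a = SelfAttentionBlock()(flat_a)
```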
  • Cross feature extraction can then be performed on the initial attention feature to obtain the associated feature of each image to be processed in the image pair to be processed. There can be multiple ways to perform cross feature extraction on the initial attention feature: for example, feature extraction can be performed on the image feature and the initial attention feature to obtain the cross-correlation feature corresponding to each image to be processed; the cross-correlation weight corresponding to the image to be processed is determined according to the cross-correlation feature; and based on the cross-correlation weight, the initial attention feature corresponding to each image to be processed is weighted to obtain the associated feature corresponding to the image to be processed.
  • the cross-correlation feature may be a feature used to determine the correlation between the images to be processed in the image pair to be processed, and the cross-correlation weight may represent the degree of correlation between the images to be processed in the image pair to be processed,
  • the image feature may be a flattened image feature, that is, a flat image feature.
  • For example, an attention network can be used to perform cross feature extraction on the image feature and the initial attention feature. The initial attention feature corresponding to one image to be processed can be converted into a query vector, and the image feature of the other image to be processed (which can be the flattened flat image feature) can be converted into a key vector and a value vector; the specific conversion can be understood as the fusion of the image feature and the initial attention feature with conversion parameters of the corresponding dimension, and the resulting query vector, key vector and value vector are used as the cross-correlation features corresponding to each image feature.
  • The cross-correlation weight corresponding to the image to be processed can then be determined according to the cross-correlation feature, and this can be done in many ways. For example, the query vector is dot-producted with the key vectors of the image features corresponding to the other image to be processed, so that the attention scores of the image feature and of the corresponding initial attention feature in the image pair to be processed are obtained; based on these attention scores, the cross-correlation weight of each image feature and of the corresponding initial attention feature is calculated. The initial attention feature corresponding to each image to be processed can then be weighted based on the cross-correlation weight to obtain the corresponding associated feature, and this weighting can likewise be performed in multiple ways.
  • For example, assume the image pair to be processed includes image A to be processed and image B to be processed, the initial attention feature corresponding to image A is E, and an image feature corresponding to image B has been obtained; the cross-correlation weight corresponding to the initial attention feature E is determined as e, and the cross-correlation weight corresponding to that image feature is f. The initial attention feature E and the image feature can then be fused based on the cross-correlation weights; for example, they can be weighted and summed with the weights e and f to obtain the associated feature.
  • Continuing with FIG. 4a, the flat image feature corresponding to the image Ia to be processed can be input to the self-attention sub-module of the Transformer Encoder module on the left side of the figure to obtain the initial attention feature corresponding to the image to be processed, and the initial attention feature is then input to the cross-attention sub-module of the Transformer Encoder module. Specifically, the initial attention feature corresponding to image Ia can be converted into a query vector Q, and the flat image feature corresponding to image Ib can be converted into a key vector K and a value vector V; these are input into the multi-head attention unit of the cross-attention sub-module, which performs cross feature extraction on the image feature and the initial attention feature to obtain the cross-correlation feature corresponding to each image to be processed, determines the cross-correlation weight according to the cross-correlation feature, and weights the initial attention feature accordingly to obtain the associated feature fa corresponding to the image Ia to be processed.
  • the associated feature corresponding to the image to be processed Ib can be obtained by using the method of acquiring the associated feature corresponding to the image to be processed Ia, which will not be repeated here.
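  • A sketch of this cross-attention sub-module is shown below: the initial attention feature of image Ia provides the query while the flat image feature of image Ib provides the keys and values, so the output plays the role of the associated feature fa; the dimensions, head count and residual/normalization placement are illustrative assumptions.

```python
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Sketch of the encoder cross-attention sub-module producing the associated feature."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, init_attn_a, flat_feat_b):
        # query from image Ia's initial attention feature, key/value from image Ib's flat feature
        cross, _ = self.attn(query=init_attn_a, key=flat_feat_b, value=flat_feat_b)
        return self.norm(init_attn_a + cross)     # associated feature fa, shape (batch, h*w, 256)

# fa_assoc = CrossAttentionBlock()(init_attn_a, flat_b); symmetrically for image Ib
```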
  • The common-view image may be an image of the region where the common-view area of each image to be processed is located, and the scale difference value may be a numerical value representing the scale difference between the common-view images of the image pair to be processed.
  • Specifically, a preset area feature can be obtained, and the trained image processing model can be used to perform feature extraction on the preset area feature to obtain an initial region feature; cross feature extraction is performed on the initial region feature and the associated feature to obtain the common-view region feature corresponding to the initial region feature; and based on the common-view region feature and the associated feature, the common-view image of the common-view area is identified in the image to be processed.
  • The preset area feature may be preset feature information used to characterize the bounding box of the common-view area, which can be understood as an abstract expression of information learned in advance for detecting the bounding box of the common-view area; it may be, for example, a 256-dimensional feature vector (Q ∈ R^(1×256)).
  • the initial region feature may be feature information obtained through fusion based on the correlation between each feature in the preset region features, and the common view region feature may be feature information representing a bounding box corresponding to the common view region of the image to be processed.
  • the image processing model after training can be a trained model for processing the image to be processed in the image pair to be processed, and can be a Transformer model.
  • For the specific structure of the trained image processing model, reference can be made to the schematic structural diagram of the image processing model provided in Figure 4a.
  • the preset regional features can be pre-designed and input by the developer, or it can be automatically generated directly according to the pre-acquired regional feature template, etc., which is not limited here.
  • the trained image processing model can be used to perform feature extraction on the preset regional features to obtain the initial regional features.
  • the area sub-feature may be at least one of preset area features, for example, the preset area feature may be divided into multiple areas, and the feature corresponding to each area is an area sub-feature.
  • The feature extraction of the preset regional feature is to perform feature mapping on the regional sub-features in the preset regional feature; the mapped features are the regional associated features corresponding to the regional sub-features, which can be used to determine the association relationship between each regional sub-feature and the other regional sub-features in the preset regional feature.
  • the area association weight may represent the importance of each area sub-feature in the preset area feature.
  • For example, an attention network can be used to perform feature extraction on the preset regional feature to obtain the regional associated feature corresponding to each regional sub-feature in the preset regional feature.
  • Specifically, each regional sub-feature can be converted into three space vectors, namely a query vector, a key vector and a value vector; the specific conversion can be understood as the fusion of each regional sub-feature with three-dimensional conversion parameters, and the query vector, key vector and value vector are used as the regional associated feature corresponding to each regional sub-feature.
  • The regional association weight corresponding to each regional sub-feature in the preset regional feature can then be determined based on the regional associated features, and this can be done in many ways. For example, an attention network can take the dot product of the query vector corresponding to each regional sub-feature in the preset regional feature with the key vectors of the other regional sub-features to obtain the attention score corresponding to each regional sub-feature, and then calculate the regional association weight corresponding to each regional sub-feature based on its attention score. The regional sub-features in the preset regional feature can then be fused according to the regional association weights: for example, each regional sub-feature may be weighted based on its regional association weight, the weighted regional sub-features accumulated, and the initial regional feature corresponding to the preset regional feature obtained from the accumulation result.
  • The decoding module (Transformer Decoder) in the trained image processing model on the right side of the figure can be used to perform feature extraction on the preset regional feature to obtain the regional associated feature corresponding to each regional sub-feature in the preset regional feature.
  • Specifically, the preset region feature (Single Query) can be converted into the three space vectors K, Q and V and input to the normalization unit of the Transformer Decoder module for normalization processing; the normalized K, Q and V vectors are input to the multi-head self-attention unit (Multi-head Self-Attention), which performs feature extraction on the preset region feature to obtain the regional associated feature corresponding to each regional sub-feature in the preset region feature, and weights each regional sub-feature accordingly; the weighted result is then input into the regularization and residual connection unit (Dropout&Add) for feature fusion, obtaining the initial region feature corresponding to the image Ia to be processed.
  • the cross feature extraction can be performed on the initial region feature and the associated feature.
  • For example, feature extraction can be performed on the initial region feature and the associated feature to obtain the image associated feature corresponding to the associated feature and the initial region associated feature corresponding to the initial region feature; the image association weight corresponding to the associated feature is determined according to the image associated feature and the initial region associated feature; based on the image association weight, the associated feature is weighted to obtain the common-view image feature; and the common-view image feature is fused with the initial region feature to obtain the common-view region feature.
  • the feature extraction of the associated feature is the feature mapping of the associated feature
  • the mapped feature is the image associated feature corresponding to the associated feature.
  • The image associated feature can be used to determine the feature information of the association relationship between the associated feature and the initial region feature. The feature extraction of the initial region feature is the feature mapping of the initial region feature, and the mapped feature is the initial region associated feature corresponding to the initial region feature, which can be used to determine the feature information of the association relationship between the initial region feature and the associated feature. The image association weight can represent the degree of association between the associated feature and the initial region feature, and the common-view image feature can be the feature information characterizing the relationship between them.
  • For example, an attention network can be used to perform feature extraction on the initial region feature and the associated feature. The initial region feature corresponding to a certain image to be processed can be converted into a query vector, and the corresponding associated feature can be converted into a key vector and a value vector; the specific conversion can be understood as the fusion of the initial region feature and the associated feature with conversion parameters of the corresponding dimension. The resulting query vector is used as the initial region associated feature corresponding to the initial region feature, and the resulting key vector and value vector are used as the image associated feature corresponding to the associated feature.
  • The image association weight corresponding to the associated feature can be determined according to the image associated feature and the initial region associated feature in many ways. For example, the query vector of the initial region associated feature is dot-producted with the key vectors in the image associated feature corresponding to the associated feature, so that the attention score of each feature in the associated feature is obtained, and the image association weight corresponding to the associated feature of the image to be processed is calculated based on the attention scores. The associated feature can then be weighted based on the image association weight: for example, the value vector in the image associated feature corresponding to the associated feature can be weighted according to the image association weight, and the weighted value vectors are fused to obtain the common-view image feature.
  • the common-view image feature and the initial area feature can be fused to obtain the common-view area feature.
  • Taking the acquisition of the common-view region feature corresponding to the image Ia to be processed as an example:
  • the associated feature fa corresponding to the image Ia to be processed can be input into the Transformer Decoder module on the right side of the figure to obtain the common-view area feature corresponding to the image Ia to be processed.
  • Specifically, feature extraction can be performed on the initial region feature and the associated feature: the initial region feature corresponding to the image Ia to be processed can be weighted with the corresponding preset region feature, and the weighted result is converted into a query vector Q, that is, the initial region associated feature; the associated feature fa corresponding to image Ia is converted into a value vector V, positional encoding is applied to fa through the positional encoding module (Positional Encoding), and the positional encoding result corresponding to fa is converted into a key vector K. Based on the value vector V and the key vector K, the image associated feature corresponding to the associated feature is obtained. The image associated feature and the initial region associated feature are then normalized through the normalization unit, and the normalization result is input to the multi-head attention unit, which determines the image association weight corresponding to the associated feature according to the image associated feature and the initial region associated feature and weights the associated feature based on the image association weight to obtain the common-view image feature; the common-view image feature is fused with the initial region feature to obtain the common-view region feature corresponding to the image Ia to be processed.
  • the method of acquiring the common-view area feature corresponding to the image to be processed Ia can be used to acquire the common-view area feature corresponding to the image Ib to be processed, and details will not be described here.
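  • The decoder side can be sketched as a single learned query attending over the associated feature fa, as below; the positional-encoding handling, pre-norm ordering and layer sizes are simplifications and assumptions, not the embodiment's exact structure.

```python
import torch
import torch.nn as nn

class CommonViewDecoder(nn.Module):
    """Sketch: one learned region query cross-attends over the associated feature fa."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.region_query = nn.Parameter(torch.zeros(1, 1, dim))   # preset area feature, 1 x 256
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, fa, pos_encoding):
        # fa: associated feature flattened to (batch, h*w, 256); pos_encoding: same shape
        q = self.region_query.expand(fa.size(0), -1, -1)
        q = self.norm1(q + self.self_attn(q, q, q)[0])              # initial region feature
        # cross attention: query = region feature, key = fa + positional encoding, value = fa
        attended = self.cross_attn(query=q, key=fa + pos_encoding, value=fa)[0]
        return self.norm2(q + attended)                             # common-view region feature (batch, 1, 256)
```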
  • the common-view image in the common-view region can be identified in the image to be processed based on the common-view region feature and the associated feature.
  • There may be multiple ways to identify the common-view image of the common-view area in the image to be processed. For example, based on the common-view region feature and the associated feature, the common-view weight corresponding to the associated feature is calculated; the attention center coordinates are determined in the image to be processed according to the common-view weight and the associated feature; regression processing is performed on the common-view region feature to obtain the relative center point offset corresponding to the common-view area; and the common-view image of the common-view area is identified in the image to be processed according to the attention center coordinates and the relative center point offset.
  • The common-view weight can represent the importance, within the associated feature, of the feature at each position of the associated feature. The attention center coordinates can be the coordinates of a center of higher importance in the common-view area determined based on the common-view weight, and can be understood as the focus center of the common-view area. The relative center point offset can be the offset distance of the attention center coordinates relative to the bounding box of the common-view area; from the attention center coordinates and the corresponding relative center point offset, a rectangular frame, that is, the common-view area, can be determined.
  • For example, a dot product operation can be performed on the common-view region feature and the associated feature, and the common-view weight corresponding to the associated feature is obtained according to the operation result. The common-view weight can be expressed as A = dot(Q, F) ∈ R^(h×w), where A represents the common-view weight corresponding to the image to be processed, dot() represents the dot product operation function, Q represents the associated feature, F represents the common-view region feature, R represents the dimension, h represents the length of the common-view weight distribution, and w represents the width of the common-view weight distribution.
  • After the common-view weight corresponding to the associated feature is calculated, the attention center coordinates can be determined in the image to be processed according to the common-view weight and the associated feature, and this can be done in many ways. For example, according to the common-view weight and the associated feature, the attention weight of each preset coordinate point in the common-view area is calculated; the preset coordinate points are weighted based on the attention weights to obtain weighted coordinate points; and the weighted coordinate points are accumulated to obtain the attention center coordinates in the image to be processed.
  • the attention weight can represent the attention degree of each preset coordinate point in the common view area, and can be understood as representing the probability that each preset coordinate point in the common view area is the geometric center point of the common view area.
  • The preset coordinate points can be the coordinate points in a preset relative coordinate map. For example, an image with a size of w*h can be divided into multiple 1*1 coordinate grids (Grid) to obtain the relative coordinate map; the coordinates of each Grid in the relative coordinate map are the coordinates of the preset coordinate points, and the weighted coordinate points are the preset coordinate points weighted based on the attention weights.
  • the attention center module calculates the attention weight of each preset coordinate point in the common view area to obtain the attention center coordinates of the common view area.
  • Specifically, the associated feature can be converted into the form of a feature map, so that a cross product operation can be performed on the common-view weight and the associated feature, that is, A ⊗ F, and the result of the cross product operation is residually connected with the associated feature to obtain the residual connection result A ⊗ F + F. The residual connection result A ⊗ F + F is then convolved through a fully convolutional network (FCN) to generate a common-view area probability map P, that is, the central coordinate probability distribution Pc(x, y) of the common-view area, which can be used to characterize the attention weight corresponding to each preset coordinate point in the common-view area. The common-view area probability map P can be expressed as P = softmax(conv_3×3(A ⊗ F + F)), where ⊗ represents the cross product operation, + represents the residual connection processing, softmax() represents the logistic regression function, and conv_3×3 represents convolution processing with a convolution kernel size of 3 × 3.
  • After calculating the attention weight of each preset coordinate point in the common-view area according to the common-view weight and the associated feature, the preset coordinate points can be weighted based on the attention weights to obtain the weighted coordinate points.
  • the weighted coordinate points are accumulated to obtain the attention center coordinates in the image to be processed.
  • There are many ways of weighting and summing the preset coordinate points based on the attention weights to obtain the attention center coordinates of the common-view area, which can be expressed as (x_c, y_c) = Σ_{x=1..W} Σ_{y=1..H} Pc(x, y) · (x, y), where H represents the length of the image to be processed, W represents the width of the image to be processed, x represents the abscissa in the relative coordinate map, y represents the ordinate in the relative coordinate map, and Σ represents the summation symbol.
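  • The computation above amounts to a soft-argmax over a probability map. One possible sketch in PyTorch is given below, with the relative coordinate grid assumed normalized to [0, 1] and a single 3×3 convolution standing in for the FCN.

```python
import torch
import torch.nn.functional as nnf

def attention_center(region_feat, assoc_feat, conv3x3):
    """region_feat: (b, 1, 256); assoc_feat: (b, 256, h, w); conv3x3: e.g. nn.Conv2d(256, 1, 3, padding=1)."""
    b, c, h, w = assoc_feat.shape
    # common-view weight A: dot product between the region feature and the associated feature
    a = torch.einsum("bqc,bchw->bhw", region_feat, assoc_feat).unsqueeze(1)    # (b, 1, h, w)
    # cross product with the associated feature plus residual connection: A (x) F + F
    mixed = a * assoc_feat + assoc_feat
    # 3x3 convolution then spatial softmax -> common-view probability map Pc(x, y)
    p = nnf.softmax(conv3x3(mixed).flatten(2), dim=-1).view(b, 1, h, w)
    # relative coordinate grid (assumed normalized to [0, 1])
    ys = torch.linspace(0, 1, h, device=assoc_feat.device)
    xs = torch.linspace(0, 1, w, device=assoc_feat.device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    # probability-weighted sum of the grid coordinates = attention center (x_c, y_c)
    x_c = (p.squeeze(1) * grid_x).sum(dim=(1, 2))
    y_c = (p.squeeze(1) * grid_y).sum(dim=(1, 2))
    return x_c, y_c
```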
  • regression processing can be performed on the common view area feature to obtain the relative center point offset corresponding to the common view area.
  • FIG. 4b is a schematic diagram of the focus center coordinates and the relative center point offset of an image processing method provided by an embodiment of the present application.
  • the common-view image in the common-view area can be identified in the image to be processed according to the focus center coordinates and the relative center point offset.
  • There may be many ways to identify the common-view image of the common-view area in the image to be processed according to the attention center coordinates and the relative center point offset. For example, according to the attention center coordinates and the relative center point offset, the geometric center coordinates and boundary size information of the common-view area in the image to be processed are calculated; based on the geometric center coordinates and the boundary size information, the common-view area of the image to be processed is determined; and the common-view area is segmented from the image to be processed to obtain the common-view image of the common-view area.
  • the coordinates of the geometric center may be the coordinates of the geometric center of the rectangular frame corresponding to the common view area
  • the boundary size information may be information including the size of the side length of the rectangular frame corresponding to the common view area
  • There are many ways to calculate the geometric center coordinates and boundary size information of the common-view area in the image to be processed. For example, with continued reference to FIG. 4b, assume the attention center coordinates are (x_c, y_c) and the relative center point offset is (l, t, m, j), and assume that j is greater than t, m is greater than l, and the common-view area is located in the first quadrant of the relative coordinate map. Then the abscissa of the geometric center can be calculated as [(l+m)/2] - l + x_c, and the ordinate of the geometric center can be calculated as [(t+j)/2] + y_c - j; that is, the geometric center coordinates are ([(l+m)/2] - l + x_c, [(t+j)/2] + y_c - j), and the boundary size information of the rectangular frame corresponding to the common-view area can be obtained from the offsets, for example a width of l+m and a height of t+j.
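  • Following FIG. 4b, one possible sketch of turning the attention center and the relative center point offset into the geometric center and the bounding box is given below; the interpretation of l/m as leftward/rightward and t/j as upward/downward distances, and the axis orientation, are assumptions consistent with the expressions above.

```python
def common_view_box(x_c, y_c, offsets):
    """offsets = (l, t, m, j): assumed left/up/right/down distances from the attention center."""
    l, t, m, j = offsets
    width, height = l + m, t + j                    # boundary size of the rectangular frame
    center_x = (l + m) / 2 - l + x_c                # geometric center, as in the expressions above
    center_y = (t + j) / 2 + y_c - j
    # corners of the common-view area (y assumed to increase upwards, first quadrant)
    left, right = x_c - l, x_c + m
    bottom, top = y_c - j, y_c + t
    return (center_x, center_y), (width, height), (left, bottom, right, top)
```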
  • the image processing model can be trained to obtain the trained image processing model. There can be multiple ways to train the image processing model; for example, please continue to refer to FIG. 3b. Specifically, an image sample pair can be obtained, and the preset image processing model can be used to predict the common-view area of each image sample in the image sample pair to obtain the predicted common-view area; the marked common-view area and the predicted common-view area are then used to train the preset image processing model to obtain the trained image processing model.
  • the image sample pair can be an image pair sample used for training a preset image processing model
  • the image samples in the image sample pair include marked common-view regions
  • the preset image processing model can be a pre-designed image processing model that has not yet been trained.
  • the predicted common view area may be the common view area corresponding to the image sample predicted by the preset image processing model based on the input image sample pair
  • the training stop condition may be any one of the training duration reaching a preset duration, the number of training iterations reaching a preset number, or the convergence of the loss information.
  • the predicted geometric center coordinates and predicted boundary size information corresponding to the predicted common-view area may be extracted from the predicted common-view area.
  • the marked geometric center coordinates and marked boundary size information corresponding to the marked common-view area are extracted from the marked common-view area, and the preset image processing model is trained according to the predicted geometric center coordinates, predicted boundary size information, marked geometric center coordinates and marked boundary size information to obtain the trained image processing model.
  • the coordinates of the predicted geometric center may be the coordinates of the geometric center of the rectangular frame corresponding to the predicted common-view area
  • the predicted boundary size information may be information including the size of the side length of the rectangular frame corresponding to the predicted common-view area
  • the marked geometric center coordinates may be the coordinates of the geometric center of the rectangular frame corresponding to the marked common-view area
  • the marked boundary size information may be information including the side lengths of the rectangular frame corresponding to the marked common-view area.
  • the predicted geometric center coordinates and predicted boundary size information corresponding to the predicted common-view area there are many ways to extract the predicted geometric center coordinates and predicted boundary size information corresponding to the predicted common-view area.
  • the predicted geometric center coordinates and predicted boundary size information corresponding to the predicted common view area are determined according to the predicted focus center coordinates and the predicted center point offset.
  • the predicted focus center coordinates can be the coordinates of a center with higher importance in the predicted common-view area, which can be understood as the focus center of the predicted common-view area, and the predicted center point offset can be the offset distance of the predicted focus center relative to the bounding box of the predicted common-view area.
  • according to the predicted geometric center coordinates, predicted boundary size information, marked geometric center coordinates and marked boundary size information, the preset image processing model can be trained to obtain the trained image processing model.
  • there are many ways to train the preset image processing model. For example, the cycle consistency loss information corresponding to the preset image processing model can be calculated based on the predicted geometric center coordinates and the marked geometric center coordinates; the boundary loss information and mean absolute error loss information corresponding to the preset image processing model can be calculated based on the predicted geometric center coordinates and predicted boundary size information as well as the marked geometric center coordinates and marked boundary size information; the cycle consistency loss information, the mean absolute error loss information and the boundary loss information are used as the loss information corresponding to the preset image processing model, and the preset image processing model is trained according to the loss information to obtain the trained image processing model.
  • the cycle consistency loss information may be loss information of a preset image processing model determined based on a cycle consistency loss function (cycle consistency loss), which is used to prevent the samples generated by the two generators from contradicting each other.
  • the average absolute error loss information may be loss information determined based on a regression loss function (L1Loss), which is used to measure an average error in a group of predicted values.
  • the boundary loss information may be loss information determined based on a boundary loss function (Generalized Intersection over Union, GIoU), and is used to measure the gap between the bounding box of the predicted common-view area and the bounding box of the marked common-view area.
  • the cycle consistency loss information based on the predicted geometric center coordinates and the marked geometric center coordinates, there are various ways to calculate the cycle consistency loss information corresponding to the preset image processing model.
  • the cycle consistency loss information can be expressed as
  • $L_{loc} = \lVert c_i - \hat{c}_i \rVert_1$
  • $L_{loc}$ represents the cycle consistency loss information
  • $\lVert\cdot\rVert$ represents the norm symbol; a norm is a function with the concept of "length" that assigns a positive length or magnitude to every non-zero vector in a vector space
  • $\lVert\cdot\rVert_1$ indicates the 1-norm
  • $c_i$ indicates the marked geometric center coordinates
  • $\hat{c}_i$ is the center point coordinates obtained by exchanging the associated features between the input image pair to be processed in the preset image processing model.
  • the mean absolute error loss information can be expressed as
  • $L_{L1} = \lVert b_i - \hat{b}_i \rVert_1$
  • $L_{L1}$ represents the mean absolute error loss information
  • $b_i$ represents the normalized marked geometric center coordinates and marked boundary size information corresponding to the marked common-view area, and $\hat{b}_i$ represents the corresponding predicted geometric center coordinates and predicted boundary size information
  • the boundary loss information can be expressed as
  • $L_{giou} = 1 - GIoU(b_i, \hat{b}_i)$
  • $L_{giou}$ represents the boundary loss information.
  • the cycle consistency loss information, the mean absolute error loss information and the boundary loss information are used as the loss information corresponding to the preset image processing model
  • the loss information corresponding to the preset image processing model may be expressed as the combination of the three losses, for example $L = L_{loc} + L_{L1} + L_{giou}$.
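  • The sketch below combines the three losses in this way; it assumes boxes are given as normalized (cx, cy, w, h) tuples and that the three terms are weighted equally, and the function names are illustrative rather than taken from the application:

```python
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # pred, target: (N, 4) boxes as (cx, cy, w, h); returns the mean GIoU loss.
    def corners(b):
        cx, cy, w, h = b.unbind(-1)
        return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)

    p, t = corners(pred), corners(target)
    inter = ((torch.min(p[:, 2:], t[:, 2:]) - torch.max(p[:, :2], t[:, :2]))
             .clamp(min=0).prod(dim=-1))
    area_p = (p[:, 2:] - p[:, :2]).prod(dim=-1)
    area_t = (t[:, 2:] - t[:, :2]).prod(dim=-1)
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)
    # area of the smallest box enclosing both rectangles
    enclose = ((torch.max(p[:, 2:], t[:, 2:]) - torch.min(p[:, :2], t[:, :2]))
               .prod(dim=-1).clamp(min=1e-7))
    giou = iou - (enclose - union) / enclose
    return (1.0 - giou).mean()

def total_loss(pred_box, marked_box, cycle_center, marked_center):
    # cycle consistency loss: 1-norm between the marked geometric centers and
    # the centers obtained by exchanging the associated features.
    l_loc = (cycle_center - marked_center).abs().sum(dim=-1).mean()
    # mean absolute error loss on the normalized center/size vectors.
    l_l1 = (pred_box - marked_box).abs().sum(dim=-1).mean()
    # boundary (GIoU) loss.
    l_giou = giou_loss(pred_box, marked_box)
    return l_loc + l_l1 + l_giou
```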
  • for example, two V100 graphics cards can be used to train the preset image processing model for 35 generations (that is, 35 epochs) on the MegaDepth dataset, which can take about 48 hours.
  • the preset image processing model can be trained based on the loss information corresponding to the preset image processing model.
  • when the preset image processing model meets the training stop condition, the preset image processing model that meets the training stop condition can serve as the trained image processing model.
  • the scale difference between the common-view images can be calculated.
  • the size information of the common-view image corresponding to each image to be processed can be obtained, and the size differences between the images to be processed can be calculated based on the size information.
  • the size information may be information including the size of the common-view image corresponding to each image to be processed, for example, may include size information such as length and width of the common-view image.
  • the size difference may be a value representing a gap between size information of the images to be processed, and the target size difference may be a size difference selected from the size differences as a scale difference.
  • the ratios between the lengths and between the widths of the common-view images can be calculated to obtain the size differences between the common-view images.
  • the image pair to be processed includes the images Ia and Ib to be processed
  • the common-view image corresponding to the image Ia to be processed is Ia'
  • the size information corresponding to the common-view image Ia' is length ha
  • the width is wa
  • the common-view image corresponding to the image Ib to be processed is Ib'
  • the size information corresponding to the common-view image Ib' is hb in length and wb in width
  • a target size difference satisfying a preset condition can be selected from the size differences.
  • the size difference with the largest value can be selected from the size difference as the target size difference.
  • the image pair to be processed includes images Ia and Ib to be processed
  • the common-view image corresponding to image Ia to be processed is Ia'
  • the size information corresponding to common-view image Ia' is length ha, width wa
  • the common-view image corresponding to Ib is Ib'
  • the size information corresponding to the common-view image Ib' is hb in length and wb in width
  • the four size differences can be obtained as (ha/hb, hb/ha, wa/wb, wb/wa).
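  • A minimal sketch of this selection, assuming the length and width of the two common-view images Ia' and Ib' are known (function name is illustrative):

```python
def scale_difference(ha, wa, hb, wb):
    # The four size differences between the common-view images Ia' and Ib';
    # the largest one is taken as the target size difference, i.e. the scale difference.
    candidates = (ha / hb, hb / ha, wa / wb, wb / wa)
    return max(candidates)
```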
  • the adjusted common-view image may be a common-view image obtained after adjustment according to a scale difference between the common-view images.
  • the size of each common-view image can be adjusted based on the scale difference, so that feature points can be extracted and matched in common-view images of the same scale, etc.
  • there are many ways to adjust the size of the common-view image based on the scale difference
  • for example, the original length and original width of the common-view image can be obtained and multiplied by the scale difference to obtain the adjusted length and adjusted width, so that the common-view image can be scaled based on the adjusted length and adjusted width, thereby adjusting the size of the common-view image to obtain the adjusted common-view image; a minimal sketch is given below.
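  • The sketch below applies the scale difference to one common-view image with OpenCV; which of the two images is resized, and the rounding of the adjusted size, are illustrative choices not specified in the application:

```python
import cv2

def align_scale(common_view_img, scale_diff):
    h, w = common_view_img.shape[:2]
    # multiply the original length and width by the scale difference to obtain
    # the adjusted length and width, then rescale the common-view image.
    new_w, new_h = int(round(w * scale_diff)), int(round(h * scale_diff))
    return cv2.resize(common_view_img, (new_w, new_h))
```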
  • the common-view feature point may be a feature point extracted from the adjusted common-view image.
  • there are many ways to extract at least one common-view feature point from each adjusted common-view image. For example, feature point extraction methods such as the corner detection algorithm (FAST), the Scale-Invariant Feature Transform (SIFT) and the Speeded Up Robust Features algorithm (SURF) can be used to extract at least one common-view feature point from each adjusted common-view image.
  • the image pair to be processed can be processed based on the common-view feature point.
  • for example, feature point matching can be performed on the common-view feature points of each image to be processed in the adjusted common-view images to obtain matched common-view feature points; based on the scale difference and the size information of the adjusted common-view image, the source feature points corresponding to the matched common-view feature points are determined in the image to be processed, and the image pair to be processed is processed based on the source feature points.
  • a matched common-view feature point may be a common-view feature point in the adjusted common-view image of one image to be processed that is matched with a common-view feature point in the other adjusted common-view image, and a source feature point may be the feature point in the image to be processed that corresponds to the matched common-view feature point.
  • for example, a brute-force matcher (Brute-Force Matcher) can be used to calculate the distance between a common-view feature point descriptor and all common-view feature point descriptors in the other adjusted common-view image, the obtained distances are then sorted, and the closest common-view feature point is taken as the matching point, so as to obtain the matched common-view feature points (see the sketch below).
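  • As one concrete, illustrative choice among the extractors and matchers mentioned above, the sketch below uses SIFT descriptors with a brute-force matcher and a ratio test (the ratio threshold is an assumption):

```python
import cv2

def match_common_view(img_a, img_b):
    sift = cv2.SIFT_create()
    kps_a, des_a = sift.detectAndCompute(img_a, None)   # common-view feature points of Ia'
    kps_b, des_b = sift.detectAndCompute(img_b, None)   # common-view feature points of Ib'

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_a, des_b, k=2)
    # keep a match only when it is clearly closer than the second-best candidate
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts_a = [kps_a[m.queryIdx].pt for m in good]
    pts_b = [kps_b[m.trainIdx].pt for m in good]
    return pts_a, pts_b
```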
  • pose estimation (Pose Estimation) can be performed on the adjusted common-view images according to the matched common-view feature points in the adjusted common-view images to obtain the adjusted pose information corresponding to the adjusted common-view images
  • the original pose information corresponding to the image to be processed can then be calculated based on the adjusted pose information, the scale difference and the size information of the adjusted common-view image, so that, according to the original pose information, the position of each matched common-view feature point in the adjusted common-view image is inversely transformed back to the image to be processed, and the source feature point corresponding to the matched common-view feature point can thus be determined in the image to be processed.
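  • The application recovers the source feature points through pose estimation; the sketch below shows only the simplest illustrative case, in which the adjusted common-view image is a crop at a known top-left corner followed by a uniform rescale, so the inverse transform is a division by the scale difference plus the crop offset:

```python
def to_source_coords(pt, crop_top_left, scale_diff):
    # pt: (x, y) position of a matched common-view feature point in the
    # adjusted common-view image; crop_top_left: top-left corner of the
    # common-view area in the original image to be processed.
    x, y = pt
    x0, y0 = crop_top_left
    return (x0 + x / scale_diff, y0 + y / scale_diff)   # source feature point
```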
  • the random sampling consensus algorithm (RANdom SAmple Consensus, referred to as RANSAC) can be used to estimate the pose of the adjusted common-view image according to the matched common-view feature points in the adjusted common-view image.
  • the RANSAC algorithm estimates the parameters of a model in an iterative manner from a data set that contains outliers.
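  • Reusing pts_a and pts_b from the matching sketch above, a RANSAC-based estimate between the matched common-view feature points can be obtained, for example, with OpenCV; using a homography here is an illustrative simplification of the pose estimation step:

```python
import numpy as np
import cv2

def ransac_filter(pts_a, pts_b):
    # Estimate a transform between the matched common-view feature points with
    # RANSAC; the inlier mask filters out mismatched points (outliers).
    H, inlier_mask = cv2.findHomography(
        np.float32(pts_a), np.float32(pts_b), cv2.RANSAC, ransacReprojThreshold=3.0
    )
    keep = inlier_mask.ravel().astype(bool)
    inliers_a = [p for p, k in zip(pts_a, keep) if k]
    inliers_b = [p for p, k in zip(pts_b, keep) if k]
    return H, inliers_a, inliers_b
```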
  • the image pair to be processed can then be processed based on the source feature points. There are many ways to process the image pair to be processed based on the source feature points; for example, the feature points in the images to be processed can be extracted, matched and located, and further applications can be built on this basis, for example data positioning in a virtual map application, which is not limited here.
  • FIG. 5 is a schematic flowchart of an image processing method provided in the embodiment of the present application.
  • in the first stage, the image processing model regresses the common-view areas of the two input images to be processed to obtain the locations of the corresponding areas and segments the common-view images.
  • in the second stage, the common-view images are scale-aligned at the image level, and feature point extraction and matching are performed on the aligned adjusted common-view images. On the one hand, this ensures that feature points are extracted on images of the same scale, which reduces the difficulty of feature point extraction and matching and improves their efficiency.
  • on the other hand, matching feature points only within the common-view area effectively improves the filtering of outliers, improves the accuracy of feature point matching, and at the same time improves the speed of feature point matching.
  • in the third stage, the original pose information corresponding to the image to be processed is calculated, so that the positions of the matched common-view feature points in the adjusted common-view images can be inversely transformed back to the images to be processed according to the original pose information, and the source feature points corresponding to the matched common-view feature points can be determined in the images to be processed.
  • the image processing method provided by the embodiments of the present application can effectively handle feature extraction, matching and positioning in cases with large scale differences, produces denser matches than existing feature extraction and matching algorithms, and is suitable for tasks such as image registration, large-scale scene reconstruction, Simultaneous Localization and Mapping (SLAM) and visual positioning; it can improve the accuracy and speed of image processing and thereby improve the efficiency of image processing.
  • the embodiment of the present application acquires the image pair to be processed and performs image feature extraction on the images to be processed in the image pair to obtain the image features of the images to be processed; extracts the associated features of the image pair to be processed from the image features; identifies, according to the associated features, the common-view images of the common-view area in the images to be processed and calculates the scale difference between the common-view images; adjusts the size of the common-view images based on the scale difference to obtain the adjusted common-view images; and extracts at least one common-view feature point from each adjusted common-view image and processes the image pair to be processed based on the common-view feature points.
  • FIG. 6 is another schematic flowchart of the image processing method provided by the embodiment of the present application. The specific process is as follows:
  • the server acquires an image sample pair, uses a preset image processing model to predict the common-view area of each image sample in the image sample pair, and obtains the predicted common-view area, and extracts the predicted common-view area from the predicted common-view area.
  • the predicted geometric center coordinates and predicted boundary size information corresponding to the predicted common view area are determined according to the predicted focus center coordinates and the predicted center point offset.
  • in step 202, the server extracts the marked geometric center coordinates and marked boundary size information corresponding to the marked common-view area from the marked common-view area of the image sample, calculates the cycle consistency loss information corresponding to the preset image processing model based on the predicted geometric center coordinates and the marked geometric center coordinates, and, based on the predicted geometric center coordinates and predicted boundary size information as well as the marked geometric center coordinates and marked boundary size information, respectively calculates the boundary loss information and mean absolute error loss information corresponding to the preset image processing model.
  • step 203 the server uses the cycle consistency loss information, the mean absolute error loss information, and the boundary loss information as the loss information corresponding to the preset image processing model, and uses the loss information for the preset image processing model Perform training to obtain a trained image processing model.
  • the server obtains the image pair to be processed, performs feature mapping on the image to be processed in the image pair to be processed, obtains a feature map corresponding to the image to be processed, and performs dimensionality reduction processing on the feature map corresponding to the image to be processed , to obtain the feature map after dimensionality reduction, perform multi-scale feature extraction on the feature map after dimensionality reduction, obtain the scale image features corresponding to each scale of the image to be processed, and obtain the scale image features corresponding to each scale of the image to be processed Fusion is performed to obtain the image features of the image to be processed.
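  • An illustrative PyTorch sketch of this multi-scale extraction step follows; the kernel sizes, strides and channel counts follow the example given with FIG. 3a of the application, while the padding values and the class name are assumptions chosen only so that the three branches align:

```python
import torch
import torch.nn as nn

class MultiScaleExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.reduce = nn.Conv2d(1024, 256, kernel_size=1)                 # dimensionality reduction
        self.branch_a = nn.Conv2d(256, 256, kernel_size=4, stride=2, padding=1)
        self.branch_b = nn.Conv2d(256, 128, kernel_size=8, stride=2, padding=3)
        self.branch_c = nn.Conv2d(256, 128, kernel_size=16, stride=2, padding=7)
        self.fuse = nn.Conv2d(512, 256, kernel_size=1)                    # fuse concatenated scales

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: (B, 1024, H, W) feature map of the image to be processed
        x = self.reduce(feat_map)                                         # (B, 256, H, W)
        scales = [self.branch_a(x), self.branch_b(x), self.branch_c(x)]   # each (B, C, H/2, W/2)
        return self.fuse(torch.cat(scales, dim=1))                        # (B, 256, H/2, W/2)
```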
  • step 205 the server performs flattening processing on the image features to obtain the flat image features of the image to be processed, performs feature extraction on the flat image features, and obtains the initial Correlation feature, based on the initial correlation feature, determine the initial correlation weight corresponding to each sub-flat image feature in the flat image feature, and fuse each sub-flat image feature in the flat image feature according to the initial correlation weight to obtain The initial attention feature corresponding to the image to be processed.
  • the server performs cross-feature extraction on the image feature and the initial attention feature, obtains the cross-correlation feature corresponding to each image to be processed, and determines the cross-correlation weight corresponding to the image to be processed according to the cross-correlation feature , based on the cross-correlation weight, the initial attention feature corresponding to each image to be processed is weighted to obtain the associated feature corresponding to the image to be processed.
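  • These self-attention and cross-attention steps can be sketched as follows; nn.MultiheadAttention is used as a stand-in for the Transformer encoder described in the application, and the residual/normalization layout, dimensions and class name are assumptions:

```python
import torch
import torch.nn as nn

class CoVisibleAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, flat_a: torch.Tensor, flat_b: torch.Tensor) -> torch.Tensor:
        # flat_a, flat_b: (B, H*W, dim) flattened image features of the two images.
        # Self-attention over image Ia gives its initial attention feature.
        init_a = self.norm1(flat_a + self.self_attn(flat_a, flat_a, flat_a)[0])
        # Cross-attention queries image Ib to produce the associated feature of Ia.
        assoc_a = self.norm2(init_a + self.cross_attn(init_a, flat_b, flat_b)[0])
        return assoc_a
```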
  • the server obtains the preset regional features, uses the trained image processing model to perform feature extraction on the preset regional features, and obtains the regional correlation features corresponding to each regional sub-feature in the preset regional features, based on the regional correlation feature, determining the regional association weight corresponding to each regional sub-feature in the preset regional feature, and merging each regional sub-feature in the preset regional feature according to the regional association weight to obtain an initial regional feature.
  • the server performs feature extraction on the initial region feature and the associated feature, obtains the image associated feature corresponding to the associated feature, and the initial region associated feature corresponding to the initial region feature, according to the image associated feature and the initial area
  • the correlation feature determines the image correlation weight corresponding to the correlation feature, based on the image correlation weight, the correlation feature is weighted to obtain the common-view image feature, and the common-view image feature is fused with the initial region feature to obtain the common-view area feature.
  • the server calculates the common view weight corresponding to the common view area based on the common view area feature and the associated feature, and calculates each preset coordinate point in the common view area according to the common view weight and the common view image feature
  • the attention weight is weighted to the preset coordinate point based on the attention weight to obtain a weighted coordinate point, and the weighted coordinate points are accumulated to obtain the attention center coordinate in the image to be processed.
  • the server performs regression processing on the common-view area feature to obtain the relative center point offset corresponding to the common-view area, calculates the geometric center coordinates and border size information of the common-view area in the image to be processed according to the attention center coordinates and the relative center point offset, determines the common-view area of the image to be processed based on the geometric center coordinates and the border size information, and segments the common-view area in the image to be processed to obtain the common-view image of the common-view area.
  • the server acquires the size information of the common-view image corresponding to each image to be processed, calculates at least one size difference between the images to be processed based on the size information, selects from the size differences a target size difference that meets a preset condition, uses the target size difference as the scale difference between the common-view images, and adjusts the size of the common-view images based on the scale difference to obtain the adjusted common-view images.
  • the server extracts at least one common-view feature point from each of the adjusted common-view images, and the common-view feature of each of the images to be processed in the adjusted common-view image in the pair of images to be processed Points are matched with feature points to obtain the matched common-view feature points. Based on the scale difference and the size information of the adjusted common-view image, determine the source feature points corresponding to the matched common-view feature points in the image to be processed. Based on the source feature point, the image pair to be processed is processed.
  • in this way, the embodiment of the present application extracts, from the image features, the associated features representing the mutual information between the images to be processed, and identifies, according to the associated features, the common-view images of the common-view area shared by the two images to be processed, so that the common-view feature points in the common-view area can be quickly extracted and matched based on the common-view images. This improves the speed and accuracy of feature point matching, can effectively handle feature point extraction, matching and positioning in cases with large scale differences, and thereby improves the accuracy, speed and efficiency of image processing.
  • although the steps in the flowcharts involved in the above embodiments are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless otherwise specified herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in the flowcharts involved in the above embodiments may include multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • an embodiment of the present application further provides an image processing apparatus, the image processing apparatus may be integrated into a computer device, and the computer device may be a server.
  • the image processing device may include an acquisition unit 301, an extraction unit 302, an identification unit 303, an adjustment unit 304, and a processing unit 305, as follows:
  • An acquisition unit 301 configured to acquire an image pair to be processed, and perform image feature extraction on the image to be processed in the image pair to be processed, to obtain image features of the image to be processed;
  • An extraction unit 302 configured to extract, from the image features, a correlation feature of the image pair to be processed, where the correlation feature is used to characterize the mutual information between the images to be processed in the image pair to be processed;
  • the identification unit 303 is configured to identify the common-view image of the common-view area in the image to be processed according to the associated feature, and calculate the scale difference between the common-view images;
  • An adjustment unit 304 configured to adjust the size of the common-view image based on the scale difference to obtain an adjusted common-view image
  • the processing unit 305 is configured to extract at least one common-view feature point from each adjusted common-view image, and process the image pair to be processed based on the common-view feature point.
  • the identification unit 303 includes: an initial region feature extraction subunit, configured to obtain a preset region feature, and use a trained image processing model to perform feature extraction on the preset region feature to obtain an initial region feature;
  • the cross-feature extraction subunit is used to perform cross-feature extraction on the initial region feature and the associated feature to obtain the common-view region feature corresponding to the initial region feature;
  • the common-view image recognition subunit is used to identify the common-view image of the common-view area in the image to be processed based on the common-view region feature and the associated feature.
  • the initial region feature extraction subunit includes: a region-associated feature extraction module, configured to perform feature extraction on the preset region feature using the trained image processing model to obtain each region in the preset region feature The regional association feature corresponding to the sub-feature; the regional association weight determination module is used to determine the regional association weight corresponding to each regional sub-feature in the preset regional feature based on the regional association feature; the initial regional feature fusion module is used to determine according to the regional association feature The regional association weight is used to fuse each regional sub-feature in the preset regional feature to obtain the initial regional feature.
  • the intersection feature extraction subunit includes: an intersection feature extraction module, configured to perform feature extraction on the initial region feature and the associated feature, to obtain the image associated feature corresponding to the associated feature, and the initial region feature Corresponding initial area association features; association weight determination module, used to determine the image association weight corresponding to the association feature according to the image association features and the initial area association features; common-view weighting module, for based on the image association weight, the The associated features are weighted to obtain common-view image features, and the common-view image features are fused with the initial region features to obtain common-view region features.
  • the common-view image recognition subunit includes: a common-view weight calculation module, configured to calculate the common-view weight corresponding to the associated feature based on the common-view area feature and the associated feature; a focus center coordinate determination module, configured to determine the focus center coordinates in the image to be processed according to the common-view weight and the associated feature; a relative center point offset regression module, configured to perform regression processing on the common-view area feature to obtain the relative center point offset corresponding to the common-view area; and a common-view image recognition module, configured to identify the common-view image of the common-view area in the image to be processed according to the attention center coordinates and the relative center point offset.
  • the common-view image recognition module includes: a geometric center coordinates and boundary size information calculation submodule, configured to calculate the geometric center coordinates and boundary size information of the common-view area in the image to be processed according to the attention center coordinates and the relative center point offset, and to determine the common-view area of the image to be processed based on the geometric center coordinates and the boundary size information; and a common-view image segmentation submodule, configured to segment the common-view area in the image to be processed to obtain the common-view image of the common-view area.
  • the focus center coordinate determination module includes: a focus weight calculation submodule for calculating the focus weight of each preset coordinate point in the common view area according to the common view weight and the common view image features ;
  • the coordinate point weighting sub-module is used to weight the preset coordinate point based on the attention weight to obtain the weighted coordinate point;
  • the coordinate point accumulation sub-module is used to accumulate the weighted coordinate point to obtain the image to be processed The coordinates of the center of interest in .
  • the image processing device further includes: an image sample pair acquisition unit, configured to acquire an image sample pair, the image sample of the image sample pair includes a marked common-view area; a predicted common-view area prediction unit, configured to Using a preset image processing model to predict the common-view area of each image sample in the image sample pair to obtain the predicted common-view area; the training unit is used to process the preset image according to the marked common-view area and the predicted common-view area. Perform training to obtain the trained image processing model.
  • the training unit includes: predicted geometric center coordinates and predicted boundary size information extraction subunits, used to extract the predicted geometric center coordinates and predicted Boundary size information; labeling geometric center coordinates and labeling boundary size information extraction subunits, used to extract the labeling geometric center coordinates and labeling boundary size information corresponding to the labeling common view area in the labeling common view area; training subunit, It is used to train the preset image processing model according to the predicted geometric center coordinates, predicted boundary size information, marked geometric center coordinates and marked boundary size information to obtain a trained image processing model.
  • the predicted geometric center coordinates and predicted boundary size information extraction sub-unit is used to: in the predicted common view area, extract the predicted focus center coordinates and the predicted center point offset corresponding to the predicted common view area. and determine the predicted geometric center coordinates and predicted boundary size information corresponding to the predicted common view area according to the predicted attention center coordinates and the predicted center point offset.
  • the training subunit includes: a first loss information calculation module, configured to calculate the cycle consistency loss information corresponding to the preset image processing model based on the predicted geometric center coordinates and the marked geometric center coordinates; a second loss information calculation module, configured to respectively calculate the boundary loss information and the mean absolute error loss information corresponding to the preset image processing model based on the predicted geometric center coordinates and predicted boundary size information as well as the marked geometric center coordinates and marked boundary size information; and a training module, configured to use the cycle consistency loss information, the mean absolute error loss information and the boundary loss information as the loss information corresponding to the preset image processing model, and to train the preset image processing model according to the loss information to obtain the trained image processing model.
  • the extraction unit 302 includes: a flattening processing subunit, configured to perform flattening processing on the image features to obtain the flattened image features of the image to be processed; an initial attention feature extraction subunit, configured to Feature extraction is performed on the flat image feature to obtain the initial attention feature corresponding to the image to be processed; the associated feature cross extraction subunit is used to perform cross feature extraction on the initial attention feature to obtain each The associated features of the image to be processed.
  • the initial attention feature extraction subunit includes: an initial associated feature extraction module, configured to perform feature extraction on the flat image feature, and obtain the initial Correlation feature; initial correlation weight determination module, for determining the initial correlation weight corresponding to each sub-flat image feature in the flat image feature based on the initial correlation feature; initial attention feature fusion module, for based on the initial correlation weight Each sub-flat image feature in the flat image feature is fused to obtain an initial attention feature corresponding to the image to be processed.
  • the associated feature cross extraction subunit includes: a cross-correlation feature extraction module for performing cross-feature extraction on the image feature and the initial attention feature, and obtaining a cross-correlation corresponding to each image to be processed Features; cross-correlation weight determination module, used to determine the cross-correlation weight corresponding to the image to be processed according to the cross-correlation feature; The initial attention features are weighted to obtain the associated features corresponding to the image to be processed.
  • the acquisition unit 301 includes: a feature mapping subunit, configured to perform feature mapping on the image to be processed in the image pair to be processed, to obtain a feature map corresponding to the image to be processed; a dimensionality reduction processing subunit , which is used to perform dimensionality reduction processing on the feature map corresponding to the image to be processed to obtain the feature map after dimensionality reduction; the scale image feature extraction subunit is used to perform multi-scale feature extraction on the feature map after dimensionality reduction to obtain the feature map to be processed The scale image features corresponding to each scale of the image; the image feature fusion subunit is used to fuse the scale image features corresponding to each scale of the image to be processed to obtain the image features of the image to be processed.
  • the identification unit 303 includes: a size information acquisition subunit, configured to acquire the size information of the common-view image corresponding to each image to be processed; a size difference calculation subunit, configured to obtain the size information based on the size information Calculate at least one size difference between the images to be processed; the scale difference screening subunit is used to filter out target size differences that meet preset conditions from the size differences, and use the target size difference as the Scale difference between common view images.
  • the processing unit 305 includes: a common-view feature point matching subunit, configured to perform common-view feature points of each image to be processed in the adjusted common-view image in the image pair to be processed Matching feature points to obtain matched common-view feature points; source feature point determination subunit is used to determine the matched common-view feature in the image to be processed based on the scale difference and the size information of the adjusted common-view image The source feature point corresponding to the point; the processing subunit is used to process the image pair to be processed based on the source feature point.
  • each of the above units may be implemented as an independent entity, or may be combined arbitrarily as the same or several entities.
  • the specific implementation of each of the above units may refer to the previous method embodiments, and will not be repeated here.
  • the acquisition unit 301 acquires the image pair to be processed, and performs image feature extraction on the image to be processed in the image pair to be processed, so as to obtain the image feature of the image to be processed;
  • the extraction unit 302 extracts from the image features The associated feature of the image pair to be processed;
  • the identification unit 303 identifies the common-view image of the common-view area in the image to be processed according to the associated feature, and calculates the scale difference between the common-view images;
  • the adjustment unit 304 is based on the scale difference, Adjust the size of the common-view image to obtain the adjusted common-view image;
  • the processing unit 305 extracts at least one common-view feature point from each adjusted common-view image, and processes the pair of images to be processed based on the common-view feature point .
  • the embodiment of the present application also provides a computer device, as shown in FIG. 8 , which shows a schematic structural diagram of the computer device involved in the embodiment of the present application.
  • the computer device may be a server, specifically:
  • the computer device may include a processor 401 of one or more processing cores, a memory 402 of one or more computer-readable storage media, a power supply 403, an input unit 404 and other components.
  • the structure shown in FIG. 8 does not constitute a limitation on the computer device, which may include more or fewer components than shown in the figure, combine some components, or adopt a different arrangement of components, wherein:
  • the processor 401 is the control center of the computer equipment, uses various interfaces and lines to connect various parts of the entire computer equipment, runs or executes the software programs and/or modules stored in the memory 402, and calls the software stored in the memory 402 Data, perform various functions of computer equipment and process data.
  • the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly processes operating systems, user interfaces, and application programs, etc. , the modem processor mainly handles wireless communications. It can be understood that the foregoing modem processor may not be integrated into the processor 401 .
  • the memory 402 can be used to store software programs and modules, and the processor 401 executes various functional applications and image processing by running the software programs and modules stored in the memory 402 .
  • the memory 402 can mainly include a program storage area and a data storage area, wherein the program storage area can store an operating system, at least one application program required by a function (such as a sound playback function, an image playback function, etc.); Data created by the use of computer equipment, etc.
  • the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage devices.
  • the memory 402 may further include a memory controller to provide the processor 401 with access to the memory 402 .
  • the computer device also includes a power supply 403 for supplying power to each component.
  • the power supply 403 can be logically connected to the processor 401 through a power management system, so that functions such as charging, discharging, and power consumption management can be realized through the power management system.
  • the power supply 403 may also include one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, power status indicators and other arbitrary components.
  • the computer device can also include an input unit 404, which can be used to receive input digital or character information, and generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control.
  • the computer device may also include a display unit, etc., which will not be repeated here.
  • specifically, the processor 401 in the computer device loads the executable files corresponding to the processes of one or more application programs into the memory 402 according to corresponding instructions, and the processor 401 runs the application programs stored in the memory 402 to implement an image processing method, which is based on the same idea as the image processing method in the above embodiments; its specific implementation process is detailed in the above method embodiments.
  • a computer device including a memory and a processor, where computer-readable instructions are stored in the memory, and the processor implements the steps of the above-mentioned image processing method when executing the computer-readable instructions.
  • a computer-readable storage medium on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor, the steps of the above-mentioned image processing method are implemented.
  • a computer program product including computer readable instructions, which implement the steps of the above image processing method when executed by a processor.
  • user information including but not limited to user equipment information, user personal information, etc.
  • data including but not limited to data used for analysis, stored data, displayed data, etc.
  • the computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and, when executed, may include the processes of the embodiments of the above-mentioned methods.
  • any reference to storage, database or other media used in the various embodiments provided in the present application may include at least one of non-volatile and volatile storage.
  • non-volatile memory can include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like.
  • the volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory, etc.
  • RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
  • the databases involved in the various embodiments provided in this application may include at least one of a relational database and a non-relational database.
  • the non-relational database may include a blockchain-based distributed database, etc., but is not limited thereto.
  • the processors involved in the various embodiments provided by this application can be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, etc., and are not limited to this.

Abstract

The embodiments of the present application disclose an image processing method, an image processing apparatus and a computer-readable storage medium, applicable to various scenarios such as cloud technology, artificial intelligence, intelligent transportation and assisted driving. An image pair to be processed is acquired, and image feature extraction is performed on the images to be processed in the image pair to obtain the image features of the images to be processed (101); associated features of the image pair to be processed are extracted from the image features (102); according to the associated features, the common-view images of the common-view area are identified in the images to be processed, and the scale difference between the common-view images is calculated (103); based on the scale difference, the size of the common-view images is adjusted (104); at least one common-view feature point is extracted from each adjusted common-view image, and the image pair to be processed is processed based on the common-view feature points (105).

Description

Image processing method, apparatus and computer-readable storage medium
Related Application
This application claims priority to Chinese patent application No. 2022100889886, entitled "Image processing method, apparatus and computer-readable storage medium" and filed on January 25, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of Internet technology, and in particular to an image processing method, an image processing apparatus and a computer-readable storage medium.
Background
With the rapid development of Internet technology, image processing has become increasingly diversified. For example, in application scenarios of large-scale scene reconstruction (Structure from Motion, SFM), corresponding local feature points are matched between two images of the same scene captured from different viewing angles. In existing image processing methods, regions of the two images in which the feature points have consistent scales are estimated, and each feature point in the two images is then extracted and matched step by step.
In the course of research and practice on the prior art, the inventors of the present application found that existing image processing methods process images by matching the feature points in the images one by one. This approach processes the feature points slowly, which lowers the speed of image processing and therefore reduces its efficiency.
Summary
According to various embodiments of the present application, an image processing method, an image processing apparatus and a computer-readable storage medium are provided.
An embodiment of the present application provides an image processing method, executed by a computer device, including:
acquiring an image pair to be processed, and performing image feature extraction on the images to be processed in the image pair to be processed to obtain image features of the images to be processed;
extracting, from the image features, associated features of the image pair to be processed, the associated features being used to characterize the mutual information between the images to be processed in the image pair to be processed;
identifying, according to the associated features, common-view images of a common-view area in the images to be processed, and calculating a scale difference between the common-view images;
adjusting the size of the common-view images based on the scale difference to obtain adjusted common-view images; and
extracting at least one common-view feature point from each adjusted common-view image, and processing the image pair to be processed based on the common-view feature points.
Correspondingly, an embodiment of the present application provides an image processing apparatus, including:
an acquisition unit, configured to acquire an image pair to be processed, and perform image feature extraction on the images to be processed in the image pair to be processed to obtain image features of the images to be processed;
an extraction unit, configured to extract, from the image features, associated features of the image pair to be processed, the associated features being used to characterize the mutual information between the images to be processed in the image pair to be processed;
an identification unit, configured to identify, according to the associated features, common-view images of a common-view area in the images to be processed, and calculate a scale difference between the common-view images;
an adjustment unit, configured to adjust the size of the common-view images based on the scale difference to obtain adjusted common-view images; and
a processing unit, configured to extract at least one common-view feature point from each adjusted common-view image, and process the image pair to be processed based on the common-view feature points.
In another aspect, the present application further provides a computer device. The computer device includes a memory and a processor, the memory stores computer-readable instructions, and the processor implements the steps of the above image processing method when executing the computer-readable instructions.
In another aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions, when executed by a processor, implement the steps of the above image processing method.
In another aspect, the present application further provides a computer program product. The computer program product includes computer-readable instructions, and the computer-readable instructions, when executed by a processor, implement the steps of the above image processing method.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features, objects and advantages of the present application will become apparent from the description, the drawings and the claims.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application or in the conventional art more clearly, the drawings required for describing the embodiments or the conventional art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from the disclosed drawings without creative effort.
FIG. 1 is a schematic diagram of an implementation scenario of an image processing method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present application;
FIG. 3a is a schematic diagram of multi-scale feature extraction of an image processing method provided by an embodiment of the present application;
FIG. 3b is a schematic diagram of a specific flow of an image processing method provided by an embodiment of the present application;
FIG. 4a is a schematic structural diagram of an image processing model of an image processing method provided by an embodiment of the present application;
FIG. 4b is a schematic diagram of the attention center coordinates and the relative center point offset of an image processing method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the overall flow of an image processing method provided by an embodiment of the present application;
FIG. 6 is another schematic flowchart of an image processing method provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application rather than all of them. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The embodiments of the present application provide an image processing method, an image processing apparatus and a computer-readable storage medium. The image processing apparatus may be integrated in a computer device, and the computer device may be a server, a terminal or another device.
The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, and big data and artificial intelligence platforms. The terminal may include, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, and the like. The terminal and the server may be connected directly or indirectly by wired or wireless communication, which is not limited in the present application.
Referring to FIG. 1, taking the image processing apparatus integrated in a computer device as an example, FIG. 1 is a schematic diagram of an implementation scenario of the image processing method provided by an embodiment of the present application. The computer device may be a server or a terminal. The computer device may acquire an image pair to be processed and perform image feature extraction on the images to be processed in the image pair to obtain image features of the images to be processed; extract associated features of the image pair to be processed from the image features; identify, according to the associated features, common-view images of a common-view area in the images to be processed, and calculate a scale difference between the common-view images; adjust the size of the common-view images based on the scale difference to obtain adjusted common-view images; and extract at least one common-view feature point from each adjusted common-view image and process the image pair to be processed based on the common-view feature points.
It should be noted that the embodiments of the present application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation and assisted driving. The schematic diagram of the implementation environment shown in FIG. 1 is only an example; it is provided to explain the technical solutions of the embodiments of the present application more clearly and does not limit them. A person of ordinary skill in the art will appreciate that, as image processing evolves and new business scenarios emerge, the technical solutions provided by the present application are equally applicable to similar technical problems.
For a better description of the embodiments of the present application, the following terms are used for reference:
Common-view area: for multiple images of the same scene or the same target object captured under different shooting conditions, the image region in which the scene or the target object is located. The target object may be living or non-living; a living object refers to an independent living body, such as a natural person, an animal or a plant, and a non-living object refers to any kind of object, such as a vehicle, a building, a table or a chair. Different shooting conditions may be, for example, different viewing angles, different distances or different times, and "multiple" means at least two. For example, when a binocular camera captures images A and B of a cat from left and right viewing angles, the region where the cat is located in images A and B may be the common-view area. As another example, in a road reconstruction task, images A and B of a road scene are captured at different points in time, and the image region where the road scene is located in images A and B may be the common-view area. The shape of the common-view area may be any shape as required, for example a rectangle, a square or a circle.
Feature point: in image processing, a feature point is a point where the gray value of the image changes drastically or a point with large curvature on an image edge (i.e., the intersection of two edges). Image feature points play a very important role in feature-point-based image matching algorithms; they reflect the essential characteristics of an image, can identify the target object in the image, and image matching can be accomplished by matching feature points.
Feature matching: obtaining the pixel-level or sub-pixel-level correspondence between two images of the same object captured from two different viewing angles.
Scale: describes the imaging size of an object on the camera plane; a smaller scale means the object is imaged smaller on the camera plane, and a larger scale means it is imaged larger.
The solutions provided by the embodiments of the present application relate to technologies such as computer vision in artificial intelligence, and are described through the following embodiments. It should be noted that the description order of the following embodiments is not intended to limit the preferred order of the embodiments.
在一个实施例中,请参阅图2,图2是本申请实施例提供的图像处理方法的流程示意图,在本实施例中,该图像处理方法由计算机设备执行,该计算机设备可以是服务器,也可以是终端。具体地,该图像处理方法包括:
101、获取待处理图像对,并对待处理图像对中的待处理图像进行图像特征提取,得到待处理图像的图像特征。
其中,该待处理图像对可以为多张待处理图像组成的整体,例如,可以为两张待处理图像组成的整体。该待处理图像对中的待处理图像可以为存在共视区域的图像,即可以为 同一场景或者同一对象在不同视角、不同距离或者不同时间拍摄的两张图像。该图像特征可以为表征该待处理图像的特征信息。
其中,获取待处理图像对的方式可以有多种,例如,可以从与图像处理装置连接的存储器中获取,也可以从其他数据存储终端获取。还可以从实体终端的存储器中获取,也可以从虚拟的存储空间如数据集或者语料库中获取,在此不做限定。
在获取待处理图像对之后,便可以对待处理图像对中的待处理图像进行图像特征提取。其中,对待处理图像对中的待处理图像进行图像特征提取的方式可以用多种,比如,可以对该待处理图像对中的待处理图像进行特征映射,得到该待处理图像对应的特征图;对该待处理图像对应的特征图进行降维处理,得到降维后特征图;对降维后特征图进行多尺度的特征提取,得到该待处理图像在每一尺度对应的尺度图像特征;将该待处理图像在每一尺度对应的尺度图像特征进行融合,得到该待处理图像的图像特征。
其中,该特征图(Feature map)可以为表征待处理图像在每一通道(Channel)中的特征信息,在卷积神经网络的每个卷积层中,数据是以三维形式存在的,可以视为许多个二维图片叠加在一起,其中每一二维图片可以称为一个特征图。该降维后特征图可以为对待处理图像进行降维之后得到的特征图,该尺度图像特征可以为对待处理图像进行多尺度的特征提取之后得到的每一尺度对应的图像特征。
其中,对该待处理图像对中的待处理图像进行特征映射的方式可以有多种,例如,可以采用卷积核(Kernel)来对待处理图像进行卷积处理,以将待处理图像的特征映射到特征映射层中,来得到该待处理图像对应的特征图。
为了可以降低模型的计算量,同时控制模型的大小,在对该待处理图像对中的待处理图像进行特征映射之后,便可以对该待处理图像对应的特征图进行降维处理。其中,对该待处理图像对应的特征图进行降维处理的方式可以有多种,比如,可以在通道层面上,对待处理图像对应的特征图进行卷积处理,得到降维后特征图,例如,请参考图3a,图3a是本申请实施例提供的一种图像处理方法的多尺度特征提取示意图,假设待处理图像对应的特征图维度为w×h×1024,其中,w表示待处理图像对应的宽度,h表示待处理图像对应的长度,1024表示特征图对应的通道数,可以对待处理图像对应的特征图进行卷积处理,将通道数1024降维到256个通道数,得到降维后特征图对应的维度为w×h×256。
在对该待处理图像对应的特征图进行降维处理之后,便可以对降维后特征图进行多尺度的特征提取。其中,对降维后特征图进行多尺度的特征提取的方式可以有多种,比如,可以采用不同大小的卷积核分别对降维后特征图进行卷积,来得到多个尺度的尺度图像特征,即可以得到该待处理图像在每一尺度对应的尺度图像特征,例如,请继续参考图3a,k表示卷积核尺寸(Kernel size),s表示卷积的步长(Stride,也称步幅),以此,可以采用卷积核大小为4×4、步长为2×2对降维后特征图进行卷积,得到该尺度对应的维度为w/2×h/2×256的尺度图像特征,同时可以采用卷积核大小为8×8、步长为2×2对降维后特征图进行卷积,得到该尺度对应的维度为w/2×h/2×128的尺度图像特征,还可以采用卷积核大小为16×16、步长为2×2对降维后特征图进行卷积,得到该尺度对应的维度为w/2×h/2×128的尺度图像特征,可以对这三个尺度对应的尺度图像特征进行拼接,得到维度为w/2×h/2×512的多尺度图像特征。
在对降维后特征图进行多尺度的特征提取之后,便可以将该待处理图像在每一尺度对应的尺度图像特征进行融合。其中,将该待处理图像在每一尺度对应的尺度图像特征进行 融合的方式可以有多种,例如,请继续参考图3a,可以在通道层面中对每一尺度对应的尺度图像特征进行融合,得到该待处理图像对应的维度为w/2×h/2×256的图像特征。
在一个实施例中,请参考图3b,图3b是本申请实施例提供的一种图像处理方法的具体流程示意图,其中,实线箭头指示的步骤表示属于模型训练与应用阶段中的步骤,虚线箭头指示的步骤表示只属于模型训练阶段中的步骤,可以假设待处理图像对中包括待处理图像Ia和Ib,长为H、宽为W(即H×W),将待处理Ia和Ib经过残差网络(Resnet50)进行下采样,例如,可以采用Resnet50-Layer3(Shared Layer3,即Resnet50中的第三层结构)对待处理图像Ia和Ib下采样8倍特征图,其通道数可以为1024个,从而可以得到降维后特征图对应的维度为W/16×H/16×256,从而可以将待处理图像Ia和Ib对应的降维后特征图分别输入到多尺度特征提取模块(Multi-Scale Feature Extractor)中进行多尺度的特征提取以及融合,得到待处理图像Ia和Ib对应的维度为W/32×H/32×256的图像特征。
102、在图像特征中提取出待处理图像对的关联特征。
其中,该关联特征可以用于表征待处理图像对中的待处理图像之间的相互信息,该相互信息可以为表征该待处理图像之间的关联关系的信息,例如可以表征待处理图像之间存在的相同场景或者对象的信息,该关联特征可以为特征图,特征图的维度例如可以是256维,可以表示为F∈R h×w×256
其中,在图像特征中提取出待处理图像对的关联特征的方式可以有多种,例如,可以对该图像特征进行扁平化处理,得到该待处理图像的扁平图像特征,对该扁平图像特征进行特征提取,得到该待处理图像对应的初始注意力特征,对该初始注意力特征进行交叉特征提取,得到该待处理图像对中每一该待处理图像的关联特征。
其中,该扁平图像特征可以为将待处理图像对应的图像特征进行展平之后得到的特征,该初始注意力特征可以理解为在待处理图像对应的图像特征中用于表征图像特征中每一特征与其他特征之间的关联关系的特征。
其中,对该图像特征进行扁平化处理的方式可以有多种,例如,可以采用展平层(Flatten Layer)对该图像特征进行扁平化处理,来将维度为w/2×h/2×256的图像特征进行展平,得到该待处理图像对应的一维的扁平图像特征。
在对该图像特征进行扁平化处理之后,便可以对该扁平图像特征进行特征提取,来得到该待处理图像对应的初始注意力特征。其中,对该扁平图像特征进行特征提取的方式可以有多种,例如,该扁平图像特征可以包含多个子扁平图像特征,可以对该扁平图像特征进行特征提取,得到该扁平图像特征中的每一子扁平图像特征对应的初始关联特征,基于该初始关联特征,确定该扁平图像特征中的每一子扁平图像特征对应的初始关联权重,根据该初始关联权重对该扁平图像特征中的每一子扁平图像特征进行融合,得到该待处理图像对应的初始注意力特征。
其中,该子扁平图像特征可以为扁平图像特征中的至少一个特征,例如,可以将扁平图像特征划分为多个区域,每一区域对应的特征则为子扁平图像特征。对扁平图像特征进行特征提取即对扁平图像特征中的子扁平图像特征进行特征映射的过程,映射得到的特征即为子扁平图像特征对应的初始关联特征,该初始关联特征可以为该子扁平图像特征中用于确定与其他子扁平图像特征之间的关联关系的特征信息。该初始关联权重可以为表征扁平图像特征中每一子扁平图像特征在扁平图像特征中的重要程度。
其中,对该扁平图像特征进行特征提取,得到该扁平图像特征中的每一子扁平图像特征对应的初始关联特征的方式可以有多种,比如,可以采用注意力网络(Attention)对扁平图像特征进行特征提取,来得到扁平图像特征中的每一子扁平图像特征对应的初始关联特征,例如,可以将每一扁平图像特征转换为三个维度的空间向量,包括查询向量(Query,简称Q)、键向量(Key,简称K)和值向量(Value,简称V),具体的转换方式可以理解为对每一扁平图像特征与三个维度的转换参数进行融合而得到的,将查询向量、键向量和值向量作为每一扁平图像特征对应的初始关联特征。
在对该扁平图像特征进行特征提取,得到该扁平图像特征中的每一子扁平图像特征对应的初始关联特征之后,便可以基于该初始关联特征,确定该扁平图像特征中的每一子扁平图像特征对应的初始关联权重,其中,基于该初始关联特征,确定该扁平图像特征中的每一子扁平图像特征对应的初始关联权重的方式可以有多种,例如,可以采用注意力网络将扁平图像特征中的每一子扁平图像特征对应的查询向量与其他子扁平图像特征的键向量进行点积,可以得到每一子扁平图像特征对应的注意力得分(Score),再基于每一子扁平图像特征对应的注意力得分,来计算每一子扁平图像特征对应的初始关联权重。
其中,除了可以采用注意力网络对该扁平图像特征进行特征提取,得到该扁平图像特征中的每一子扁平图像特征对应的初始关联特征之后,基于该初始关联特征,确定该扁平图像特征中的每一子扁平图像特征对应的初始关联权重以外,还可以采用其他可以捕捉每一子扁平图像特征与其他子扁平图像特征之间的关联关系,进而确定每一子扁平图像特征在扁平图像特征中所占的权重的网络。
在基于该初始关联特征,确定该扁平图像特征中的每一子扁平图像特征对应的初始关联权重之后,便可以根据该初始关联权重对该扁平图像特征中的每一子扁平图像特征进行融合。其中,根据该初始关联权重对该扁平图像特征中的每一子扁平图像特征进行融合的方式可以有多种,比如,可以基于初始关联权重对该扁平图像特征中的每一子扁平图像特征进行加权,并将加权后的子扁平图像特征进行累加,根据累加结果可以得到该待处理图像对应的初始注意力特征。例如,假设待处理图像对中包括待处理图像甲和待处理图像乙,其中,待处理图像甲对应的扁平图像特征中包括4个子扁平图像特征,分别为G、B、C和D,并确定了每一子扁平图像特征对应的初始关联权重,分别为g、b、c和d,进而可以基于初始关联权重对该扁平图像特征中的每一子扁平图像特征进行加权,得到Gg、Bb、Cc和Dd,从而可以将加权后的子扁平图像特征进行累加,得到累加结果为Gg+Bb+Cc+Dd,根据累加结果可以得到该待处理图像对应的初始注意力特征为Gg+Bb+Cc+Dd。
在一个实施例中，请参考图3b，可以将扁平图像特征输入到编码模块（Transformer Encoder）中，来得到待处理图像对应的初始注意力特征。可选的，请参考图4a，图4a是本申请实施例提供的一种图像处理方法的图像处理模型结构示意图，其中，假设待处理图像对中包括待处理图像Ia和Ib，以获取待处理图像Ia对应的初始注意力特征为例，可以将待处理图像Ia对应的扁平图像特征输入到图中左侧的Transformer Encoder模块的自注意力子模块中，来得到待处理图像对应的初始注意力特征。具体的，可以将待处理图像Ia对应的扁平图像特征转换为K、Q以及V三个维度的空间向量，并输入到Transformer Encoder模块的自注意力子模块中，在该自注意力子模块中，通过多头注意力单元（Multi-head Attention）对该扁平图像特征进行特征提取，来得到该扁平图像特征中的每一子扁平图像特征对应的初始关联权重，根据该初始关联权重对该扁平图像特征中的每一子扁平图像特征进行加权以及合并，来得到多头注意力单元的输出，进而可以通过合并单元（Concat）对多头注意力单元的输出以及扁平图像特征进行合并，进而可以将合并的结果通过归一化单元（Layer Normalization）进行归一化处理，从而可以通过前馈网络和残差连接单元（FeedForward&Add）中的前馈网络子单元（Feed Forward）将归一化处理的结果进行全连接处理，并通过前馈网络和残差连接单元中的残差连接子单元（Add）将全连接处理的结果与合并的结果进行残差连接处理，得到待处理图像Ia对应的初始注意力特征。
在对该扁平图像特征进行特征提取,得到该待处理图像对应的初始注意力特征之后,便可以对该初始注意力特征进行交叉特征提取,得到该待处理图像对中每一该待处理图像的关联特征。其中,对该初始注意力特征进行交叉特征提取的方式可以有多种,比如,可以对该图像特征以及该初始注意力特征进行交叉特征提取,得到每一该待处理图像对应的交叉关联特征,根据该交叉关联特征,确定该待处理图像对应的交叉关联权重,基于该交叉关联权重,对每一该待处理图像对应的初始注意力特征进行加权,以得到该待处理图像对应的关联特征。
其中,该交叉关联特征可以为用于确定待处理图像对中的待处理图像之间的关联关系的特征,该交叉关联权重可以为表征待处理图像对中的待处理图像之间的关联程度,该图像特征可以为扁平化处理后的图像特征,也即扁平图像特征。
其中,对该图像特征以及该初始注意力特征进行交叉特征提取,得到每一该待处理图像对应的交叉关联特征的方式可以有多种,比如,可以采用注意力网络来对该图像特征以及该初始注意力特征进行交叉特征提取,例如,可以将某一待处理图像对应的初始注意力特征转换为查询向量,并将另一待处理图像的图像特征(可以将该图像特征转化为扁平图像特征)转换为键向量和值向量,具体的转换方式可以理解为对图像特征以及该初始注意力特征与对应维度的转换参数进行融合而得到的,将对应的查询向量、键向量和值向量作为每一图像特征对应的交叉关联特征。
在对该图像特征以及该初始注意力特征进行交叉特征提取,得到每一该待处理图像对应的交叉关联特征之后,便可以根据该交叉关联特征,确定该待处理图像对应的交叉关联权重,其中,根据该交叉关联特征,确定该待处理图像对应的交叉关联权重的方式可以有多种,例如,可以采用注意力网络将待处理图像对中某一待处理图像对应的初始注意力特征对应的查询向量与其他待处理图像对应的图像特征的键向量进行点积,可以分别得到待处理图像对中某一待处理图像对应的该图像特征和对应的初始注意力特征的注意力得分,再基于该注意力得分,来计算每一图像特征和对应的初始注意力特征的交叉关联权重。
在根据该交叉关联特征，确定该待处理图像对应的交叉关联权重之后，便可以基于该交叉关联权重，对每一该待处理图像对应的初始注意力特征进行加权，以得到该待处理图像对应的关联特征。其中，基于该交叉关联权重，对每一该待处理图像对应的初始注意力特征进行加权的方式可以有多种，例如，假设待处理图像对中包括待处理图像甲和待处理图像乙，其中，以获取待处理图像甲对应的关联特征为例，假设待处理图像甲对应的初始注意力特征为E，待处理图像乙对应的图像特征为F'（F'仅为示意记号），并确定了初始注意力特征E对应的交叉关联权重为e，图像特征F'对应的交叉关联权重为f，进而可以基于交叉关联权重对初始注意力特征E和图像特征F'进行融合，来得到关联特征，例如，可以基于交叉关联权重对初始注意力特征E和图像特征F'进行加权并求和，得到关联特征为eE+fF'。
在一个实施例中，请参考图4a，其中，假设待处理图像对中包括待处理图像Ia和Ib，以获取待处理图像Ia对应的关联特征为例，可以将待处理图像Ia对应的扁平图像特征输入到图中左侧的Transformer Encoder模块的自注意力子模块中，来得到待处理图像对应的初始注意力特征，并将初始注意力特征输入到Transformer Encoder模块的交叉注意力子模块中，具体的，可以将待处理图像Ia对应的初始注意力特征转换为查询向量Q，将待处理图像Ib对应的扁平图像特征转换为键向量K和值向量V，进而可以输入到交叉注意力子模块的多头注意力单元中，通过该多头注意力单元对该图像特征以及该初始注意力特征进行交叉特征提取，得到每一该待处理图像对应的交叉关联特征，根据该交叉关联特征，确定该待处理图像对应的交叉关联权重，基于该交叉关联权重，对待处理图像Ia对应的初始注意力特征以及待处理图像Ib对应的扁平图像特征进行加权以及合并处理，来得到多头注意力单元的输出，进而可以通过合并单元对多头注意力单元的输出以及待处理图像Ia对应的初始注意力特征进行合并，并将合并的结果通过归一化单元进行归一化处理，从而可以通过前馈网络和残差连接单元中的前馈网络子单元将归一化处理的结果进行全连接处理，并通过前馈网络和残差连接单元中的残差连接子单元将全连接处理的结果与合并的结果进行残差连接处理，得到待处理图像Ia对应的关联特征。
同理,可以采用获取待处理图像Ia对应的关联特征的方法,对待处理图像Ib对应的关联特征进行获取,在此不进行赘述。
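与自注意力不同，交叉特征提取中查询向量来自一张待处理图像的初始注意力特征，键向量与值向量来自另一张待处理图像的（扁平）图像特征。下面给出一个示意性草图，仅作说明，变量名与维度均为假设：

import torch
import torch.nn.functional as F_nn

def cross_attention(attn_a, flat_b, w_q, w_k, w_v):
    """attn_a: 图像Ia的初始注意力特征 (N, C)；flat_b: 图像Ib的扁平图像特征 (M, C)。"""
    q = attn_a @ w_q                          # 来自 Ia 的查询向量
    k = flat_b @ w_k                          # 来自 Ib 的键向量
    v = flat_b @ w_v                          # 来自 Ib 的值向量
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    weights = F_nn.softmax(scores, dim=-1)    # 交叉关联权重
    return weights @ v                        # Ia 对应的关联特征（融合了 Ib 的信息）

# 交换两张图像的角色，即可同样得到 Ib 对应的关联特征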
103、根据关联特征,在待处理图像中识别出共视区域的共视图像,并计算共视图像之间的尺度差值。
其中,该共视图像可以为每一待处理图像中共视区域所在的区域图像,该尺度差值可以为表征待处理图像对中共视图像之间的尺度差距的数值。
其中,根据关联特征,在待处理图像中识别出共视区域的共视图像的方式可以有多种,比如,可以获取预设区域特征,并采用训练后图像处理模型对该预设区域特征进行特征提取,得到初始区域特征,对该初始区域特征以及该关联特征进行交叉特征提取,得到该初始区域特征对应的共视区域特征,基于该共视区域特征以及该关联特征,在该待处理图像中识别出该共视区域中的共视图像。
其中，该预设区域特征可以为预先设定的一个用来表征共视区域的边界框的特征信息，可以理解为预先学习到的检测共视区域边界框的信息抽象表达，该预设区域特征可以为256维的特征向量（Q∈R^{1×256}）。该初始区域特征可以为基于预设区域特征中每一特征之间的关联关系进行融合得到的特征信息，该共视区域特征可以为表征待处理图像中共视区域对应的边界框的特征信息。该训练后图像处理模型可以为训练好的用于对待处理图像对中的待处理图像进行处理的模型，可以为Transformer模型，该训练后图像处理模型的具体结构可以参考图4a中提供的图像处理模型的结构示意图。
其中,获取预设区域特征的方式可以有多种,例如,可以由开发人员预先进行设计并输入,也可以直接根据预先获取到的区域特征模板进行自动的生成等,在此不做限定。
在获取预设区域特征之后，便可以采用训练后图像处理模型对该预设区域特征进行特征提取，得到初始区域特征。其中，采用训练后图像处理模型对该预设区域特征进行特征提取的方式可以有多种，比如，该预设区域特征可以包括多个区域子特征，可以采用训练后图像处理模型对该预设区域特征进行特征提取，得到该预设区域特征中每一区域子特征对应的区域关联特征，基于该区域关联特征，确定该预设区域特征中每一区域子特征对应的区域关联权重，根据该区域关联权重，对该预设区域特征中每一区域子特征进行融合，得到初始区域特征。
其中,该区域子特征可以为预设区域特征中的至少一个特征,例如,可以将预设区域特征划分为多个区域,每一区域对应的特征则为区域子特征。对预设区域特征进行特征提取即对预设区域特征中的区域子特征进行特征映射,映射得到的特征即为该区域子特征对应的区域关联特征,该区域关联特征可以为用于确定预设区域特征中该区域子特征与其他区域子特征之间的关联关系的特征信息。该区域关联权重可以为表征预设区域特征中每一区域子特征在预设区域特征中的重要程度。
其中,采用训练后图像处理模型对该预设区域特征进行特征提取,得到该预设区域特征中每一区域子特征对应的区域关联特征的方式可以有多种,比如,可以采用注意力网络对预设区域特征进行特征提取,来得到预设区域特征中的每一区域子特征对应的区域关联特征,例如,可以将每一区域子特征转换为三个维度的空间向量,包括查询向量、键向量和值向量,具体的转换方式可以理解为对每一区域子特征与三个维度的转换参数进行融合而得到的,将查询向量、键向量和值向量作为每一区域子特征对应的区域关联特征。
在采用训练后图像处理模型对该预设区域特征进行特征提取,得到该预设区域特征中每一区域子特征对应的区域关联特征之后,便可以基于该区域关联特征,确定该预设区域特征中每一区域子特征对应的区域关联权重,其中,基于该区域关联特征,确定该预设区域特征中每一区域子特征对应的区域关联权重的方式可以有多种,例如,可以采用注意力网络将预设区域特征中的每一区域子特征对应的查询向量与其他区域子特征的键向量进行点积,可以得到每一区域子特征对应的注意力得分,再基于每一区域子特征对应的注意力得分,来计算每一区域子特征对应的区域关联权重。
在基于该区域关联特征,确定该预设区域特征中每一区域子特征对应的区域关联权重之后,便可以根据该区域关联权重,对该预设区域特征中每一区域子特征进行融合。其中,根据该区域关联权重,对该预设区域特征中每一区域子特征进行融合的方式可以有多种,比如,可以基于区域关联权重对该预设区域特征中的每一区域子特征进行加权,并将加权后的区域子特征进行累加,根据累加结果可以得到该预设区域特征对应的初始区域特征。
在一个实施例中,请继续参考图4a,可以通过图中右侧的训练后图像处理模型中的解码模块(Transformer Decoder)来对该预设区域特征进行特征提取,得到该预设区域特征中每一区域子特征对应的区域关联特征。具体的,假设待处理图像对中包括待处理图像Ia和Ib,以获取待处理图像Ia对应的区域关联特征为例,可以将预设区域特征(Single Query)转换为K、Q以及V三个维度的空间向量,并输入到Transformer Decoder模块的归一化单元中进行归一化处理,并将归一化处理后的K、Q、V三个空间向量输入到多头自注意力单元(Multi-head Self-Attention)中,通过该多头自注意力单元对该预设区域特征进行特征提取,得到该预设区域特征中每一区域子特征对应的区域关联特征,基于该区域关联特征,确定该预设区域特征中每一区域子特征对应的区域关联权重,进而根据该区域关联权重,对该预设区域特征中每一区域子特征进行加权,从而将加权后的结果输入到正则化和残差连接单元(Dropout&Add)中进行特征融合,来得到待处理图像Ia对应的初始区域特征。
在采用训练后图像处理模型对该预设区域特征进行特征提取，得到初始区域特征之后，便可以对该初始区域特征以及该关联特征进行交叉特征提取。其中，对该初始区域特征以及该关联特征进行交叉特征提取的方式可以有多种，例如，可以对该初始区域特征和该关联特征进行特征提取，得到该关联特征对应的图像关联特征，以及该初始区域特征对应的初始区域关联特征，根据该图像关联特征和初始区域关联特征确定该关联特征对应的图像关联权重，基于该图像关联权重，对该关联特征进行加权，得到共视图像特征，并将共视图像特征和该初始区域特征进行融合，得到共视区域特征。
其中,对关联特征进行特征提取即对该关联特征进行特征映射,映射得到的特征即为该关联特征对应的图像关联特征,该图像关联特征可以为用于确定该关联特征与初始区域特征之间的关联关系的特征信息;对初始区域特征进行特征提取即对该初始区域特征进行特征映射,映射得到的特征即为该初始区域特征对应的初始区域关联特征,该初始区域关联特征可以为用于确定该初始区域特征与关联特征之间的关联关系的特征信息,该图像关联权重可以为表征关联特征与初始区域特征之间的关联程度,该共视图像特征可以为表征关联特征与初始区域特征之间的关联关系的特征信息。
其中,对该初始区域特征和该关联特征进行特征提取,得到该关联特征对应的图像关联特征,以及该初始区域特征对应的初始区域关联特征的方式可以有多种,比如,可以采用注意力网络来对该初始区域特征和该关联特征进行特征提取,例如,可以将某一待处理图像对应的初始区域特征转换为查询向量,并将对应的关联特征转换为键向量和值向量,具体的转换方式可以理解为对初始区域特征和该关联特征与对应维度的转换参数进行融合而得到的,将对应的查询向量作为初始区域特征对应的初始区域关联特征,将对应的键向量和值向量作为该关联特征对应的图像关联特征。
在对该初始区域特征和该关联特征进行特征提取,得到该关联特征对应的图像关联特征,以及该初始区域特征对应的初始区域关联特征之后,便可以根据该图像关联特征和初始区域关联特征确定该关联特征对应的图像关联权重,其中,根据该图像关联特征和初始区域关联特征确定该关联特征对应的图像关联权重的方式可以有多种,例如,可以采用注意力网络将关联特征对应的图像关联特征的查询向量与初始区域特征对应的初始区域关联特征的键向量进行点积,可以分别得到关联特征中每一特征的注意力得分,再基于该注意力得分,来计算待处理图像对应的关联特征的图像关联权重。
在根据该图像关联特征和初始区域关联特征确定该关联特征对应的图像关联权重之后,便可以基于该图像关联权重,对该关联特征进行加权。其中,基于该图像关联权重,对该关联特征进行加权的方式可以有多种,例如,可以根据图像关联权重对关联特征对应的图像关联特征中的值向量进行加权,并将加权后的值向量进行融合,得到共视图像特征。
在基于该图像关联权重，对该关联特征进行加权之后，便可以将共视图像特征和该初始区域特征进行融合，得到共视区域特征。其中，将共视图像特征和该初始区域特征进行融合的方式可以有多种，例如，请参考图4a，其中，假设待处理图像对中包括待处理图像Ia和Ib，以获取待处理图像Ia对应的共视区域特征为例，可以将待处理图像Ia对应的关联特征fa输入到图中右侧的Transformer Decoder模块中，来得到待处理图像Ia对应的共视区域特征，具体的，可以对该初始区域特征和该关联特征进行特征提取，例如，可以将待处理图像Ia对应的初始区域特征与对应的预设区域特征进行加权，并将加权结果转换为查询向量Q，也即初始区域关联特征，将待处理图像Ia对应的关联特征fa转换为值向量V，并将关联特征fa通过位置编码模块（Positional Encoding）进行位置编码，并将fa对应的位置编码结果转换为键向量K，基于值向量V以及键向量K可以得到关联特征对应的图像关联特征，进而可以通过归一化单元对图像关联特征和初始区域关联特征进行归一化处理，并将归一化处理结果输入到多头注意力单元中，通过该多头注意力单元来根据该图像关联特征和初始区域关联特征确定该关联特征对应的图像关联权重，基于该图像关联权重，对该关联特征进行加权，得到共视图像特征，来得到多头注意力单元的输出，进而可以通过正则化和残差连接单元来对多头注意力单元的输出进行正则化处理，进而可以对正则化处理结果和该初始区域特征进行残差连接处理，接着可以通过归一化单元对残差连接处理结果进行归一化处理，再接着可以通过前馈网络和残差连接单元中的前馈网络子单元对归一化处理结果进行全连接处理，并通过前馈网络和残差连接单元中的残差连接子单元对全连接处理结果以及正则化和残差连接单元中的残差连接处理结果进行残差连接处理，以得到待处理图像Ia对应的共视区域特征qa。
同理,可以采用获取待处理图像Ia对应的共视区域特征的方法,对待处理图像Ib对应的共视区域特征进行获取,在此不进行赘述。
在对该初始区域特征以及该关联特征进行交叉特征提取之后,便可以基于该共视区域特征以及该关联特征,在该待处理图像中识别出该共视区域中的共视图像。其中,基于该共视区域特征以及该关联特征,在该待处理图像中识别出该共视区域中的共视图像的方式可以有多种,例如,可以基于该共视区域特征和关联特征,计算该关联特征对应的共视权重,根据该共视权重以及该关联特征,在该待处理图像中确定关注中心坐标,对该共视区域特征进行回归处理,得到该共视区域对应的相对中心点偏移,根据该关注中心坐标以及该相对中心点偏移,在该待处理图像中识别出该共视区域中的共视图像。
其中,该共视权重(Attention Map)可以表示关联特征中每一位置的特征在关联特征中的重要程度,该关注中心坐标(Centerness)可以为基于共视权重确定的在共视区域中重要程度较高的中心的坐标,可以理解为共视区域的关注中心,该相对中心点偏移可以为关注中心坐标相对于共视区域的边界框的偏移距离,根据关注中心坐标以及对应的相对中心点偏移可以确定一个矩形框,也即可以确定共视区域。
其中，基于该共视区域特征和关联特征，计算该关联特征对应的共视权重的方式可以有多种，例如，可以对待处理图像对应的共视区域特征和关联特征进行点积运算（dot product，又称数量积），来根据运算结果得到共视权重，可选的，该共视权重可以表示为
A=dot(Q,F)∈R^{h×w}
其中，A表示待处理图像对应的共视权重，dot()表示点积运算函数，Q表示共视区域特征，F表示关联特征，R表示维度，h表示共视权重分布的长度，w表示共视权重分布的宽度。
在基于该共视区域特征和关联特征,计算该关联特征对应的共视权重之后,便可以根据该共视权重以及该关联特征,在该待处理图像中确定关注中心坐标。其中,根据该共视权重以及该关联特征,在该待处理图像中确定关注中心坐标的方式可以有多种,比如,可以根据该共视权重以及该关联特征,计算该共视区域中每一预设坐标点的关注权重,基于该关注权重对该预设坐标点进行加权,得到加权后坐标点,对该加权后坐标点进行累加,得到该待处理图像中的关注中心坐标。
其中，该关注权重可以表征共视区域中每一预设坐标点的关注程度，可以理解为表征共视区域中每一预设坐标点为共视区域的几何中心点的概率大小，该预设坐标点可以为预设的相对坐标图中的坐标点，例如，可以将大小为w×h的图像划分为多个1×1的坐标方格（Grid），则可以得到相对坐标图，相对坐标图中每一Grid的坐标为预设坐标点的坐标，该加权后坐标点可以为基于关注权重进行加权后的坐标点。
其中，根据该共视权重以及该关联特征，计算该共视区域中每一预设坐标点的关注权重的方式可以有多种，例如，请继续参考图3b，可以通过特征融合模块以及加权求和关注中心模块（WS-Centerness）计算该共视区域中每一预设坐标点的关注权重，以得到共视区域的关注中心坐标，具体的，可以将关联特征转换为特征图的形式，从而可以对共视权重以及该关联特征进行叉乘运算，即A×F，并将叉乘运算的结果与关联特征进行残差连接处理，得到残差连接处理结果A×F+F，进而将残差连接处理结果A×F+F通过全卷积网络（Fully Convolutional Network，FCN）进行卷积，来生成共视区域概率图P，也即共视区域中的中心坐标概率分布Pc(x,y)，可以用于表征共视区域中每一预设坐标点对应的关注权重，其中，共视区域概率图P可以表示为
P=softmax(conv_{3×3}(A×F+F))
其中，×表示叉乘运算，+表示残差连接处理，softmax()表示逻辑回归函数，conv_{3×3}表示卷积核大小为3×3的卷积处理。
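上述共视权重A与共视区域概率图P的计算，可以用如下PyTorch草图示意，仅作说明；其中把共视区域特征视作256维向量、关联特征视作256×h×w的特征图，卷积层输出单通道、softmax在全图上归一化等细节均为本文的假设。

import torch
import torch.nn as nn
import torch.nn.functional as F_nn

conv3x3 = nn.Conv2d(256, 1, kernel_size=3, padding=1)     # 假设用单个3×3卷积近似FCN

def covis_probability_map(q_region, feat_map):
    """q_region: (256,) 共视区域特征；feat_map: (256, h, w) 关联特征（特征图形式）。"""
    c, h, w = feat_map.shape
    A = torch.einsum('c,chw->hw', q_region, feat_map)      # 共视权重 A = dot(Q, F)
    fused = A.unsqueeze(0) * feat_map + feat_map           # A×F + F，残差融合
    logits = conv3x3(fused.unsqueeze(0))                   # (1, 1, h, w)
    P = F_nn.softmax(logits.view(-1), dim=0).view(h, w)    # 共视区域概率图 P
    return A, P

# 用法示意：A, P = covis_probability_map(torch.randn(256), torch.randn(256, 60, 80))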
在根据该共视权重以及该关联特征，计算该共视区域中每一预设坐标点的关注权重之后，便可以基于该关注权重对该预设坐标点进行加权，得到加权后坐标点，对该加权后坐标点进行累加，得到该待处理图像中的关注中心坐标。其中，基于该关注权重对该预设坐标点进行加权求和的方式可以有多种，例如，可以将共视区域中的中心坐标概率分布Pc(x,y)与相对坐标图中对应的预设坐标点进行加权以及求和，得到共视区域的关注中心坐标，可以表示为
x_c=∑_{y=1}^{H}∑_{x=1}^{W} x·Pc(x,y)，y_c=∑_{y=1}^{H}∑_{x=1}^{W} y·Pc(x,y)
其中，x_c表示关注中心坐标中的横坐标，y_c表示关注中心坐标中的纵坐标，H表示待处理图像的长度，W表示待处理图像的宽度，x表示相对坐标图中的横坐标，y表示相对坐标图中的纵坐标，∑表示求和符号。
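基于概率图对相对坐标图中的预设坐标点加权求和得到关注中心坐标的过程（即加权求和关注中心模块WS-Centerness所做的“软argmax”），可以示意如下，坐标网格的构造方式为本文假设：

import torch

def weighted_sum_centerness(P):
    """P: (h, w) 共视区域概率图（已在全图上归一化）。返回关注中心坐标 (x_c, y_c)。"""
    h, w = P.shape
    ys = torch.arange(h, dtype=P.dtype).view(h, 1).expand(h, w)   # 相对坐标图的纵坐标 y
    xs = torch.arange(w, dtype=P.dtype).view(1, w).expand(h, w)   # 相对坐标图的横坐标 x
    x_c = (P * xs).sum()     # 对加权后坐标点累加
    y_c = (P * ys).sum()
    return x_c, y_c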
在根据该共视权重以及该关联特征,在该待处理图像中确定关注中心坐标之后,便可以对该共视区域特征进行回归处理,得到该共视区域对应的相对中心点偏移。其中,对该共视区域特征进行回归处理的方式可以有多种,例如,请继续参考图3b,可以通过共视框回归模块(Box Regression)对该共视区域特征进行回归处理,具体的,可以假设共视区域特征可以为256维的向量,则可以通过全连接层对共视区域特征进行全连接处理,进而可以将全连接处理的结果通过激活函数(线性整流函数,ReLU函数)进行激活,从而可以将激活结果再通过全连接层进行全连接处理,来得到共视区域特征对应的4维向量,接着可以经过激活函数(Sigmoid)得到归一化后的4维的中心点偏移(L,T,M,J),最后L和M乘以待处理图像的宽度W,T和J乘以图像长度H,得到相对中心点偏移(l,t,m,j),例如,请参考图4b,图4b是本申请实施例提供的一种图像处理方法的关注中心坐标和相对中心点偏移示意图。
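上述“全连接、ReLU、全连接、Sigmoid、按图像宽高缩放”的共视框回归流程可以用如下草图示意，仅作说明，其中隐藏层维度等为假设：

import torch
import torch.nn as nn

class BoxRegression(nn.Module):
    """示意性的共视框回归头：256维共视区域特征 -> 相对中心点偏移 (l, t, m, j)。"""
    def __init__(self, dim=256, hidden=256):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, 4)

    def forward(self, q, img_w, img_h):
        x = torch.relu(self.fc1(q))
        L, T, M, J = torch.sigmoid(self.fc2(x)).unbind(-1)   # 归一化后的4维中心点偏移
        # L、M 乘以待处理图像宽度 W，T、J 乘以图像长度 H，得到相对中心点偏移
        return L * img_w, T * img_h, M * img_w, J * img_h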
在对该共视区域特征进行回归处理之后,便可以根据该关注中心坐标以及该相对中心点偏移,在该待处理图像中识别出该共视区域中的共视图像。其中,根据该关注中心坐标以及该相对中心点偏移,在该待处理图像中识别出该共视区域中的共视图像的方式可以有多种,例如,可以根据该关注中心坐标以及该相对中心点偏移,计算该共视区域在该待处理图像中的几何中心坐标以及边界尺寸信息,基于该几何中心坐标以及该边界尺寸信息,在该待处理图像中确定出该待处理图像的共视区域,在该待处理图像中将该共视区域进行 分割,得到该共视区域中的共视图像。
其中,该几何中心坐标可以为共视区域对应的矩形框的几何中心的坐标,该边界尺寸信息可以为包括共视区域对应的矩形框的边长的尺寸的信息。
其中，根据该关注中心坐标以及该相对中心点偏移，计算该共视区域在该待处理图像中的几何中心坐标以及边界尺寸信息的方式可以有多种，例如，请继续参考图4b，假设关注中心坐标为(x_c,y_c)，相对中心点偏移为(l,t,m,j)，同时假设j大于t，m大于l，且共视区域位于相对坐标图中的第一象限，则可以计算几何中心坐标的横坐标为[(l+m)/2]-l+x_c，可以计算几何中心坐标的纵坐标为[(t+j)/2]+y_c-j，即几何中心坐标为([(l+m)/2]-l+x_c,[(t+j)/2]+y_c-j)，可以计算共视区域对应的矩形框的边界尺寸信息为长度为t+j，宽度为l+m。
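按照上文示例中的换算关系，由关注中心坐标与相对中心点偏移求几何中心坐标和边界尺寸的过程可以写成如下的小函数（沿用上文关于象限以及j大于t、m大于l的假设，仅作说明）：

def box_from_center_offsets(x_c, y_c, l, t, m, j):
    """由关注中心坐标 (x_c, y_c) 与相对中心点偏移 (l, t, m, j) 计算几何中心与边界尺寸。"""
    center_x = (l + m) / 2 - l + x_c      # 几何中心横坐标
    center_y = (t + j) / 2 + y_c - j      # 几何中心纵坐标
    width, height = l + m, t + j          # 边界尺寸：宽度 l+m，长度 t+j
    return (center_x, center_y), (width, height)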
在一个实施例中,可以对图像处理模型进行训练,来得到训练后图像处理模型,其中,对图像处理模型进行训练的方式可以有多种,例如,请继续参考图3b,可以通过对称中心一致性损失来对图像处理模型进行训练,具体的,可以获取图像样本对,采用预设图像处理模型预测该图像样本对中每一图像样本的共视区域,得到预测共视区域,根据该标注共视区域和预测共视区域对该预设图像处理模型进行训练,得到该训练后图像处理模型。
其中,该图像样本对可以为用于对预设图像处理模型进行训练的图像对样本,该图像样本对中的图像样本中包括标注共视区域,该预设图像处理模型可以为预先设计的还未训练好的图像处理模型,该预测共视区域可以为由预设图像处理模型基于输入的图像样本对预测得到的图像样本对应的共视区域,该标注共视区域可以为图像样本中预先标注好的共视区域。对预设图像处理模型进行训练即对预设图像处理模型的参数进行调整,在对预设图像处理模型进行训练的过程中,当满足训练停止条件时,得到训练后图像处理模型,其中,训练停止条件可以是训练时长达到预设时长、训练次数达到预设次数或者损失信息收敛中的任意一种。
其中,根据该标注共视区域和预测共视区域对该预设图像处理模型进行训练的方式可以有多种,例如,可以在该预测共视区域中,提取出该预测共视区域对应的预测几何中心坐标和预测边界尺寸信息,在该标注共视区域中,提取出该标注共视区域对应的标注几何中心坐标和标注边界尺寸信息,根据该预测几何中心坐标、预测边界尺寸信息、标注几何中心坐标以及标注边界尺寸信息,对该预设图像处理模型进行训练,得到训练后图像处理模型。
其中,该预测几何中心坐标可以为预测共视区域对应的矩形框的几何中心的坐标,该预测边界尺寸信息可以为包括预测共视区域对应的矩形框的边长的尺寸的信息,该标注几何中心坐标可以为标注共视区域对应的矩形框的几何中心的坐标,该标注边界尺寸信息可以为包括标注共视区域对应的矩形框的边长的尺寸的信息。
其中,在该预测共视区域中,提取出该预测共视区域对应的预测几何中心坐标和预测边界尺寸信息的方式可以有多种,例如,可以在该预测共视区域中,提取出该预测共视区域对应的预测关注中心坐标和该预测中心点偏移,根据该预测关注中心坐标以及该预测中心点偏移,确定该预测共视区域对应的预测几何中心坐标和预测边界尺寸信息。
其中,该预测关注中心坐标可以为预测共视区域中重要程度较高的中心的坐标,可以理解为预测共视区域的关注中心,该预测中心点偏移可以为预测关注中心坐标相对于预测共视区域的边界框的偏移距离。
在该预测共视区域中,提取出该预测共视区域对应的预测几何中心坐标和预测边界尺寸信息之后,便可以根据该预测几何中心坐标、预测边界尺寸信息、标注几何中心坐标以及标注边界尺寸信息,对该预设图像处理模型进行训练,得到训练后图像处理模型。其中,根据该预测几何中心坐标、预测边界尺寸信息、标注几何中心坐标以及标注边界尺寸信息,对该预设图像处理模型进行训练的方式可以有多种,例如,可以基于该预测几何中心坐标和标注几何中心坐标,计算该预设图像处理模型对应的循环一致性损失信息,基于该预测几何中心坐标和预测边界尺寸信息,以及该标注几何中心坐标和标注边界尺寸信息,分别计算该预设图像处理模型对应的边界损失信息以及平均绝对误差损失信息,将该循环一致性损失信息、该平均绝对误差损失信息以及该边界损失信息,作为该预设图像处理模型对应的损失信息,并根据该损失信息对该预设图像处理模型进行训练,得到训练后图像处理模型。
其中，该循环一致性损失信息可以为基于循环一致性损失函数（cycle consistency loss）确定的预设图像处理模型的损失信息，用于约束两个生成器生成的样本之间不相互矛盾。该平均绝对误差损失信息可以为基于回归损失函数（L1 Loss）确定的损失信息，用于衡量一组预测值的平均误差大小。该边界损失信息可以为基于边界损失函数（Generalized Intersection over Union）确定的损失信息，是用于衡量预测共视区域的边界框与标注共视区域的边界框之间差距的损失函数。
其中，基于该预测几何中心坐标和标注几何中心坐标，计算该预设图像处理模型对应的循环一致性损失信息的方式可以有多种，例如，该循环一致性损失信息可以表示为
L_loc=∑_i ∥c_i-ĉ_i∥_1
其中，L_loc表示循环一致性损失信息，∥·∥表示范数符号（范数是具有“长度”概念的函数，在线性代数、泛函分析及相关的数学领域，范数是为矢量空间内的所有矢量赋予非零的正长度或大小的函数），∥·∥_1表示1-范数，c_i表示标注几何中心坐标，ĉ_i为预设图像处理模型中交换输入的待处理图像对之间的关联特征后得到的中心点坐标。
其中，基于该预测几何中心坐标和预测边界尺寸信息，以及该标注几何中心坐标和标注边界尺寸信息，分别计算该预设图像处理模型对应的边界损失信息以及平均绝对误差损失信息的方式可以有多种，例如，该平均绝对误差损失信息可以表示为
L_L1=∑_i ∥b_i-b̂_i∥_1
其中，L_L1表示平均绝对误差损失信息，b_i表示经过归一化后的标注共视区域对应的标注几何中心坐标以及标注边界尺寸信息，b̂_i表示经过归一化后的预测共视区域对应的预测几何中心坐标以及预测边界尺寸信息，b_i∈[0,1]^4。
该边界损失信息可以表示为
L_giou=∑_i L_GIoU(b_i,b̂_i)
其中，L_giou表示边界损失信息，L_GIoU(·,·)表示边界损失函数，b_i表示经过归一化后的标注共视区域对应的标注几何中心坐标以及标注边界尺寸信息，b̂_i表示经过归一化后的预测共视区域对应的预测几何中心坐标以及预测边界尺寸信息。
以此，将该循环一致性损失信息、该平均绝对误差损失信息以及该边界损失信息，作为该预设图像处理模型对应的损失信息，可选的，预设图像处理模型对应的损失信息可以表示为
L=λ_con·L_con+λ_loc·L_loc+λ_iou·L_giou+λ_L1·L_L1
其中，L表示预设图像处理模型对应的损失信息，L_con表示预测几何中心坐标与标注几何中心坐标之间的损失信息，λ_con为其对应的超参数，λ_loc、λ_iou和λ_L1分别为循环一致性损失信息、边界损失信息和平均绝对误差损失信息对应的超参数。
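训练时按上述方式把各项损失加权求和的过程可以示意如下，仅作说明；其中GIoU采用一个手写的简化实现，预测中心损失用1-范数表示，各超参数取值均为假设：

import torch
import torch.nn.functional as F_nn

def giou_loss(b_pred, b_gt, eps=1e-7):
    """b_pred/b_gt: (N, 4)，格式为归一化的 (x1, y1, x2, y2)。返回平均GIoU损失。"""
    area_p = (b_pred[:, 2] - b_pred[:, 0]) * (b_pred[:, 3] - b_pred[:, 1])
    area_g = (b_gt[:, 2] - b_gt[:, 0]) * (b_gt[:, 3] - b_gt[:, 1])
    lt = torch.max(b_pred[:, :2], b_gt[:, :2])
    rb = torch.min(b_pred[:, 2:], b_gt[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    union = area_p + area_g - inter
    iou = inter / union.clamp(min=eps)
    lt_c = torch.min(b_pred[:, :2], b_gt[:, :2])
    rb_c = torch.max(b_pred[:, 2:], b_gt[:, 2:])
    area_c = (rb_c - lt_c).clamp(min=0).prod(dim=1)        # 最小外接框面积
    giou = iou - (area_c - union) / area_c.clamp(min=eps)
    return (1 - giou).mean()

def total_loss(pred_center, gt_center, swapped_center, pred_box, gt_box,
               lam_con=1.0, lam_loc=1.0, lam_iou=1.0, lam_l1=1.0):
    """各 lam_* 为假设的超参数取值，分别对应 λ_con、λ_loc、λ_iou、λ_L1。"""
    loss_con = F_nn.l1_loss(pred_center, gt_center)       # 预测中心与标注中心之间的损失
    loss_loc = F_nn.l1_loss(swapped_center, gt_center)    # 循环一致性损失（交换输入后的中心）
    loss_l1 = F_nn.l1_loss(pred_box, gt_box)              # 平均绝对误差损失
    loss_giou = giou_loss(pred_box, gt_box)               # 边界损失
    return lam_con * loss_con + lam_loc * loss_loc + lam_iou * loss_giou + lam_l1 * loss_l1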
可选的，可以采用2张V100显卡在数据集（Megadepth）上进行35代训练（即35个epoch）来复现并训练预设图像处理模型，例如，可以训练48小时。
以此,可以基于该预设图像处理模型对应的损失信息对预设图像处理模型进行训练,当该损失信息收敛时,该预设图像处理模型满足训练条件,可以将满足训练条件的预设图像处理模型作为训练后图像处理模型。
在根据关联特征,在待处理图像中识别出共视区域的共视图像之后,便可以计算该共视图像之间的尺度差值。其中,计算该共视图像之间的尺度差值的方式可以有多种,例如,可以获取每一该待处理图像对应的共视图像的尺寸信息,基于该尺寸信息计算该待处理图像之间的至少一个尺寸差值,在该尺寸差值中筛选出满足预设条件的目标尺寸差值,并将该目标尺寸差值作为该共视图像之间的尺度差值。
其中,该尺寸信息可以为包含每一待处理图像对应的共视图像的尺寸的信息,例如,可以包括共视图像的长度以及宽度等尺寸信息。该尺寸差值可以为表征待处理图像的尺寸信息之间的差距的数值,该目标尺寸差值可以为在尺寸差值中筛选出来作为尺度差值的尺寸差值。
其中,基于该尺寸信息计算该待处理图像之间的至少一个尺寸差值的方式可以有多种,比如,可以计算每一共视图像的宽度以及长度之间的比值,来得到共视图像之间的至少一个尺寸差值,例如,假设待处理图像对中包括待处理图像Ia和Ib,待处理图像Ia对应的共视图像为Ia',共视图像Ia'对应的尺寸信息为长度为ha、宽度为wa,待处理图像Ib对应的共视图像为Ib',共视图像Ib'对应的尺寸信息为长度为hb、宽度为wb,则可以得到四个尺寸差值分别为ha/hb、hb/ha、wa/wb、wb/wa。
在基于该尺寸信息计算该待处理图像之间的至少一个尺寸差值之后，便可以在该尺寸差值中筛选出满足预设条件的目标尺寸差值。其中，在该尺寸差值中筛选出满足预设条件的目标尺寸差值的方式可以有多种，比如，可以在尺寸差值中筛选出数值最大的尺寸差值，来作为目标尺寸差值，例如，假设待处理图像对中包括待处理图像Ia和Ib，待处理图像Ia对应的共视图像为Ia'，共视图像Ia'对应的尺寸信息为长度为ha、宽度为wa，待处理图像Ib对应的共视图像为Ib'，共视图像Ib'对应的尺寸信息为长度为hb、宽度为wb，则可以得到四个尺寸差值分别为(ha/hb,hb/ha,wa/wb,wb/wa)，则目标尺寸差值可以为S(Ia',Ib')=max(ha/hb,hb/ha,wa/wb,wb/wa)，其中，max()可以表示为取最大值的函数，从而可以将最大的尺寸差值作为该共视图像之间的尺度差值。
104、基于尺度差值,对共视图像的尺寸进行调整,得到调整后共视图像。
其中,调整后共视图像可以为根据共视图像之间的尺度差值进行调整后得到的共视图像。
为了提高共视图像之间特征点提取与匹配的准确性，可以基于尺度差值对每一共视图像的尺寸进行调整，以可以在同一尺度的共视图像中进行特征点的提取与匹配等处理，其中，基于尺度差值，对共视图像的尺寸进行调整的方式可以有多种，例如，可以获取共视图像的原始长度与原始宽度，并将共视图像的原始长度与原始宽度分别与该尺度差值相乘，来得到调整后长度以及调整后宽度，从而可以基于调整后长度以及调整后宽度，来对共视图像进行缩放，以对共视图像的尺寸进行调整，来得到调整后共视图像。
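共视图像之间尺度差值的计算与按尺度差值进行缩放的过程，可以用OpenCV示意如下，仅作说明；这里假设长和宽使用同一尺度差值进行缩放：

import cv2

def scale_difference(covis_a, covis_b):
    """按上文示例取四个尺寸差值中的最大值作为尺度差值。"""
    ha, wa = covis_a.shape[:2]
    hb, wb = covis_b.shape[:2]
    return max(ha / hb, hb / ha, wa / wb, wb / wa)

def align_scale(covis_small, scale):
    """将较小尺度的共视图像按尺度差值放大，得到调整后共视图像。"""
    h, w = covis_small.shape[:2]
    return cv2.resize(covis_small, (int(w * scale), int(h * scale)))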
105、在每一调整后共视图像中提取出至少一个共视特征点,并基于共视特征点,对待处理图像对进行处理。
其中,该共视特征点可以为在调整后共视图像中提取出来的特征点。
其中,在每一调整后共视图像中提取出至少一个共视特征点的方式可以有多种,例如,可以采用角点检测算法(FAST算法)、尺度不变特征变换(Scale-Invariant Feature Transform,简称SIFT)、加速稳健特征算法(Speeded Up Robust Features,简称SURF)等特征点提取方法,来在每一调整后共视图像中提取出至少一个共视特征点。
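例如，使用OpenCV中的SIFT在调整后共视图像上提取共视特征点，可以示意如下（仅作说明，图像文件名为假设）：

import cv2

adjusted_covis = cv2.imread('covis_a_aligned.png')          # 调整后共视图像（示意文件名）
gray = cv2.cvtColor(adjusted_covis, cv2.COLOR_BGR2GRAY)
sift = cv2.SIFT_create()                                     # 也可替换为FAST、SURF等
keypoints, descriptors = sift.detectAndCompute(gray, None)   # 共视特征点及其描述子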
在每一调整后共视图像中提取出至少一个共视特征点之后,便可以基于共视特征点,对待处理图像对进行处理。其中,基于共视特征点,对待处理图像对进行处理的方式可以有多种,例如,可以对该待处理图像对中每一该待处理图像在该调整后共视图像中的共视特征点进行特征点匹配,得到匹配后共视特征点,基于该尺度差值以及该调整后共视图像的尺寸信息,在该待处理图像中确定该匹配后共视特征点对应的源特征点,基于该源特征点,对该待处理图像对进行处理。
其中,该匹配后共视特征点可以为在某一待处理图像的调整后共视图像中与其他调整后共视图像中的共视特征点匹配的共视特征点,该源特征点可以为匹配后共视特征点对应的待处理图像中对应的特征点。
其中，对该待处理图像对中每一该待处理图像在该调整后共视图像中的共视特征点进行特征点匹配的方式可以有多种，例如，可以采用距离匹配方法（Brute-Force Matcher）来计算某一个共视特征点描述子与其他调整后共视图像中所有共视特征点描述子之间的距离，然后将得到的距离进行排序，取距离最近的一个共视特征点作为匹配点，来得到匹配后共视特征点。
在对该待处理图像对中每一该待处理图像在该调整后共视图像中的共视特征点进行特征点匹配之后，便可以基于该尺度差值以及该调整后共视图像的尺寸信息，在该待处理图像中确定该匹配后共视特征点对应的源特征点，其中，基于该尺度差值以及该调整后共视图像的尺寸信息，在该待处理图像中确定该匹配后共视特征点对应的源特征点的方式可以有多种，例如，可以根据调整后共视图像中的匹配后共视特征点进行调整后共视图像的位姿估计（Pose Estimation），来得到调整后共视图像对应的调整后位姿信息，从而可以基于调整后位姿信息、该尺度差值以及该调整后共视图像的尺寸信息，来计算待处理图像对应的原始位姿信息，从而可以根据原始位姿信息，将匹配后共视特征点在调整后共视图像中的位置进行逆变换到待处理图像上，从而可以在该待处理图像中确定该匹配后共视特征点对应的源特征点。
可选的，可以采用随机抽样一致算法（RANdom SAmple Consensus，简称RANSAC）来根据调整后共视图像中的匹配后共视特征点进行调整后共视图像的位姿估计，RANSAC算法是一种在包含离群点在内的数据集里，通过迭代的方式估计模型参数的算法。
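共视特征点的距离匹配、基于RANSAC的变换估计以及将匹配后共视特征点逆变换回原始待处理图像的过程，可以示意如下，仅作说明；这里以单应矩阵作为两张调整后共视图像之间的变换模型，并假设逆变换可以用“除以缩放比例再加上共视区域左上角偏移”来近似，scale_a与offset_a等输入均为假设：

import cv2
import numpy as np

def match_and_backproject(kp_a, des_a, kp_b, des_b, scale_a, offset_a):
    """kp/des: 两张调整后共视图像的特征点与描述子；scale_a: Ia共视图像的缩放比例；
    offset_a: 共视区域在原始待处理图像Ia中的左上角坐标 (x, y)。"""
    bf = cv2.BFMatcher(cv2.NORM_L2)
    matches = sorted(bf.match(des_a, des_b), key=lambda m: m.distance)   # 距离最近者作为匹配点
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    # RANSAC 估计两张调整后共视图像之间的变换，并过滤外点
    H, inlier_mask = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 3.0)
    inliers_a = pts_a[inlier_mask.ravel() == 1]
    # 将匹配后共视特征点逆变换回原始待处理图像Ia：先撤销尺度调整，再加上共视区域偏移
    src_pts_a = inliers_a / scale_a + np.float32(offset_a)
    return src_pts_a, H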
在基于该尺度差值以及该共视图像的尺寸信息,在该待处理图像中确定该匹配后共视特征点对应的源特征点之后,便可以基于该源特征点,对该待处理图像对进行处理,其中,基于该源特征点,对该待处理图像对进行处理的方式可以有多种,例如,可以对待处理图像中的特征点进行提取、匹配以及定位等处理,还可以在此基础上对待处理图像进行进一步的应用,例如,可以在虚拟地图应用中进行数据定位等,在此不做限定。
本申请实施例还提供了一种图像处理方法,请参考图5,图5是本申请实施例提供的一种图像处理方法的整体流程示意图,在第一阶段中,通过本申请实施例提供的图像处理模型对输入的两张待处理图像的共视区域进行回归获取对应区域所在位置,并分割出共视图像,在第二阶段中,再在图像层面对共视图像进行尺度对齐,在尺度对齐的调整后共视图像上进行特征点提取和匹配,一方面可以保证特征点在一个尺度的图像上进行提取,可以降低特征点提取与匹配的难度,提高特征点提取与匹配的效率,另一方面,在共视区域中进行特征点的匹配,可以有效提高外点过滤作用,提高特征点匹配的准确性,同时可以提高特征点匹配的速率,在第三阶段中,通过计算待处理图像对应的原始位姿信息,从而可以根据原始位姿信息,将匹配后共视特征点在调整后共视图像中的位置进行逆变换到待处理图像上,从而可以在该待处理图像中确定该匹配后共视特征点对应的源特征点。以此,本申请实施例提供的图像处理方法可以有效处理尺度差异大的情况下的特征提取、匹配与定位,比现有特征提取匹配算法更为稠密,适用于图像配准、大规模场景重建、同时定位与建图(SLAM)以及视觉定位等任务,可以提高图像处理的准确性以及速率,从而提升了图像处理效率。
由以上可知,本申请实施例通过获取待处理图像对,并对待处理图像对中的待处理图像进行图像特征提取,得到待处理图像的图像特征;在图像特征中提取出待处理图像对的关联特征;根据关联特征,在待处理图像中识别出共视区域的共视图像,并计算共视图像之间的尺度差值;基于尺度差值,对共视图像的尺寸进行调整,得到调整后共视图像;在每一调整后共视图像中提取出至少一个共视特征点,并基于共视特征点,对待处理图像对进行处理。以此,通过在图像特征中提取出表征待处理图像之间的相互信息的关联特征,并根据该关联特征在待处理图像中识别出两张待处理图像之间的共视区域的共视图像,以基于共视图像来对共视区域中的共视特征点进行快速提取以及匹配,提高了特征点匹配的速率以及准确性,进而提高了图像处理的准确性以及速度,从而提升了图像处理效率。
根据上面实施例所描述的方法,以下将举例作进一步详细说明。
在本实施例中,将以该图像处理装置具体集成在计算机设备为例进行说明。其中,该图像处理方法以服务器为执行主体进行具体的描述。需要说明的是,该实施例中所包括的与上文实施例中相同的部分,可以参考上文实施例中的相关解释。为了更好的描述本申请实施例,请参阅图6。如图6所示,图6为本申请实施例提供的图像处理方法的另一流程示意图。具体流程如下:
在步骤201中，服务器获取图像样本对，采用预设图像处理模型预测该图像样本对中每一图像样本的共视区域，得到预测共视区域，在该预测共视区域中，提取出该预测共视区域对应的预测关注中心坐标和该预测中心点偏移，根据该预测关注中心坐标以及该预测中心点偏移，确定该预测共视区域对应的预测几何中心坐标和预测边界尺寸信息。
在步骤202中,服务器在该图像样本的标注共视区域中,提取出该标注共视区域对应的标注几何中心坐标和标注边界尺寸信息,基于该预测几何中心坐标和标注几何中心坐标,计算该预设图像处理模型对应的循环一致性损失信息,基于该预测几何中心坐标和预测边界尺寸信息,以及该标注几何中心坐标和标注边界尺寸信息,分别计算该预设图像处理模型对应的边界损失信息以及平均绝对误差损失信息。
在步骤203中,服务器将该循环一致性损失信息、该平均绝对误差损失信息以及该边界损失信息,作为该预设图像处理模型对应的损失信息,并根据该损失信息对该预设图像处理模型进行训练,得到训练后图像处理模型。
在步骤204中,服务器获取待处理图像对,对该待处理图像对中的待处理图像进行特征映射,得到该待处理图像对应的特征图,对该待处理图像对应的特征图进行降维处理,得到降维后特征图,对降维后特征图进行多尺度的特征提取,得到该待处理图像在每一尺度对应的尺度图像特征,将该待处理图像在每一尺度对应的尺度图像特征进行融合,得到该待处理图像的图像特征。
在步骤205中,服务器对该图像特征进行扁平化处理,得到该待处理图像的扁平图像特征,对该扁平图像特征进行特征提取,得到该扁平图像特征中的每一子扁平图像特征对应的初始关联特征,基于该初始关联特征,确定该扁平图像特征中的每一子扁平图像特征对应的初始关联权重,根据该初始关联权重对该扁平图像特征中的每一子扁平图像特征进行融合,得到该待处理图像对应的初始注意力特征。
在步骤206中,服务器对该图像特征以及该初始注意力特征进行交叉特征提取,得到每一该待处理图像对应的交叉关联特征,根据该交叉关联特征,确定该待处理图像对应的交叉关联权重,基于该交叉关联权重,对每一该待处理图像对应的初始注意力特征进行加权,以得到该待处理图像对应的关联特征。
在步骤207中,服务器获取预设区域特征,采用训练后图像处理模型对该预设区域特征进行特征提取,得到该预设区域特征中每一区域子特征对应的区域关联特征,基于该区域关联特征,确定该预设区域特征中每一区域子特征对应的区域关联权重,根据该区域关联权重,对该预设区域特征中每一区域子特征进行融合,得到初始区域特征。
在步骤208中,服务器对该初始区域特征和该关联特征进行特征提取,得到该关联特征对应的图像关联特征,以及该初始区域特征对应的初始区域关联特征,根据该图像关联特征和该初始区域关联特征确定该关联特征对应的图像关联权重,基于该图像关联权重,对该关联特征进行加权,得到共视图像特征,并将该共视图像特征和该初始区域特征进行融合,得到共视区域特征。
在步骤209中,服务器基于该共视区域特征和关联特征,计算该关联特征对应的共视权重,根据该共视权重以及该共视图像特征,计算该共视区域中每一预设坐标点的关注权重,基于该关注权重对该预设坐标点进行加权,得到加权后坐标点,对该加权后坐标点进行累加,得到该待处理图像中的关注中心坐标。
在步骤210中，服务器对该共视区域特征进行回归处理，得到该共视区域对应的相对中心点偏移，根据该关注中心坐标以及该相对中心点偏移，计算该共视区域在该待处理图像中的几何中心坐标以及边界尺寸信息，基于该几何中心坐标以及该边界尺寸信息，在该待处理图像中确定出该待处理图像的共视区域，在该待处理图像中将该共视区域进行分割，得到该共视区域中的共视图像。
在步骤211中,服务器获取每一该待处理图像对应的共视图像的尺寸信息,基于该尺寸信息计算该待处理图像之间的至少一个尺寸差值,在该尺寸差值中筛选出满足预设条件的目标尺寸差值,并将该目标尺寸差值作为该共视图像之间的尺度差值,基于该尺度差值,对该共视图像的尺寸进行调整,得到调整后共视图像。
在步骤212中,服务器在每一该调整后共视图像中提取出至少一个共视特征点,对该待处理图像对中每一该待处理图像在该调整后共视图像中的共视特征点进行特征点匹配,得到匹配后共视特征点,基于该尺度差值以及该调整后共视图像的尺寸信息,在该待处理图像中确定该匹配后共视特征点对应的源特征点,基于该源特征点,对该待处理图像对进行处理。
由以上可知，本申请实施例通过在图像特征中提取出表征待处理图像之间的相互信息的关联特征，并根据该关联特征在待处理图像中识别出两张待处理图像之间的共视区域的共视图像，以基于共视图像来对共视区域中的共视特征点进行快速提取以及匹配，提高了特征点匹配的速率以及准确性，可以有效处理尺度差异大的情况下的特征点的提取、匹配与定位，进而提高了图像处理的准确性以及速度，从而提升了图像处理效率。
应该理解的是,虽然如上的各实施例所涉及的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,如上的各实施例所涉及的流程图中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。
为了更好地实施以上方法,本申请实施例还提供一种图像处理装置,该图像处理装置可以集成在计算机设备中,该计算机设备可以为服务器。
例如,如图7所示,为本申请实施例提供的图像处理装置的结构示意图,该图像处理装置可以包括获取单元301、提取单元302、识别单元303、调整单元304和处理单元305,如下:
获取单元301,用于获取待处理图像对,并对该待处理图像对中的待处理图像进行图像特征提取,得到该待处理图像的图像特征;
提取单元302,用于在该图像特征中提取出该待处理图像对的关联特征,该关联特征用于表征该待处理图像对中的待处理图像之间的相互信息;
识别单元303,用于根据该关联特征,在该待处理图像中识别出共视区域的共视图像,并计算该共视图像之间的尺度差值;
调整单元304,用于基于该尺度差值,对该共视图像的尺寸进行调整,得到调整后共视图像;
处理单元305,用于在每一该调整后共视图像中提取出至少一个共视特征点,并基于该共视特征点,对该待处理图像对进行处理。
在一个实施例中，该识别单元303，包括：初始区域特征提取子单元，用于获取预设区域特征，并采用训练后图像处理模型对该预设区域特征进行特征提取，得到初始区域特征；交叉特征提取子单元，用于对该初始区域特征以及该关联特征进行交叉特征提取，得到该初始区域特征对应的共视区域特征；共视图像识别子单元，用于基于该共视区域特征以及该关联特征，在该待处理图像中识别出该共视区域中的共视图像。
在一个实施例中,该初始区域特征提取子单元,包括:区域关联特征提取模块,用于采用训练后图像处理模型对该预设区域特征进行特征提取,得到该预设区域特征中每一区域子特征对应的区域关联特征;区域关联权重确定模块,用于基于该区域关联特征,确定该预设区域特征中每一区域子特征对应的区域关联权重;初始区域特征融合模块,用于根据该区域关联权重,对该预设区域特征中每一区域子特征进行融合,得到初始区域特征。
在一个实施例中,该交叉特征提取子单元,包括:交叉特征提取模块,用于对该初始区域特征和该关联特征进行特征提取,得到该关联特征对应的图像关联特征,以及该初始区域特征对应的初始区域关联特征;关联权重确定模块,用于根据该图像关联特征和该初始区域关联特征确定该关联特征对应的图像关联权重;共视加权模块,用于基于该图像关联权重,对该关联特征进行加权,得到共视图像特征,并将该共视图像特征和该初始区域特征进行融合,得到共视区域特征。
在一个实施例中,该共视图像识别子单元,包括:共视权重计算模块,用于基于该共视区域特征和关联特征,计算该关联特征对应的共视权重;关注中心坐标确定模块,用于根据该共视权重以及该关联特征,在该待处理图像中确定关注中心坐标;相对中心点偏移回归模块,用于对该共视区域特征进行回归处理,得到该共视区域对应的相对中心点偏移;共视图像识别模块,用于根据该关注中心坐标以及该相对中心点偏移,在该待处理图像中识别出该共视区域中的共视图像。
在一个实施例中,该共视图像识别模块,包括:几何中心坐标以及边界尺寸信息计算子模块,用于根据该关注中心坐标以及该相对中心点偏移,计算该共视区域在该待处理图像中的几何中心坐标以及边界尺寸信息;共视区域确定子模块,用于基于该几何中心坐标以及该边界尺寸信息,在该待处理图像中确定出该待处理图像的共视区域;共视图像分割子模块,用于在该待处理图像中将该共视区域进行分割,得到该共视区域中的共视图像。
在一个实施例中,该关注中心坐标确定模块,包括:关注权重计算子模块,用于根据该共视权重以及该共视图像特征,计算该共视区域中每一预设坐标点的关注权重;坐标点加权子模块,用于基于该关注权重对该预设坐标点进行加权,得到加权后坐标点;坐标点累加子模块,用于对该加权后坐标点进行累加,得到该待处理图像中的关注中心坐标。
在一个实施例中,该图像处理装置,还包括:图像样本对获取单元,用于获取图像样本对,该图像样本对的图像样本中包括标注共视区域;预测共视区域预测单元,用于采用预设图像处理模型预测该图像样本对中每一图像样本的共视区域,得到预测共视区域;训练单元,用于根据该标注共视区域和预测共视区域对该预设图像处理模型进行训练,得到该训练后图像处理模型。
在一个实施例中，该训练单元，包括：预测几何中心坐标和预测边界尺寸信息提取子单元，用于在该预测共视区域中，提取出该预测共视区域对应的预测几何中心坐标和预测边界尺寸信息；标注几何中心坐标和标注边界尺寸信息提取子单元，用于在该标注共视区域中，提取出该标注共视区域对应的标注几何中心坐标和标注边界尺寸信息；训练子单元，用于根据该预测几何中心坐标、预测边界尺寸信息、标注几何中心坐标以及标注边界尺寸信息，对该预设图像处理模型进行训练，得到训练后图像处理模型。
在一个实施例中,该预测几何中心坐标和预测边界尺寸信息提取子单元,用于:在该预测共视区域中,提取出该预测共视区域对应的预测关注中心坐标和该预测中心点偏移;根据该预测关注中心坐标以及该预测中心点偏移,确定该预测共视区域对应的预测几何中心坐标和预测边界尺寸信息。
在一个实施例中,该训练子单元,包括:第一损失信息计算模块,用于基于该预测几何中心坐标和标注几何中心坐标,计算该预设图像处理模型对应的循环一致性损失信息;第二损失信息计算模块,用于基于该预测几何中心坐标和预测边界尺寸信息,以及该标注几何中心坐标和标注边界尺寸信息,分别计算该预设图像处理模型对应的边界损失信息以及平均绝对误差损失信息;训练模块,用于将该循环一致性损失信息、该平均绝对误差损失信息以及该边界损失信息,作为该预设图像处理模型对应的损失信息,并根据该损失信息对该预设图像处理模型进行训练,得到训练后图像处理模型。
在一个实施例中,该提取单元302,包括:扁平化处理子单元,用于对该图像特征进行扁平化处理,得到该待处理图像的扁平图像特征;初始注意力特征提取子单元,用于对该扁平图像特征进行特征提取,得到该待处理图像对应的初始注意力特征;关联特征交叉提取子单元,用于对该初始注意力特征进行交叉特征提取,得到该待处理图像对中每一该待处理图像的关联特征。
在一个实施例中,该初始注意力特征提取子单元,包括:初始关联特征提取模块,用于对该扁平图像特征进行特征提取,得到该扁平图像特征中的每一子扁平图像特征对应的初始关联特征;初始关联权重确定模块,用于基于该初始关联特征,确定该扁平图像特征中的每一子扁平图像特征对应的初始关联权重;初始注意力特征融合模块,用于根据该初始关联权重对该扁平图像特征中的每一子扁平图像特征进行融合,得到该待处理图像对应的初始注意力特征。
在一个实施例中,该关联特征交叉提取子单元,包括:交叉关联特征提取模块,用于对该图像特征以及该初始注意力特征进行交叉特征提取,得到每一该待处理图像对应的交叉关联特征;交叉关联权重确定模块,用于根据该交叉关联特征,确定该待处理图像对应的交叉关联权重;交叉关联权重加权模块,用于基于该交叉关联权重,对每一该待处理图像对应的初始注意力特征进行加权,以得到该待处理图像对应的关联特征。
在一个实施例中,该获取单元301,包括:特征映射子单元,用于对该待处理图像对中的待处理图像进行特征映射,得到该待处理图像对应的特征图;降维处理子单元,用于对该待处理图像对应的特征图进行降维处理,得到降维后特征图;尺度图像特征提取子单元,用于对降维后特征图进行多尺度的特征提取,得到该待处理图像在每一尺度对应的尺度图像特征;图像特征融合子单元,用于将该待处理图像在每一尺度对应的尺度图像特征进行融合,得到该待处理图像的图像特征。
在一个实施例中,该识别单元303,包括:尺寸信息获取子单元,用于获取每一该待处理图像对应的共视图像的尺寸信息;尺寸差值计算子单元,用于基于该尺寸信息计算该待处理图像之间的至少一个尺寸差值;尺度差值筛选子单元,用于在该尺寸差值中筛选出满足预设条件的目标尺寸差值,并将该目标尺寸差值作为该共视图像之间的尺度差值。
在一个实施例中，该处理单元305，包括：共视特征点匹配子单元，用于对该待处理图像对中每一该待处理图像在该调整后共视图像中的共视特征点进行特征点匹配，得到匹配后共视特征点；源特征点确定子单元，用于基于该尺度差值以及该调整后共视图像的尺寸信息，在该待处理图像中确定该匹配后共视特征点对应的源特征点；处理子单元，用于基于该源特征点，对该待处理图像对进行处理。
具体实施时,以上各个单元可以作为独立的实体来实现,也可以进行任意组合,作为同一或若干个实体来实现,以上各个单元的具体实施可参见前面的方法实施例,在此不再赘述。
由以上可知,本申请实施例通过获取单元301获取待处理图像对,并对待处理图像对中的待处理图像进行图像特征提取,得到待处理图像的图像特征;提取单元302在图像特征中提取出待处理图像对的关联特征;识别单元303根据关联特征,在待处理图像中识别出共视区域的共视图像,并计算共视图像之间的尺度差值;调整单元304基于尺度差值,对共视图像的尺寸进行调整,得到调整后共视图像;处理单元305在每一调整后共视图像中提取出至少一个共视特征点,并基于共视特征点,对待处理图像对进行处理。以此,通过在图像特征中提取出表征待处理图像之间的相互信息的关联特征,并根据该关联特征在待处理图像中识别出两张待处理图像之间的共视区域的共视图像,以基于共视图像来对共视区域中的共视特征点进行快速提取以及匹配,提高了特征点匹配的速率以及准确性,进而提高了图像处理的准确性以及速度,从而提升了图像处理效率。
本申请实施例还提供一种计算机设备,如图8所示,其示出了本申请实施例所涉及的计算机设备的结构示意图,该计算机设备可以是服务器,具体来讲:
该计算机设备可以包括一个或者一个以上处理核心的处理器401、一个或一个以上计算机可读存储介质的存储器402、电源403和输入单元404等部件。本领域技术人员可以理解,图8中示出的计算机设备结构并不构成对计算机设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。其中:
处理器401是该计算机设备的控制中心,利用各种接口和线路连接整个计算机设备的各个部分,通过运行或执行存储在存储器402内的软件程序和/或模块,以及调用存储在存储器402内的数据,执行计算机设备的各种功能和处理数据。可选的,处理器401可包括一个或多个处理核心;优选的,处理器401可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器401中。
存储器402可用于存储软件程序以及模块,处理器401通过运行存储在存储器402的软件程序以及模块,从而执行各种功能应用以及图像处理。存储器402可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据计算机设备的使用所创建的数据等。此外,存储器402可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。相应地,存储器402还可以包括存储器控制器,以提供处理器401对存储器402的访问。
计算机设备还包括给各个部件供电的电源403,优选的,电源403可以通过电源管理系统与处理器401逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。电源403还可以包括一个或一个以上的直流或交流电源、再充电系统、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。
该计算机设备还可包括输入单元404，该输入单元404可用于接收输入的数字或字符信息，以及产生与用户设置以及功能控制有关的键盘、鼠标、操作杆、光学或者轨迹球信号输入。
尽管未示出,计算机设备还可以包括显示单元等,在此不再赘述。具体在本实施例中,计算机设备中的处理器401会按照如下的指令,将一个或一个以上的应用程序的进程对应的可执行文件加载到存储器402中,并由处理器401来运行存储在存储器402中的应用程序,从而实现一种图像处理方法,该图像处理方法与上文实施例中的图像处理方法属于同一构思,其具体实现过程详见上文方法实施例。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器中存储有计算机可读指令,该处理器执行计算机可读指令时实现上述图像处理方法的步骤。
在一个实施例中,提供了一种计算机可读存储介质,其上存储有计算机可读指令,计算机可读指令被处理器执行时实现上述图像处理方法的步骤。
在一个实施例中,提供了一种计算机程序产品,包括计算机可读指令,该计算机可读指令被处理器执行时实现上述图像处理方法的步骤。
需要说明的是,本申请所涉及的用户信息(包括但不限于用户设备信息、用户个人信息等)和数据(包括但不限于用于分析的数据、存储的数据、展示的数据等),均为经用户授权或者经过各方充分授权的信息和数据。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、数据库或其它介质的任何引用,均可包括非易失性和易失性存储器中的至少一种。非易失性存储器可包括只读存储器(Read-Only Memory,ROM)、磁带、软盘、闪存、光存储器、高密度嵌入式非易失性存储器、阻变存储器(ReRAM)、磁变存储器(Magnetoresistive Random Access Memory,MRAM)、铁电存储器(Ferroelectric Random Access Memory,FRAM)、相变存储器(Phase Change Memory,PCM)、石墨烯存储器等。易失性存储器可包括随机存取存储器(Random Access Memory,RAM)或外部高速缓冲存储器等。作为说明而非局限,RAM可以是多种形式,比如静态随机存取存储器(Static Random Access Memory,SRAM)或动态随机存取存储器(Dynamic Random Access Memory,DRAM)等。本申请所提供的各实施例中所涉及的数据库可包括关系型数据库和非关系型数据库中至少一种。非关系型数据库可包括基于区块链的分布式数据库等,不限于此。本申请所提供的各实施例中所涉及的处理器可为通用处理器、中央处理器、图形处理器、数字信号处理器、可编程逻辑器、基于量子计算的数据处理逻辑器等,不限于此。
以上所述实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (20)

  1. 一种图像处理方法,由计算机设备执行,包括:
    获取待处理图像对,并对所述待处理图像对中的待处理图像进行图像特征提取,得到所述待处理图像的图像特征;
    在所述图像特征中提取出所述待处理图像对的关联特征,所述关联特征用于表征所述待处理图像对中的待处理图像之间的相互信息;
    根据所述关联特征,在所述待处理图像中识别出共视区域的共视图像,并计算所述共视图像之间的尺度差值;
    基于所述尺度差值,对所述共视图像的尺寸进行调整,得到调整后共视图像;及
    在每一所述调整后共视图像中提取出至少一个共视特征点,并基于所述共视特征点,对所述待处理图像对进行处理。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述关联特征,在所述待处理图像中识别出共视区域的共视图像,包括:
    获取预设区域特征,并采用训练后图像处理模型对所述预设区域特征进行特征提取,得到初始区域特征;
    对所述初始区域特征以及所述关联特征进行交叉特征提取,得到所述初始区域特征对应的共视区域特征;及
    基于所述共视区域特征以及所述关联特征,在所述待处理图像中识别出所述共视区域中的共视图像。
  3. 根据权利要求2所述的方法,其特征在于,所述预设区域特征包括多个区域子特征,所述采用训练后图像处理模型对所述预设区域特征进行特征提取,得到初始区域特征,包括:
    采用训练后图像处理模型对所述预设区域特征进行特征提取,得到所述预设区域特征中每一区域子特征对应的区域关联特征;
    基于所述区域关联特征,确定所述预设区域特征中每一区域子特征对应的区域关联权重;及
    根据所述区域关联权重,对所述预设区域特征中每一区域子特征进行融合,得到初始区域特征。
  4. 根据权利要求2所述的方法,其特征在于,所述对所述初始区域特征以及所述关联特征进行交叉特征提取,得到所述初始区域特征对应的共视区域特征,包括:
    对所述初始区域特征和所述关联特征进行特征提取,得到所述关联特征对应的图像关联特征,以及所述初始区域特征对应的初始区域关联特征;
    根据所述图像关联特征和所述初始区域关联特征确定所述关联特征对应的图像关联权重;及
    基于所述图像关联权重,对所述关联特征进行加权,得到共视图像特征,并将所述共视图像特征和所述初始区域特征进行融合,得到共视区域特征。
  5. 根据权利要求2所述的方法,其特征在于,所述基于所述共视区域特征以及所述关联特征,在所述待处理图像中识别出所述共视区域中的共视图像,包括:
    基于所述共视区域特征和关联特征,计算所述关联特征对应的共视权重;
    根据所述共视权重以及所述关联特征,在所述待处理图像中确定关注中心坐标;
    对所述共视区域特征进行回归处理,得到所述共视区域对应的相对中心点偏移;及
    根据所述关注中心坐标以及所述相对中心点偏移,在所述待处理图像中识别出所述共视区域中的共视图像。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述关注中心坐标以及所述相对中心点偏移,在所述待处理图像中识别出所述共视区域中的共视图像,包括:
    根据所述关注中心坐标以及所述相对中心点偏移,计算所述共视区域在所述待处理图像中的几何中心坐标以及边界尺寸信息;
    基于所述几何中心坐标以及所述边界尺寸信息,在所述待处理图像中确定出所述待处理图像的共视区域;及
    在所述待处理图像中将所述共视区域进行分割,得到所述共视区域中的共视图像。
  7. 根据权利要求5所述的方法,其特征在于,所述根据所述共视权重以及所述关联特征,在所述待处理图像中确定关注中心坐标,包括:
    根据所述共视权重以及所述关联特征,计算所述共视区域中每一预设坐标点的关注权重;
    基于所述关注权重对所述预设坐标点进行加权,得到加权后坐标点;及
    对所述加权后坐标点进行累加,得到所述待处理图像中的关注中心坐标。
  8. 根据权利要求2所述的方法，其特征在于，所述采用训练后图像处理模型对所述预设区域特征进行特征提取，得到初始区域特征之前，还包括：
    获取图像样本对,所述图像样本对中的图像样本中包括标注共视区域;
    采用预设图像处理模型预测所述图像样本对中每一图像样本的共视区域,得到预测共视区域;及
    根据所述标注共视区域和预测共视区域对所述预设图像处理模型进行训练,得到所述训练后图像处理模型。
  9. 根据权利要求8所述的方法,其特征在于,所述根据所述标注共视区域和预测共视区域对所述预设图像处理模型进行训练,得到所述训练后图像处理模型,包括:
    在所述预测共视区域中,提取出所述预测共视区域对应的预测几何中心坐标和预测边界尺寸信息;
    在所述标注共视区域中,提取出所述标注共视区域对应的标注几何中心坐标和标注边界尺寸信息;及
    根据所述预测几何中心坐标、预测边界尺寸信息、标注几何中心坐标以及标注边界尺寸信息,对所述预设图像处理模型进行训练,得到训练后图像处理模型。
  10. 根据权利要求9所述的方法,其特征在于,所述在所述预测共视区域中,提取出所述预测共视区域对应的预测几何中心坐标和预测边界尺寸信息,包括:
    在所述预测共视区域中,提取出所述预测共视区域对应的预测关注中心坐标和所述预测中心点偏移;及
    根据所述预测关注中心坐标以及所述预测中心点偏移,确定所述预测共视区域对应的预测几何中心坐标和预测边界尺寸信息。
  11. 根据权利要求10所述的方法,其特征在于,所述根据所述预测几何中心坐标、预测边界尺寸信息、标注几何中心坐标以及标注边界尺寸信息,对所述预设图像处理模型进行训练,得到训练后图像处理模型,包括:
    基于所述预测几何中心坐标和标注几何中心坐标,计算所述预设图像处理模型对应的循环一致性损失信息;
    基于所述预测几何中心坐标和预测边界尺寸信息,以及所述标注几何中心坐标和标注边界尺寸信息,分别计算所述预设图像处理模型对应的边界损失信息以及平均绝对误差损失信息;及
    将所述循环一致性损失信息、所述平均绝对误差损失信息以及所述边界损失信息,作为所述预设图像处理模型对应的损失信息,并根据所述损失信息对所述预设图像处理模型进行训练,得到训练后图像处理模型。
  12. 根据权利要求1所述的方法,其特征在于,所述在所述图像特征中提取出所述待处理图像对的关联特征,包括:
    对所述图像特征进行扁平化处理,得到所述待处理图像的扁平图像特征;
    对所述扁平图像特征进行特征提取,得到所述待处理图像对应的初始注意力特征;及
    对所述初始注意力特征进行交叉特征提取,得到所述待处理图像对中每一所述待处理图像的关联特征。
  13. 根据权利要求12所述的方法,其特征在于,所述扁平图像特征包含多个子扁平图像特征,所述对所述扁平图像特征进行特征提取,得到所述待处理图像对应的初始注意力特征,包括:
    对所述扁平图像特征进行特征提取,得到所述扁平图像特征中的每一子扁平图像特征对应的初始关联特征;
    基于所述初始关联特征,确定所述扁平图像特征中的每一子扁平图像特征对应的初始关联权重;及
    根据所述初始关联权重对所述扁平图像特征中的每一子扁平图像特征进行融合,得到所述待处理图像对应的初始注意力特征。
  14. 根据权利要求12所述的方法,其特征在于,所述对所述初始注意力特征进行交叉特征提取,得到所述待处理图像对中每一所述待处理图像的关联特征,包括:
    对所述图像特征以及所述初始注意力特征进行交叉特征提取,得到每一所述待处理图像对应的交叉关联特征;
    根据所述交叉关联特征,确定所述待处理图像对应的交叉关联权重;及
    基于所述交叉关联权重,对每一所述待处理图像对应的初始注意力特征进行加权,以得到所述待处理图像对应的关联特征。
  15. 根据权利要求1所述的方法,其特征在于,所述对所述待处理图像对中的待处理图像进行图像特征提取,得到所述待处理图像的图像特征,包括:
    对所述待处理图像对中的待处理图像进行特征映射,得到所述待处理图像对应的特征图;
    对所述待处理图像对应的特征图进行降维处理,得到降维后特征图;
    对降维后特征图进行多尺度的特征提取,得到所述待处理图像在每一尺度对应的尺度图像特征;及
    将所述待处理图像在每一尺度对应的尺度图像特征进行融合,得到所述待处理图像的图像特征。
  16. 根据权利要求1所述的方法，其特征在于，所述计算所述共视图像之间的尺度差值，包括：
    获取每一所述待处理图像对应的共视图像的尺寸信息;
    基于所述尺寸信息计算所述待处理图像之间的至少一个尺寸差值;及
    在所述尺寸差值中筛选出满足预设条件的目标尺寸差值,并将所述目标尺寸差值作为所述共视图像之间的尺度差值。
  17. 根据权利要求1所述的方法,其特征在于,所述基于所述共视特征点,对所述待处理图像对进行处理,包括:
    对所述待处理图像对中每一所述待处理图像在所述调整后共视图像中的共视特征点进行特征点匹配,得到匹配后共视特征点;
    基于所述尺度差值以及所述调整后共视图像的尺寸信息,在所述待处理图像中确定所述匹配后共视特征点对应的源特征点;及
    基于所述源特征点,对所述待处理图像对进行处理。
  18. 一种图像处理装置,其特征在于,包括:
    获取单元,用于获取待处理图像对,并对所述待处理图像对中的待处理图像进行图像特征提取,得到所述待处理图像的图像特征;
    提取单元,用于在所述图像特征中提取出所述待处理图像对的关联特征,所述关联特征用于表征所述待处理图像对中的待处理图像之间的相互信息;
    识别单元,用于根据所述关联特征,在所述待处理图像中识别出共视区域的共视图像,并计算所述共视图像之间的尺度差值;
    调整单元,用于基于所述尺度差值,对所述共视图像的尺寸进行调整,得到调整后共视图像;
    处理单元,用于在每一所述调整后共视图像中提取出至少一个共视特征点,并基于所述共视特征点,对所述待处理图像对进行处理。
  19. 一种计算机设备,包括存储器和处理器,所述存储器存储有计算机可读指令,其特征在于,所述处理器执行所述计算机可读指令时实现权利要求1至17中任一项所述的方法的步骤。
  20. 一种计算机可读存储介质,其上存储有计算机可读指令,其特征在于,所述计算机可读指令被处理器执行时实现权利要求1至17中任一项所述的方法的步骤。
  21. 一种计算机程序产品，包括计算机可读指令，其特征在于，所述计算机可读指令被处理器执行时实现权利要求1至17中任一项所述的方法的步骤。
PCT/CN2022/131464 2022-01-25 2022-11-11 图像处理方法、装置和计算机可读存储介质 WO2023142602A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/333,091 US20230326173A1 (en) 2022-01-25 2023-06-12 Image processing method and apparatus, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210088988.6 2022-01-25
CN202210088988.6A CN114445633A (zh) 2022-01-25 2022-01-25 图像处理方法、装置和计算机可读存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/333,091 Continuation US20230326173A1 (en) 2022-01-25 2023-06-12 Image processing method and apparatus, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2023142602A1 true WO2023142602A1 (zh) 2023-08-03

Family

ID=81369789

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131464 WO2023142602A1 (zh) 2022-01-25 2022-11-11 图像处理方法、装置和计算机可读存储介质

Country Status (3)

Country Link
US (1) US20230326173A1 (zh)
CN (1) CN114445633A (zh)
WO (1) WO2023142602A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220237838A1 (en) * 2021-01-27 2022-07-28 Nvidia Corporation Image synthesis using one or more neural networks
CN114445633A (zh) * 2022-01-25 2022-05-06 腾讯科技(深圳)有限公司 图像处理方法、装置和计算机可读存储介质
CN117115583B (zh) * 2023-08-09 2024-04-02 广东工业大学 基于交叉融合注意力机制的危险品检测方法及装置
CN117115571B (zh) * 2023-10-25 2024-01-26 成都阿加犀智能科技有限公司 一种细粒度智能商品识别方法、装置、设备及介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160124209A1 (en) * 2013-07-25 2016-05-05 Olympus Corporation Image processing device, image processing method, microscope system, and computer-readable recording medium
JP2018005435A (ja) * 2016-06-30 2018-01-11 株式会社 日立産業制御ソリューションズ 画像処理装置および画像処理方法
CN110399799A (zh) * 2019-06-26 2019-11-01 北京迈格威科技有限公司 图像识别和神经网络模型的训练方法、装置和系统
CN112232258A (zh) * 2020-10-27 2021-01-15 腾讯科技(深圳)有限公司 一种信息处理方法、装置及计算机可读存储介质
CN112967330A (zh) * 2021-03-23 2021-06-15 之江实验室 一种结合SfM和双目匹配的内窥图像三维重建方法
CN114445633A (zh) * 2022-01-25 2022-05-06 腾讯科技(深圳)有限公司 图像处理方法、装置和计算机可读存储介质

Also Published As

Publication number Publication date
CN114445633A (zh) 2022-05-06
US20230326173A1 (en) 2023-10-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22923396; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2022923396; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2022923396; Country of ref document: EP; Effective date: 20240416)