CN116030156A - Iterative method of image reconstruction model and image reconstruction method


Info

Publication number: CN116030156A (granted and republished as CN116030156B)
Application number: CN202310161883.3A
Original language: Chinese (zh)
Legal status: Granted, active
Inventors: 朱优松, 李朝闻, 陈志扬, 赵朝阳, 唐明, 王金桥
Applicant/assignee: Wuhan Artificial Intelligence Research Institute; Institute of Automation of Chinese Academy of Science


Abstract

The invention relates to the technical field of image processing and provides an iteration method of an image reconstruction model and an image reconstruction method. The iteration method of the image reconstruction model comprises the following steps: masking an original image to obtain a plurality of mask images; reconstructing the mask regions in each mask image based on an initial image reconstruction model to obtain a reconstructed image corresponding to each mask image; and determining the overlapping region between every two reconstructed images, and performing parameter iteration on the initial image reconstruction model based on the feature similarity between the region features of the overlapping region in the two reconstructed images to obtain the image reconstruction model. This solves the problem that models in traditional schemes suffer from high uncertainty and inconsistency: through a self-consistent mechanism, the overlapping regions of different reconstructed images are kept consistent, which improves the training efficiency of the model and optimizes its prediction accuracy.

Description

Iterative method of image reconstruction model and image reconstruction method
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an iteration method of an image reconstruction model and an image reconstruction method.
Background
As computer vision technology continues to develop, unlabeled data has become easier to acquire. However, manually labeling massive amounts of unlabeled data consumes a great deal of time and effort, and labels are easily mislabeled or missed. Self-supervised algorithms can train models without labeled data and, compared with traditional supervised learning, have clear advantages in several respects, such as avoiding supervision bias and long-tail problems.
Currently, visual self-attention models perform well in a variety of computer vision tasks. However, it has been observed that a high mask rate causes a serious problem: the model becomes highly uncertain and inconsistent. In short, a high mask rate introduces unreliable features, so that for different mask settings derived from the same original image, the model's output results for the same masked tile are inconsistent.
Disclosure of Invention
The invention provides an iteration method of an image reconstruction model and an image reconstruction method, which are used to overcome the high uncertainty and inconsistency of models in the prior art and to improve model training efficiency and prediction accuracy.
The invention provides an iteration method of an image reconstruction model, which comprises the following steps:
masking is carried out based on the original image, so that a plurality of mask images are obtained;
reconstructing mask areas in each mask image based on an initial image reconstruction model to obtain reconstructed images corresponding to each mask image;
and determining the overlapping area between every two reconstructed images in each reconstructed image, and carrying out parameter iteration on the initial image reconstruction model based on the feature similarity between the area features of the overlapping areas in the two reconstructed images to obtain an image reconstruction model.
According to the iteration method of the image reconstruction model provided by the invention, the initial image reconstruction model is subjected to parameter iteration based on the feature similarity between the region features of the overlapping region in the two-by-two reconstructed images to obtain the image reconstruction model, and the method comprises the following steps:
determining the consistency loss of the initial image reconstruction model based on the feature similarity between the region features of the overlapping regions in the pairwise reconstructed images;
determining a reconstruction loss of the initial image reconstruction model based on the reconstruction regions in the respective reconstructed images and the image regions in the original image;
And carrying out parameter iteration on the initial image reconstruction model based on the reconstruction loss and the consistency loss to obtain an image reconstruction model.
According to the iterative method of the image reconstruction model provided by the invention, the reconstruction loss of the initial image reconstruction model is determined based on the reconstruction areas in each reconstructed image and the image areas in the original image, and the iterative method comprises the following steps:
determining an image area corresponding to the reconstruction area in the original image based on the reconstruction areas in the reconstruction images;
extracting regional features of a reconstruction region in each reconstruction image and regional features of an image region corresponding to the reconstruction region in the original image respectively;
and determining the reconstruction loss of the initial image reconstruction model based on the feature similarity between the regional features of the reconstruction regions in each reconstruction image and the regional features of the corresponding image regions of the reconstruction regions in the original image.
According to the iteration method of the image reconstruction model, the mask rates of the mask images are the same;
the visible regions in the respective mask images are different from each other, and the visible regions in the respective mask images constitute the original image.
According to the iteration method of the image reconstruction model provided by the invention, the initial image reconstruction model is constructed on the basis of a mask self-encoder and a self-consistent mechanism;
the self-consistent mechanism is used for determining the overlapping area between the two reconstructed images and guiding the consistency between the area characteristics of the overlapping area in the two reconstructed images.
The invention also provides an image reconstruction method, which comprises the following steps:
determining an image to be reconstructed;
extracting features of the image to be reconstructed based on a coding layer in an image reconstruction model to obtain image features of the image to be reconstructed, wherein the image reconstruction model is determined based on the iterative method of the image reconstruction model according to any one of the above;
and reconstructing the image to be reconstructed based on the image characteristics.
The invention also provides an iteration device of the image reconstruction model, which comprises the following steps:
a masking unit for masking based on the original image to obtain a plurality of mask images;
the reconstruction unit is used for reconstructing mask areas in each mask image based on an initial image reconstruction model to obtain reconstructed images corresponding to each mask image;
the iteration unit is used for determining the overlapping area between every two reconstructed images in each reconstructed image, and carrying out parameter iteration on the initial image reconstruction model based on the feature similarity between the area features of the overlapping area in every two reconstructed images to obtain an image reconstruction model.
The present invention also provides an image reconstruction apparatus including:
an image determining unit for determining an image to be reconstructed;
the feature extraction unit is used for extracting features of the image to be reconstructed based on a coding layer in an image reconstruction model to obtain image features of the image to be reconstructed, and the image reconstruction model is determined based on the iterative method of the image reconstruction model;
and the image reconstruction unit is used for reconstructing the image to be reconstructed based on the image characteristics.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing an iterative method of an image reconstruction model as described in any one of the above or an image reconstruction method as described in the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an iterative method of an image reconstruction model as described in any of the above, or an image reconstruction method as described above.
The iteration method of the image reconstruction model and the image reconstruction method provided by the invention mask the original image to obtain a plurality of mask images; reconstruct the mask regions in each mask image with the initial image reconstruction model to obtain a reconstructed image corresponding to each mask image; determine the overlapping region between every two reconstructed images; and perform parameter iteration on the initial image reconstruction model based on the feature similarity between the region features of the overlapping region in the two reconstructed images to obtain the image reconstruction model. This solves the problem that models in traditional schemes suffer from high uncertainty and inconsistency: through a self-consistent mechanism, the overlapping regions of different reconstructed images are kept consistent, which improves the training efficiency of the model and optimizes its prediction accuracy.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an iterative method of an image reconstruction model provided by the invention;
FIG. 2 is a general framework diagram of an iterative method of image reconstruction model provided by the present invention;
FIG. 3 is a schematic flow chart of an image reconstruction method provided by the invention;
FIG. 4 is a schematic structural diagram of an iterative apparatus of an image reconstruction model provided by the present invention;
FIG. 5 is a schematic view of an image reconstruction device according to the present invention;
fig. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As computer vision technology continues to develop, unlabeled data has become easier to acquire. However, manually labeling massive amounts of unlabeled data consumes a great deal of time and effort, and labels are easily mislabeled or missed. Self-supervised algorithms can train models without labeled data and can provide pre-trained models for various tasks in the field of computer vision; compared with traditional supervised algorithms, they have clear advantages in avoiding supervision bias and long-tail problems and in fitting massive business data.
Vision Transformers (visual self-attention models) have significant advantages over traditional convolutional neural networks in modeling long-range relations, in self-supervised learning, and in fitting massive visual data. Currently, Vision Transformers have shown good results in a variety of computer vision tasks. However, a high mask rate has been observed to cause a significant problem, namely high uncertainty and inconsistency in the model.
At root, it is the high mask rate that introduces unreliable features; as a result, different combinations of visible patches sampled from the original image may produce inconsistent predictions for the same masked tile. In short, for different mask settings derived from the same original image, the model's output results for the same masked tile are, with high probability, semantically inconsistent.
In this regard, the present invention provides an iteration method of an image reconstruction model, which aims to keep the overlapping regions between output results consistent through a self-consistent mechanism, thereby overcoming the model's high uncertainty and inconsistency and improving its performance and training efficiency. Fig. 1 is a schematic flow chart of the iteration method of the image reconstruction model provided by the present invention; as shown in Fig. 1, the method includes:
step 110, masking based on the original image to obtain a plurality of mask images;
in particular, before model training is performed, the raw data required for the iteration needs to be acquired first; as the method is applied here to image reconstruction, the raw data corresponding to the image reconstruction model can be understood as original images. The original image may be a complete image of any type in any field, for example a person image, a landscape image, a magnetic resonance image, or a CT (Computed Tomography) image; it may be acquired by an image acquisition device, downloaded over a network, or obtained by other means, and the embodiments of the present invention are not specifically limited in this respect.
After the original image is obtained, the original image is further masked to obtain a plurality of mask images, that is, the original image is subjected to random masking, so that a plurality of mask images different from each other are obtained. Here, the image masking process can be understood as obtaining a plurality of mask images composed of image areas (visible areas) in the original image and masked areas (mask areas) in the original image by image cropping, tile grouping, or the like on the basis of the original image. Notably, the visible regions in each mask image are not completely overlapping.
Step 120, reconstructing mask areas in each mask image based on the initial image reconstruction model to obtain reconstructed images corresponding to each mask image;
specifically, after the step 110 is performed to obtain a plurality of mask images, step 120 may be performed to reconstruct mask areas in each mask image through an initial image reconstruction model, so as to obtain a reconstructed image corresponding to each mask image, where the specific process may include:
firstly, determining visible areas and mask areas in each mask image, wherein the visible areas correspond to image areas in an original image, and the mask areas correspond to masked areas; meanwhile, an initial image reconstruction model is required to be determined, wherein the initial image reconstruction model can be constructed by combining a self-consistent mechanism on the basis of a traditional mask image model;
then, an initial image reconstruction model may be applied to reconstruct the mask areas in each mask image, so as to obtain reconstructed images corresponding to each mask image, specifically, the mask images are input into the initial image reconstruction model, the initial image reconstruction model predicts the mask areas in the mask images according to the visible areas in the input mask images, so as to obtain reconstructed images corresponding to the mask areas, that is, the mask areas are reconstructed through the extracted area features of the visible areas, so as to obtain reconstructed images output by the initial image reconstruction model.
It is noted that the reconstructed image output by the model here is not a complete image, but an image containing only the reconstructed region obtained by reconstructing the mask region, in short, only the reconstructed region corresponding to the mask region, and no visible region in the mask image.
And 130, determining an overlapping region between two reconstructed images in each reconstructed image, and carrying out parameter iteration on the initial image reconstruction model based on the feature similarity between the region features of the overlapping region in the two reconstructed images to obtain an image reconstruction model.
Specifically, after obtaining the reconstructed images corresponding to each mask image, the overlapping area between every two reconstructed images can be determined, and according to the feature similarity between the area features of the overlapping area, model training is performed to obtain an image reconstruction model, and the specific process includes:
because artificial intelligence is equivalent to a self-consistent system, which is conducive to efficient learning and error correction, a self-consistent mechanism can be introduced herein to improve training efficiency and consistency of the model, and in particular, the self-consistent mechanism encourages consistency of overlapping regions in output results of model prediction for different inputs, i.e., the distance between the overlapping regions can be shortened by a loss function, so that the overlapping regions in the output results of model output remain as consistent as possible.
In view of this, in the embodiment of the present invention, the process of parameter iteration of the initial image reconstruction model may first determine, by a self-consistent mechanism, an overlapping region between two reconstructed images, that is, an overlapping region between two reconstructed images, and specifically may determine, from each reconstructed image, a plurality of image groups composed of two reconstructed images, and determine an overlapping region between the reconstructed regions of two reconstructed images in each image group.
Then, the region characteristics of the overlapped regions in the two reconstructed images can be extracted, namely, the feature extraction can be carried out on the overlapped regions of the two reconstructed images in each image group, and the image information of the overlapped regions is extracted, so that the region characteristics of the overlapped regions of the two overlapped images in each image group are obtained. Because the semantic information is mainly extracted in the feature extraction process, the extracted regional features can also be called as semantic features of overlapping regions;
then, the feature similarity of the overlapping region in each image group is calculated from the region features of the overlapping region of the two reconstructed images, that is, the feature similarity between the region features of the overlapping region in the two reconstructed images is calculated; the feature similarity may be the semantic similarity between semantic features, measured by cosine similarity, Euclidean distance, Minkowski distance, or the like;
And then, carrying out parameter iteration on the initial image reconstruction model according to the feature similarity between the region features of the overlapped regions in the two reconstructed images, thereby obtaining an image reconstruction model, specifically, carrying out parameter adjustment on the initial image reconstruction model according to the feature similarity between the region features of the overlapped regions in the two reconstructed images, so that the feature similarity between the region features of the overlapped regions in the different reconstructed images correspondingly output by the adjusted initial image reconstruction model is as high as possible when facing different mask images, and in short, enabling the overlapped regions in the different reconstructed images output by the model to be as consistent as possible, and finally obtaining the image reconstruction model after training.
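By way of illustration, the candidate feature-similarity measures mentioned above could be computed for two sets of paired overlap-region features as in the minimal sketch below; the function names and the Minkowski order are arbitrary choices, not taken from the patent.

```python
import torch
import torch.nn.functional as F

def cosine_sim(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between paired (N, D) feature vectors."""
    return F.cosine_similarity(a, b, dim=-1)

def euclidean_dist(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Euclidean distance between paired (N, D) feature vectors."""
    return (a - b).norm(p=2, dim=-1)

def minkowski_dist(a: torch.Tensor, b: torch.Tensor, p: float = 3.0) -> torch.Tensor:
    """Minkowski distance of order p (p = 3.0 is an arbitrary example value)."""
    return (a - b).abs().pow(p).sum(dim=-1).pow(1.0 / p)

# toy usage on random overlap-region features
a, b = torch.randn(8, 128), torch.randn(8, 128)
print(cosine_sim(a, b).shape, euclidean_dist(a, b).shape, minkowski_dist(a, b).shape)
```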
In the embodiment of the invention, a self-consistent mechanism is introduced and the self-consistency of the overlapped areas in different reconstructed images is utilized to train the model, so that the prediction of the model on the overlapped areas is consistent as much as possible, the error between the output results of the model on the same mask area can be greatly reduced by pulling the distance between the area features of the overlapped areas in different reconstructed images, the uncertainty and the inconsistency of the model are reduced, and the model training efficiency and the prediction accuracy are improved.
According to the iteration method of the image reconstruction model, the original image is masked to obtain a plurality of mask images; the mask regions in each mask image are reconstructed by the initial image reconstruction model to obtain a reconstructed image corresponding to each mask image; the overlapping region between every two reconstructed images is determined, and parameter iteration is performed on the initial image reconstruction model based on the feature similarity between the region features of the overlapping region in the two reconstructed images to obtain the image reconstruction model. This solves the problem that models in traditional schemes suffer from high uncertainty and inconsistency: through a self-consistent mechanism, the overlapping regions of different reconstructed images are kept consistent, which improves the training efficiency of the model and optimizes its prediction accuracy.
Based on the above embodiment, in step 130, the parameter iteration is performed on the initial image reconstruction model based on the feature similarity between the region features of the overlapping regions in the two reconstructed images, so as to obtain the image reconstruction model, which includes:
determining the consistency loss of an initial image reconstruction model based on the feature similarity between the region features of the overlapping regions in the pairwise reconstructed images;
Determining the reconstruction loss of an initial image reconstruction model based on the reconstruction regions in each reconstructed image and the image regions in the original image;
and carrying out parameter iteration on the initial image reconstruction model based on the reconstruction loss and the consistency loss to obtain the image reconstruction model.
Specifically, in step 130, according to the feature similarity between the region features of the overlapping regions in the two reconstructed images, the process of performing parameter iteration on the initial image reconstruction model to obtain the image reconstruction model includes the following steps:
firstly, the self-consistent loss in the model training process can be determined from the feature similarity between the region features of the overlapping regions in the pairwise reconstructed images. This loss reflects the discrepancy, at the semantic level, between the overlapping regions of different reconstructed images and may also be called the semantic consistency loss; that is, the consistency loss of the initial image reconstruction model over the overlapping regions is calculated at the semantic level using the feature similarity between the region features of the overlapping region in each image group;
meanwhile, the reconstruction loss of the initial image reconstruction model, namely the loss of the reconstruction process of the mask region in each mask image by the initial image reconstruction model, can be measured according to each reconstruction image output by the initial image reconstruction model and the original image, and the reconstruction loss can be measured through the reconstruction region in the reconstruction image and the image region in the original image because the reconstruction image only contains the reconstruction region corresponding to the mask region.
Specifically, at the local level, the reconstruction loss of the initial image reconstruction model can be measured directly from the reconstruction region in the reconstructed image and the image region corresponding to the reconstruction region in the original image. From a global point of view, not only the consistency between the reconstruction region and the corresponding image region but also the naturalness, continuity, and harmony between the reconstruction region and the surrounding visible region can be considered, so that the reconstruction loss is measured at the level of the whole image; that is, the corresponding mask image can be fused with the reconstructed image to obtain a complete image, and the reconstruction loss of the initial image reconstruction model can then be measured from the complete image and the original image.
And then, carrying out parameter iteration on the initial image reconstruction model according to the reconstruction loss and the consistency loss to obtain an image reconstruction model, specifically, calculating the joint training loss of the initial image reconstruction model through the reconstruction loss and the consistency loss, and adjusting the parameters of the initial image reconstruction model based on the joint training loss, so that the feature similarity between the adjusted model and the region features of the overlapped region in the output reconstruction image is as high as possible for different mask images, and the output reconstruction image is as close to the original image as possible, thereby obtaining the trained image reconstruction model.
The combined training is performed by using a plurality of loss functions, so that the performance of the model can be greatly improved, the feature extraction capacity and the image reconstruction capacity of the image reconstruction model obtained by training are better, and the model parameters are adjusted according to the reconstruction loss, so that the output result of the model prediction is more approximate to a real original image.
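As an illustrative sketch, the two losses might be combined into a joint training loss as follows; the linear combination and the default weight are assumptions, since the description above only states that the reconstruction loss and the consistency loss are used together for parameter iteration.

```python
import torch

def joint_training_loss(reconstruction_loss: torch.Tensor,
                        consistency_loss: torch.Tensor,
                        consistency_weight: float = 1.0) -> torch.Tensor:
    # weighted sum; the weighting scheme is an assumption, not taken from the patent
    return reconstruction_loss + consistency_weight * consistency_loss

# toy usage with scalar loss tensors
loss = joint_training_loss(torch.tensor(0.42), torch.tensor(0.17))
print(loss)  # tensor(0.5900)
```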
In the embodiment of the invention, the loss in the training process of the initial image reconstruction model is determined from two different layers, the parameter adjustment is carried out according to the loss, the optimization of the model performance is realized from different angles, and the superposition of multiple optimizations can essentially improve the prediction capability of the image reconstruction model obtained by training, accelerate the model training process and improve the prediction accuracy.
Based on the above embodiment, determining a reconstruction loss of an initial image reconstruction model based on a reconstruction region in each reconstructed image and an image region in an original image, comprises:
determining an image area corresponding to the reconstruction area in the original image based on the reconstruction areas in each reconstruction image;
extracting the regional characteristics of the reconstruction regions in each reconstruction image and the regional characteristics of the corresponding image regions of the reconstruction regions in the original image respectively;
And determining the reconstruction loss of the initial image reconstruction model based on the feature similarity between the regional features of the reconstruction region in each reconstruction image and the regional features of the corresponding image region of the reconstruction region in the original image.
Specifically, the process of determining the reconstruction loss of the initial image reconstruction model according to the reconstruction region in each reconstructed image and the image region in the original image specifically includes:
firstly, determining corresponding image areas of reconstruction areas in each reconstruction image in an original image, namely positioning the corresponding image areas in the original image by taking the reconstruction areas in each reconstruction image as a reference, so as to obtain the corresponding image areas of the reconstruction areas in the original image;
then, the region characteristics of the reconstruction regions in each reconstruction image and the region characteristics of the image regions corresponding to the reconstruction regions in the original image can be extracted respectively, specifically, the feature extraction is performed on the reconstruction regions in each reconstruction image and the image regions corresponding to the reconstruction regions in the original image, and the image information (semantic information) of the corresponding regions is extracted, so that the region characteristics of the reconstruction regions in each reconstruction image and the region characteristics of the image regions corresponding to the reconstruction regions in the original image are obtained;
Then, the feature similarity between the region features of the reconstruction region in each reconstructed image and the region features of the corresponding image region in the original image can be determined. Specifically, a distance measurement is performed on these region features, that is, their similarity at the semantic level is measured by cosine similarity, Euclidean distance, Minkowski distance, or the like, so as to obtain the feature similarity between the region features of the reconstruction region and those of the corresponding image region;
and then, determining the reconstruction loss of the initial image reconstruction model according to the feature similarity between the region features of the reconstruction region in each reconstruction image and the region features of the corresponding image region in the original image, namely measuring the reconstruction loss of the initial image reconstruction model in the mask region in each mask image according to the feature similarity between the region features of the reconstruction region and the region features of the corresponding image region, so as to obtain the reconstruction loss.
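A minimal sketch of such a feature-similarity-based reconstruction loss is given below, assuming cosine similarity as the measure and a mean reduction over regions; both choices are assumptions consistent with the description above, not a definitive implementation.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(recon_feats: torch.Tensor, orig_feats: torch.Tensor) -> torch.Tensor:
    """recon_feats: (N, D) features of the reconstruction regions.
    orig_feats:  (N, D) features of the corresponding image regions in the original image.
    Returns 1 - mean cosine similarity, so higher similarity gives lower loss."""
    return (1.0 - F.cosine_similarity(recon_feats, orig_feats, dim=-1)).mean()

# toy usage on random region features
print(reconstruction_loss(torch.randn(16, 256), torch.randn(16, 256)))
```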
Based on the above embodiment, the mask rates of the respective mask images are the same;
the visible regions in the respective mask images are different from each other, and the visible regions in the respective mask images constitute an original image.
In the training process of the model, the high mask rate not only causes high uncertainty and inconsistency of the model, but also causes low utilization rate of images in the training process, so that the training efficiency of the model is low.
Here, MAE (Masked Autoencoder) and BERT (Bidirectional Encoder Representations from Transformers) are taken as examples of MIM (masked image modeling) and MLM (masked language modeling) models, respectively:
during model training, MAE uses only 25% of the whole image to train the model, whereas BERT uses 85% of the text corpus. Owing to the insufficient data utilization of MIM, its number of training epochs is about 40 times that of MLM (1600 epochs for the former versus 40 epochs for the latter), so the training efficiency of MIM is far lower than that of MLM and the number of training epochs is excessive. This is due not only to the difference in information density between images and language, but also to the difference in how the training processes of MLM and MIM utilize the training data.
In view of this, the embodiment of the present invention proposes to perform model training using all of the data, so as to improve the training efficiency of the model and accelerate its convergence; that is, the complete original image participates in model training: the original image is divided into a plurality of non-overlapping portions, each of which is generated by a random mask and has the same mask rate. The MIM task then reconstructs all mask images in parallel in one iteration and outputs the reconstructed images.
When the random mask rate of MIM is $r$, the data utilization rate is $1-r$. In short, the higher the random mask rate, the lower the data utilization; an excessively low data utilization leaves the data insufficiently trained, and the training efficiency is low. For example, for MAE and BERT trained for the same number of epochs, the ratio of MLM to MIM data utilization is $0.85/0.25 = 3.4$. Furthermore, training a model for 1600 epochs consumes a significant amount of time and resources. Therefore, in order to reduce training time and improve training efficiency, in the embodiment of the invention the entire original image is selected to participate in training, so that the image regions in the original image are fully utilized and the data utilization rate is improved.
Specifically, the original image may be cropped so as to divide the complete original image into a plurality of tiles, and the tiles may then be randomly divided, without repetition, into $K$ non-overlapping parts. The mask rate of each part is the same, which ensures that every tile in the original image is used; in short, the visible regions of the mask images generated by the random masking are different from each other, and together these visible regions make up the complete original image.
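A minimal sketch of this partitioning step is shown below, assuming the original image has already been split into a grid of patches; the tensor layout and the helper name are assumptions.

```python
import torch

def partition_into_mask_images(num_patches: int, k: int) -> torch.Tensor:
    """Randomly split patch indices into k disjoint visible sets.

    Returns a (k, num_patches) boolean tensor where True marks a masked patch.
    Each row has the same mask rate of 1 - 1/k, the visible (False) patches of
    the rows are mutually disjoint, and together they cover the whole image."""
    perm = torch.randperm(num_patches)
    masks = torch.ones(k, num_patches, dtype=torch.bool)
    for i, visible_idx in enumerate(perm.chunk(k)):
        masks[i, visible_idx] = False
    return masks

# example: a 14x14 patch grid (196 patches) split into K = 4 mask images
masks = partition_into_mask_images(196, 4)
assert (~masks).sum(dim=0).eq(1).all()   # every patch is visible in exactly one mask image
```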
After that, an initial image reconstruction model may be applied to reconstruct the mask regions in each mask image to obtain the reconstructed image corresponding to each mask image; specifically, each part may be input in parallel to the initial image reconstruction model to obtain the output results. Each input mask image can be expressed as $x_i$ ($i = 1, \dots, K$), and the corresponding reconstructed image output by the model as $\hat{y}_i$; that is, for each input mask image $x_i$, the reconstructed image $\hat{y}_i$ can be obtained by means of the initial image reconstruction model, and any two reconstructed images overlap with a ratio of $(K-2)/K$. For any two mask images, with the mask area represented as 1 and the visible area as 0, the overlap area may be obtained by the following formula:

$$O_{i,j} = M_i \odot M_j$$

where $M_i$ and $M_j$ denote the mask images $x_i$ and $x_j$ represented in matrix form, in which 0 marks the visible region of the corresponding mask image and 1 marks the mask region; $\odot$ denotes element-wise multiplication; and $O_{i,j}$ denotes the overlap area between $x_i$ and $x_j$.
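For illustration, a minimal sketch of the overlap computation described above, where each mask is a binary tensor with 1 for masked patches and 0 for visible patches and the overlap is their element-wise product:

```python
import torch

def overlap_area(mask_i: torch.Tensor, mask_j: torch.Tensor) -> torch.Tensor:
    """Element-wise product of two binary masks (1 = masked, 0 = visible)."""
    return mask_i * mask_j

m_i = torch.tensor([1, 1, 0, 1, 0])   # toy 5-patch mask image x_i
m_j = torch.tensor([1, 0, 1, 1, 0])   # toy 5-patch mask image x_j
print(overlap_area(m_i, m_j))          # tensor([1, 0, 0, 1, 0])
```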
The self-consistent loss between $x_i$ and $x_j$ can then be expressed by the following formula:

$$\ell_{i,j} = D\big(\hat{y}_i \odot O_{i,j},\ \operatorname{sg}(\hat{y}_j \odot O_{i,j})\big)$$

where $\ell_{i,j}$ denotes the self-consistent loss between $x_i$ and $x_j$; $\hat{y}_i$ and $\hat{y}_j$ are the output results (reconstructed images) corresponding to $x_i$ and $x_j$; $D(\cdot,\cdot)$ is the feature-distance measure over the overlap region $O_{i,j}$ (for example, based on the similarity measures described above); and $\operatorname{sg}(\cdot)$ denotes the stop-gradient operation. For the output result of any one mask image, this loss is computed $K-2$ times with the output results of the other mask images.
Further, the self-consistent loss (consistency loss) of the initial image reconstruction model is expressed as:

$$\mathcal{L}_{sc}(x) = \frac{1}{K(K-1)} \sum_{i=1}^{K} \sum_{j \neq i} \ell_{i,j}$$

where $\mathcal{L}_{sc}(x)$ denotes the self-consistent loss of the initial image reconstruction model, $x$ denotes the original image, and $K$ is the number of mask images.
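For illustration, a minimal sketch of the pairwise self-consistency loss and its aggregation, using cosine distance as the feature-distance measure $D$ and `detach()` for the stop-gradient operation; the specific distance and the averaging over pairs are assumptions rather than the patent's exact formulation.

```python
import torch
import torch.nn.functional as F

def self_consistency_loss(outputs: list, masks: list) -> torch.Tensor:
    """outputs[i]: (P, D) per-patch predictions/features of the i-th reconstructed image.
    masks[i]:   (P,) binary mask of the i-th mask image (1 = masked, 0 = visible)."""
    k = len(outputs)
    terms = []
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            overlap = (masks[i] * masks[j]).bool()     # patches reconstructed by both images
            if not overlap.any():
                continue
            pred = outputs[i][overlap]                  # gradients flow through output i
            target = outputs[j][overlap].detach()       # stop-gradient on output j
            terms.append((1.0 - F.cosine_similarity(pred, target, dim=-1)).mean())
    return torch.stack(terms).mean()

# toy usage: K = 3 reconstructed images over 12 patches with 64-dim features,
# with disjoint visible regions of 4 patches each
outs = [torch.randn(12, 64, requires_grad=True) for _ in range(3)]
msks = [torch.tensor([0] * 4 + [1] * 8),
        torch.tensor([1] * 4 + [0] * 4 + [1] * 4),
        torch.tensor([1] * 8 + [0] * 4)]
print(self_consistency_loss(outs, msks))
```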
Based on the above embodiment, the initial image reconstruction model is constructed on the basis of the mask self-encoder and the self-consistent mechanism;
the self-consistent mechanism is used for determining the overlapping area between the two reconstructed images and guiding the consistency between the area characteristics of the overlapping area in the two reconstructed images.
Specifically, in the embodiment of the present invention, the initial image reconstruction model may be constructed based on a mask self-encoder (MAE) and a self-consistent mechanism, that is, based on the mask self-encoder, the initial image reconstruction model is constructed by combining the self-consistent mechanism.
The self-consistent mechanism can determine the overlapping area between every two reconstructed images in each reconstructed image, and can pull the distance between the area characteristics of the overlapping area in every two reconstructed images, so that the prediction of the overlapping area in every two reconstructed images is consistent as much as possible.
Based on the above embodiment, fig. 2 is a general frame diagram of an iterative method of an image reconstruction model provided by the present invention, as shown in fig. 2, the method includes:
firstly, masking is carried out based on an original image, so that a plurality of masking images are obtained; the mask rate of each mask image is the same here; the visible areas in the mask images are different from each other, and constitute an original image;
then, reconstructing mask areas in each mask image based on the initial image reconstruction model to obtain reconstructed images corresponding to each mask image;
and then, determining the overlapping area between every two reconstructed images in each reconstructed image, and carrying out parameter iteration on the initial image reconstruction model based on the feature similarity between the area features of the overlapping areas in the two reconstructed images to obtain the image reconstruction model.
Based on the feature similarity between the region features of the overlapping regions in the two reconstructed images, performing parameter iteration on the initial image reconstruction model to obtain an image reconstruction model, wherein the method comprises the following steps: determining the consistency loss of an initial image reconstruction model based on the feature similarity between the region features of the overlapping regions in the pairwise reconstructed images; determining the reconstruction loss of an initial image reconstruction model based on the reconstruction regions in each reconstructed image and the image regions in the original image; and carrying out parameter iteration on the initial image reconstruction model based on the reconstruction loss and the consistency loss to obtain the image reconstruction model.
Further, determining a reconstruction loss of the initial image reconstruction model based on the reconstruction regions in the respective reconstructed images and the image regions in the original image, comprising: determining an image area corresponding to the reconstruction area in the original image based on the reconstruction areas in each reconstruction image; extracting the regional characteristics of the reconstruction regions in each reconstruction image and the regional characteristics of the corresponding image regions of the reconstruction regions in the original image respectively; and determining the reconstruction loss of the initial image reconstruction model based on the feature similarity between the regional features of the reconstruction region in each reconstruction image and the regional features of the corresponding image region of the reconstruction region in the original image.
The initial image reconstruction model is constructed on the basis of a mask self-encoder and a self-consistent mechanism; and the self-consistent mechanism is used for determining the overlapping area between the two reconstructed images and guiding the consistency between the area characteristics of the overlapping area in the two reconstructed images.
It should be noted that the image reconstruction model obtained by training here can be understood as an Efficient Masked AutoEncoder (EMAE) with self-consistency, which learns visual representations from a large dataset (such as ImageNet); through the use of the complete original image and the incorporation of a self-consistent mechanism, it improves the training efficiency of the model while greatly optimizing its performance and reducing its inconsistency and uncertainty.
Furthermore, on the public dataset ImageNet, EMAE reaches the highest accuracy among masked image models with only 300 training epochs using the ViT-Base (Vision Transformer-Base) structure. Moreover, EMAE consistently achieves optimal transfer performance on various downstream tasks (object detection, semantic segmentation), which enables more efficient use of data and reliable visual representations.
The method provided by the embodiment of the invention masks the original image to obtain a plurality of mask images; reconstructs the mask regions in each mask image with the initial image reconstruction model to obtain a reconstructed image corresponding to each mask image; determines the overlapping region between every two reconstructed images; and performs parameter iteration on the initial image reconstruction model based on the feature similarity between the region features of the overlapping region in the two reconstructed images to obtain the image reconstruction model. This solves the problem that models in traditional schemes suffer from high uncertainty and inconsistency: through a self-consistent mechanism, the overlapping regions of different reconstructed images are kept consistent, which improves the training efficiency of the model and optimizes its prediction accuracy.
The invention also provides an image reconstruction method, fig. 3 is a schematic flow chart of the image reconstruction method provided by the invention, as shown in fig. 3, the method comprises the following steps:
step 310, determining an image to be reconstructed;
step 320, extracting features of the image to be reconstructed based on the coding layer in the image reconstruction model to obtain image features of the image to be reconstructed, wherein the image reconstruction model is determined based on the iterative method of the image reconstruction model as described in any one of the above;
step 330, reconstructing an image to be reconstructed based on the image features.
Specifically, before image reconstruction, an image to be reconstructed, that is, an image to be reconstructed, is first determined, where the image to be reconstructed may be a complete image in various fields, for example, a person image, a landscape image, a nuclear magnetic resonance image, a CT image, etc., which may be acquired by an image acquisition device, may also be acquired by downloading through a network, or may be acquired by other means, and embodiments of the present invention are not limited specifically.
Then, the coding layer in the image reconstruction model can be applied to extract features from the image to be reconstructed, that is, to extract its image information and obtain a visual representation of the image to be reconstructed, namely the image features of the image to be reconstructed. Specifically, the parameters of the coding layer of the previously trained image reconstruction model are loaded directly, and the image to be reconstructed is input into the coding layer; the coding layer performs feature extraction on the input image to extract its visual representation, and finally the image features of the image to be reconstructed output by the coding layer are obtained.
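A minimal, hypothetical sketch of this inference path follows; the `CodingLayer` module below merely stands in for the trained coding layer (in practice a ViT-Base encoder whose parameters would be loaded from the iterated image reconstruction model), and the dimensions, checkpoint name, and module structure are all assumptions.

```python
import torch
import torch.nn as nn

class CodingLayer(nn.Module):
    """Toy stand-in for the coding layer: patch embedding followed by a transformer encoder."""
    def __init__(self, patch: int = 16, dim: int = 192, depth: int = 2, heads: int = 4):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, num_patches, dim)
        return self.encoder(tokens)                              # image features

coding_layer = CodingLayer()
# coding_layer.load_state_dict(torch.load("emae_coding_layer.pt"))  # hypothetical checkpoint
coding_layer.eval()

with torch.no_grad():
    image_to_reconstruct = torch.randn(1, 3, 224, 224)   # stand-in for the image to be reconstructed
    image_features = coding_layer(image_to_reconstruct)  # (1, 196, 192) visual representation
```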
It should be noted that, before the image to be reconstructed is input into the coding layer, an image reconstruction model may be obtained by training in advance, and the iterative process of the image reconstruction model specifically includes:
firstly, determining an original image, and masking based on the original image to obtain a plurality of mask images; then, reconstructing mask areas in each mask image based on the initial image reconstruction model to obtain reconstructed images corresponding to each mask image; and then, determining the overlapping area between every two reconstructed images in each reconstructed image, and carrying out parameter iteration on the initial image reconstruction model based on the feature similarity between the area features of the overlapping areas in the two reconstructed images, thereby obtaining the image reconstruction model with the iteration completed.
Then, the image to be reconstructed can be reconstructed according to the image characteristics to obtain a reconstructed image corresponding to the image to be reconstructed, specifically, the image reconstruction task is performed by taking the image characteristics of the image to be reconstructed as a reference, so as to obtain the reconstructed image corresponding to the image to be reconstructed.
It is noted that the trained image reconstruction model herein, wherein the coding layer can be applied to other tasks in the visual field besides the image reconstruction task, such as image classification, object detection, semantic segmentation, etc.
The image reconstruction method provided by the invention determines the image to be reconstructed and performs feature extraction on it based on the coding layer in an image reconstruction model to obtain the image features of the image to be reconstructed, where the image reconstruction model is determined by the iterative method of the image reconstruction model described above; the image to be reconstructed is then reconstructed based on these image features. By directly loading the coding layer of the fully iterated image reconstruction model and applying it for feature extraction, the accuracy and precision of the extracted visual representation can be ensured, the performance of various tasks in the visual field is improved, their processing is accelerated, and execution efficiency is improved.
The following describes an iteration apparatus of an image reconstruction model provided by the present invention, and the iteration apparatus of the image reconstruction model described below and the iteration method of the image reconstruction model described above may be referred to correspondingly with each other.
Fig. 4 is a schematic structural diagram of an iteration apparatus of an image reconstruction model provided by the present invention, as shown in fig. 4, the apparatus includes:
a masking unit 410, configured to mask based on the original image, to obtain a plurality of mask images;
A reconstruction unit 420, configured to reconstruct mask areas in each mask image based on an initial image reconstruction model, so as to obtain reconstructed images corresponding to each mask image;
the iteration unit 430 is configured to determine an overlapping region between two reconstructed images in each reconstructed image, and perform parameter iteration on the initial image reconstruction model based on feature similarity between region features of the overlapping region in the two reconstructed images, so as to obtain an image reconstruction model.
The iteration device of the image reconstruction model provided by the invention masks the original image to obtain a plurality of mask images; reconstructs the mask regions in each mask image with the initial image reconstruction model to obtain a reconstructed image corresponding to each mask image; determines the overlapping region between every two reconstructed images; and performs parameter iteration on the initial image reconstruction model based on the feature similarity between the region features of the overlapping region in the two reconstructed images to obtain the image reconstruction model. This solves the problem that models in traditional schemes suffer from high uncertainty and inconsistency: through a self-consistent mechanism, the overlapping regions of different reconstructed images are kept consistent, which improves the training efficiency of the model and optimizes its prediction accuracy.
Based on the above embodiment, the iteration unit 430 is configured to:
determining the consistency loss of the initial image reconstruction model based on the feature similarity between the region features of the overlapping regions in the pairwise reconstructed images;
determining a reconstruction loss of the initial image reconstruction model based on the reconstruction regions in the respective reconstructed images and the image regions in the original image;
and carrying out parameter iteration on the initial image reconstruction model based on the reconstruction loss and the consistency loss to obtain an image reconstruction model.
Based on the above embodiment, the iteration unit 430 is configured to:
determining an image area corresponding to the reconstruction area in the original image based on the reconstruction areas in the reconstruction images;
extracting regional features of a reconstruction region in each reconstruction image and regional features of an image region corresponding to the reconstruction region in the original image respectively;
and determining the reconstruction loss of the initial image reconstruction model based on the feature similarity between the regional features of the reconstruction regions in each reconstruction image and the regional features of the corresponding image regions of the reconstruction regions in the original image.
Based on the above embodiment, the mask rates of the respective mask images are the same;
the visible regions in the respective mask images are different from each other, and the visible regions in the respective mask images constitute the original image.
Based on the above embodiment, the initial image reconstruction model is constructed on the basis of a mask self-encoder and a self-consistent mechanism;
the self-consistent mechanism is used for determining the overlapping area between the two reconstructed images and guiding the consistency between the area characteristics of the overlapping area in the two reconstructed images.
The image reconstruction device provided by the present invention will be described below, and the image reconstruction device described below and the image reconstruction method described above may be referred to correspondingly to each other.
Fig. 5 is a schematic structural diagram of an image reconstruction apparatus according to the present invention, as shown in fig. 5, the apparatus includes:
an image determining unit 510 for determining an image to be reconstructed;
the feature extraction unit 520 is configured to perform feature extraction on the image to be reconstructed based on a coding layer in an image reconstruction model, so as to obtain image features of the image to be reconstructed, where the image reconstruction model is determined based on the iterative method of the image reconstruction model as set forth in any one of the above;
An image reconstruction unit 530, configured to reconstruct the image to be reconstructed based on the image features.
The image reconstruction device provided by the invention determines an image to be reconstructed and performs feature extraction on it based on the coding layer in an image reconstruction model to obtain the image features of the image to be reconstructed, where the image reconstruction model is determined by the iterative method of the image reconstruction model described above; the image to be reconstructed is then reconstructed based on these image features. By directly loading the coding layer of the fully iterated image reconstruction model and applying it for feature extraction, the accuracy and precision of the extracted visual representation can be ensured, the performance of various tasks in the visual field is improved, their processing is accelerated, and execution efficiency is improved.
Fig. 6 illustrates a physical schematic diagram of an electronic device, as shown in fig. 6, which may include: processor 610, communication interface (Communications Interface) 620, memory 630, and communication bus 640, wherein processor 610, communication interface 620, and memory 630 communicate with each other via communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform an iterative method of an image reconstruction model or an image reconstruction method, wherein the iterative method of an image reconstruction model comprises: masking is carried out based on the original image, so that a plurality of masking images are obtained; reconstructing mask areas in each mask image based on an initial image reconstruction model to obtain reconstructed images corresponding to each mask image; and determining the overlapping area between every two reconstructed images in each reconstructed image, and carrying out parameter iteration on the initial image reconstruction model based on the feature similarity between the area features of the overlapping areas in the two reconstructed images to obtain an image reconstruction model. The image reconstruction method comprises the following steps: determining an image to be reconstructed; extracting features of the image to be reconstructed based on a coding layer in an image reconstruction model to obtain image features of the image to be reconstructed, wherein the image reconstruction model is determined based on the iterative method of the image reconstruction model according to any one of the above; and reconstructing the image to be reconstructed based on the image characteristics.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to perform the iterative method of an image reconstruction model or the image reconstruction method provided above. The iterative method of an image reconstruction model includes: performing masking based on an original image to obtain a plurality of mask images; reconstructing the mask regions in each mask image based on an initial image reconstruction model to obtain a reconstructed image corresponding to each mask image; and determining the overlapping region between every two of the reconstructed images, and performing parameter iteration on the initial image reconstruction model based on the feature similarity between the region features of the overlapping region in the two reconstructed images to obtain an image reconstruction model. The image reconstruction method includes: determining an image to be reconstructed; performing feature extraction on the image to be reconstructed based on a coding layer in an image reconstruction model to obtain the image features of the image to be reconstructed, where the image reconstruction model is determined based on the iterative method of the image reconstruction model described above; and reconstructing the image to be reconstructed based on the image features.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the iterative method of an image reconstruction model or the image reconstruction method provided above. The iterative method of an image reconstruction model includes: performing masking based on an original image to obtain a plurality of mask images; reconstructing the mask regions in each mask image based on an initial image reconstruction model to obtain a reconstructed image corresponding to each mask image; and determining the overlapping region between every two of the reconstructed images, and performing parameter iteration on the initial image reconstruction model based on the feature similarity between the region features of the overlapping region in the two reconstructed images to obtain an image reconstruction model. The image reconstruction method includes: determining an image to be reconstructed; performing feature extraction on the image to be reconstructed based on a coding layer in an image reconstruction model to obtain the image features of the image to be reconstructed, where the image reconstruction model is determined based on the iterative method of the image reconstruction model described above; and reconstructing the image to be reconstructed based on the image features.
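One hedged way to write the objective implied by the parameter iteration restated above (and detailed in claims 2 and 3 below) is the following; the symbols, the cosine form of the feature similarity, and the weight $\lambda$ are assumptions rather than the patent's own notation:

$$
\mathcal{L}
= \underbrace{\frac{1}{V}\sum_{v=1}^{V}\Bigl(1-\cos\bigl(f_{v}[M_{v}],\,g[M_{v}]\bigr)\Bigr)}_{\text{reconstruction loss}}
+ \lambda\,\underbrace{\frac{1}{\lvert\mathcal{P}\rvert}\sum_{(i,j)\in\mathcal{P}}\Bigl(1-\cos\bigl(f_{i}[O_{ij}],\,f_{j}[O_{ij}]\bigr)\Bigr)}_{\text{consistency loss}},
\qquad O_{ij}=M_{i}\cap M_{j},
$$

where $M_{v}$ is the reconstruction (masked) region of the $v$-th mask image, $f_{v}[\cdot]$ denotes the region features of the $v$-th reconstructed image on a given region, $g[\cdot]$ the region features extracted from the original image on the same region, $\mathcal{P}$ the set of pairs of reconstructed images, and $O_{ij}$ the overlapping region of reconstructed images $i$ and $j$.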
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part contributing to the prior art, may be embodied in the form of a software product; the software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An iterative method of an image reconstruction model, comprising:
performing masking based on an original image to obtain a plurality of mask images;
reconstructing the mask regions in each mask image based on an initial image reconstruction model to obtain a reconstructed image corresponding to each mask image;
and determining the overlapping region between every two of the reconstructed images, and performing parameter iteration on the initial image reconstruction model based on the feature similarity between the region features of the overlapping region in the two reconstructed images, to obtain an image reconstruction model.
2. The iterative method of an image reconstruction model according to claim 1, wherein the performing parameter iteration on the initial image reconstruction model based on the feature similarity between the region features of the overlapping regions in every two reconstructed images to obtain an image reconstruction model comprises:
determining a consistency loss of the initial image reconstruction model based on the feature similarity between the region features of the overlapping regions in every two reconstructed images;
determining a reconstruction loss of the initial image reconstruction model based on the reconstruction regions in the respective reconstructed images and the image regions in the original image;
and performing parameter iteration on the initial image reconstruction model based on the reconstruction loss and the consistency loss to obtain an image reconstruction model.
3. The iterative method of an image reconstruction model according to claim 2, wherein the determining a reconstruction loss of the initial image reconstruction model based on the reconstruction regions in the respective reconstructed images and the image regions in the original image comprises:
determining, based on the reconstruction region in each reconstructed image, the image region corresponding to the reconstruction region in the original image;
extracting, respectively, the region features of the reconstruction region in each reconstructed image and the region features of the image region corresponding to the reconstruction region in the original image;
and determining the reconstruction loss of the initial image reconstruction model based on the feature similarity between the region features of the reconstruction region in each reconstructed image and the region features of the corresponding image region in the original image.
4. An iterative method of an image reconstruction model according to any one of claims 1 to 3, wherein the mask rates of the respective mask images are the same;
the visible regions in the respective mask images are different from each other, and the visible regions in the respective mask images constitute the original image.
5. An iterative method of an image reconstruction model according to any one of claims 1 to 3, wherein the initial image reconstruction model is constructed on the basis of a masked autoencoder and a self-consistent mechanism;
the self-consistent mechanism is used for determining the overlapping region between every two reconstructed images and for guiding the region features of the overlapping region in the two reconstructed images to remain consistent.
6. An image reconstruction method, comprising:
determining an image to be reconstructed;
performing feature extraction on the image to be reconstructed based on a coding layer in an image reconstruction model to obtain image features of the image to be reconstructed, wherein the image reconstruction model is determined based on the iterative method of the image reconstruction model according to any one of claims 1 to 5;
and reconstructing the image to be reconstructed based on the image features.
7. An iterative apparatus of an image reconstruction model, comprising:
a masking unit, configured to perform masking based on an original image to obtain a plurality of mask images;
a reconstruction unit, configured to reconstruct the mask regions in each mask image based on an initial image reconstruction model to obtain a reconstructed image corresponding to each mask image;
an iteration unit, configured to determine the overlapping region between every two of the reconstructed images, and to perform parameter iteration on the initial image reconstruction model based on the feature similarity between the region features of the overlapping region in the two reconstructed images to obtain an image reconstruction model.
8. An image reconstruction apparatus, comprising:
an image determining unit, configured to determine an image to be reconstructed;
a feature extraction unit, configured to perform feature extraction on the image to be reconstructed based on a coding layer in an image reconstruction model to obtain image features of the image to be reconstructed, wherein the image reconstruction model is determined based on the iterative method of the image reconstruction model according to any one of claims 1 to 5;
and an image reconstruction unit, configured to reconstruct the image to be reconstructed based on the image features.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements an iterative method of an image reconstruction model according to any one of claims 1 to 5 or an image reconstruction method according to claim 6 when the program is executed.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements an iterative method of an image reconstruction model according to any one of claims 1 to 5 or an image reconstruction method according to claim 6.
CN202310161883.3A 2023-02-24 2023-02-24 Iterative method of image reconstruction model and image reconstruction method Active CN116030156B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310161883.3A CN116030156B (en) 2023-02-24 2023-02-24 Iterative method of image reconstruction model and image reconstruction method


Publications (2)

Publication Number Publication Date
CN116030156A (en) 2023-04-28
CN116030156B (en) 2023-07-18

Family

ID=86079594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310161883.3A Active CN116030156B (en) 2023-02-24 2023-02-24 Iterative method of image reconstruction model and image reconstruction method

Country Status (1)

Country Link
CN (1) CN116030156B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014068883A (en) * 2012-09-28 2014-04-21 Fujifilm Corp Image processor, radiation imaging system, image processing program and image processing method
US20170090418A1 (en) * 2015-09-25 2017-03-30 City University Of Hong Kong Holographic encryption of multi-dimensional images and decryption of encrypted multi-dimensional images
CN113052181A (en) * 2021-04-21 2021-06-29 深圳壹账通智能科技有限公司 Table reconstruction method, device and equipment based on semantic segmentation and storage medium
CN114297730A (en) * 2021-12-31 2022-04-08 北京瑞莱智慧科技有限公司 Countermeasure image generation method, device and storage medium
CN115035567A (en) * 2022-04-28 2022-09-09 合肥的卢深视科技有限公司 Model training, incomplete face image recognition and reconstruction method, equipment and medium
CN115205161A (en) * 2022-08-18 2022-10-18 荣耀终端有限公司 Image processing method and device
CN115294349A (en) * 2022-06-29 2022-11-04 北京百度网讯科技有限公司 Method and device for training model, electronic equipment and storage medium
CN115482427A (en) * 2021-06-16 2022-12-16 上海联麓半导体技术有限公司 Model training method and device, readable storage medium and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant