CN115661603B - Image generation method based on modeless layout completion - Google Patents

Image generation method based on modeless layout completion

Info

Publication number
CN115661603B
Authority
CN
China
Prior art keywords
frame
modeless
layout
hidden space
hidden
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211612018.8A
Other languages
Chinese (zh)
Other versions
CN115661603A (en)
Inventor
Wu Jingyu (吴敬宇)
Li Zejian (李泽健)
Sun Lingyun (孙凌云)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University (ZJU)
Priority to CN202211612018.8A
Publication of CN115661603A
Application granted
Publication of CN115661603B
Legal status: Active (current)

Abstract

The invention discloses an image generation method based on modeless layout completion. Annotation frames in a modeless layout diagram are classified, combined, extracted, and scaled to obtain training samples, which are input to a training model that completes the frames to be completed; the training model comprises a category hidden space module, a bounding box hidden space module, and a modal frame derivation module. The model is trained with a loss function to obtain a modeless completion model; a modeless layout diagram input to this model yields a completed modal layout diagram, which is then input to a generation model to obtain a scene image. The method can accurately generate scene images from a modeless layout.

Description

Image generation method based on modeless layout completion
Technical Field
The invention belongs to the field of image data processing, and particularly relates to an image generation method based on modeless layout completion.
Background
In recent years, generation models based on layout diagrams (layouts) have received great attention because they represent scene information more explicitly. The layout is a very important concept in the image generation process: layout information contains the object categories and spatial positions in a scene and is a powerful structural representation of the image. Compared with other scene priors, the biggest characteristic of the layout is that it can describe the category and spatial position of each object in a complex scene. Therefore, generation networks based on layout priors are expected to solve the problems of low precision and low accuracy in generated images.
Chinese patent CN114241052A discloses a method for generating new-view images of multi-object scenes based on layouts. The layouts of a plurality of images are input to a layout predictor to obtain the layout under a new view; each object instance in the input images is sampled and concatenated with a camera pose matrix along the channel direction to construct an input tensor, which is input to a pixel predictor to obtain images of all objects under the new view angle; the layout diagram under the new view angle and the images of all objects under the new view angle are input to a scene generator, passed sequentially through an encoder and a fusion module to obtain a fused feature containing all object information, and a scene image is generated by a decoder. The method guides the network to generate the scene image through the layout information of the scene without relying on a depth map of the input image; the generated images are clearer and more realistic, alleviating the low precision and low accuracy of existing generated images.
Chinese patent CN114241052A discloses a semantic image analogy method based on a single-image generative adversarial network. Given any image and its semantic segmentation map, the scheme trains a generation model dedicated to that image; the model can recombine the source image according to different expected semantic layouts to generate images conforming to a target semantic layout, achieving the effect of semantic image analogy. Both the visual quality and the layout-conformance accuracy of its generated results are optimal.
However, the above patents must segment pictures using original pictures and corresponding ground-truth/pre-trained models. In many application scenarios, for example, a user draws a modeless layout and wants to obtain a relatively accurate scene image from the modeless layout alone, using that scene image to judge whether the layout idea is sound. A modeless layout is a layout in which occlusion relationships exist between objects. Existing modeless layouts annotate only the visible part of each object in the picture and do not consider the occluded part, which makes the scene annotation information incomplete; during training the model then assumes by default that each annotation represents a complete object, ignoring the occlusion relationships present in real scenes, so the model cannot accurately understand the relationships between objects in complex real scenes.
Disclosure of Invention
The invention provides an image generation method based on modeless layout completion, which can accurately generate a scene image based on a modeless layout.
An image generation method based on modeless layout completion, comprising:
constructing a training sample set: obtaining a real scene image and the modeless layout diagram and modal layout diagram corresponding to the real scene image; combining frames with overlapping areas or intersecting edges in the modeless layout diagram into first modeless frame groups; merging first modeless frame groups that share a frame to obtain second modeless frame groups; sequentially extracting and scaling each second modeless frame group to obtain modeless frame combination images; and taking each modeless frame combination image as a training sample, a plurality of modeless frame combination images constituting the training sample set;
constructing a training model comprising a category hidden space module, a bounding box hidden space module, and a modal frame derivation module: any modeless frame in a training sample is taken as the frame to be completed and the other frames as mask frames; in the category hidden space module, the object categories of the frame to be completed and the mask frames are converted into category hidden space features through label embedding, and the category hidden space features are fully connected to obtain a category hidden space feature vector; the bounding box hidden space module respectively downsamples the bounding boxes of the frame to be completed and the mask frames to obtain their bounding box hidden space feature vectors; the modal frame derivation module combines the bounding box hidden space feature vector and the category hidden space feature vector to obtain a predicted modal frame hidden space feature vector, and upsamples it to obtain a predicted modal frame;
constructing a loss function based on the predicted modal frames and the corresponding frames in the modal layout diagram; training the training model on the training sample set through the loss function to obtain a modeless layout completion model; inputting a modeless layout diagram into the modeless layout completion model to obtain a predicted modal layout diagram; and inputting the predicted modal layout diagram into an image generation model to obtain a scene image.
The frames in the modeless layout diagram are used to annotate the category of an object and the size and position of its visible range;
the frames in the modal layout diagram are used to annotate the category of an object, the sizes of its visible and occluded ranges, and the positions of its visible and occluded ranges.
Sequentially extracting and scaling the second modeless frame groups to obtain the modeless frame combination images comprises:
expanding the boundary of a second modeless frame group by a maximum-value method based on the extreme values of height, width, abscissa, and ordinate over the group; extracting the expanded second modeless frame group to obtain a second modeless frame combination image; and scaling the second modeless frame combination image to a given resolution to obtain the modeless frame combination image.
The category hidden space module comprises a label embedding layer and a fully connected layer; the object categories of the frame to be completed and the mask frames are converted into category hidden space features through the label embedding layer, and the category hidden space features are fully connected through the fully connected layer to obtain the category hidden space feature vector.
The bounding box hidden space module respectively downsamples the bounding boxes of the frame to be completed and of the mask frames to obtain their bounding box hidden space feature vectors, wherein:
the bounding box hidden space module comprises a plurality of sequentially connected downsampling submodules; each downsampling submodule comprises a downsampling unit and a max pooling layer connected in sequence, each downsampling unit comprises a plurality of sequentially connected downsampling subunits, and each downsampling subunit comprises a convolution layer, a regularization layer, and an activation layer in sequence.
The modal frame derivation module comprises a plurality of fully connected layers and a plurality of upsampling submodules; the bounding box hidden space feature vector and the category hidden space feature vector are combined through the plurality of fully connected layers to obtain a predicted modal frame hidden space feature vector, and the predicted modal frame hidden space feature vector is upsampled through the plurality of upsampling submodules to obtain the predicted modal frame.
A loss function $\mathcal{L}$ is constructed based on the predicted modal frames and the corresponding frames in the modal layout diagram:

$$\mathcal{L} = \frac{1}{N_2}\sum_{s=1}^{N_2} L_{CE}\left(F\left(b_s^{am}, c_s^{am}, \{b_o^{mask}, c_o^{mask}\}\right),\ M_t\right) + \lambda \cdot \frac{1}{N_1}\sum_{r=1}^{N_1} L_{CE}\left(F\left(b_r^{m}, c_r^{m}, \{b_o^{mask}, c_o^{mask}\}\right),\ b_r^{m}\right)$$

where $\lambda$ is a hyperparameter, $N_1$ is the number of bounding boxes of modal frames among the frames to be completed, $N_2$ is the number of bounding boxes of modeless frames to be completed, $b_o^{mask}$ is the bounding box of the $o$-th mask frame, $b_s^{am}$ is the bounding box of the $s$-th modeless frame among the frames to be completed, $b_r^{m}$ is the bounding box of the $r$-th modal frame among the frames to be completed, $c_r^{m}$ is the category of the $r$-th modal frame, $c_s^{am}$ is the category of the $s$-th modeless frame, $c_o^{mask}$ is the category of the $o$-th mask frame, $F$ is the training model, $L_{CE}$ is the cross-entropy loss, and $M_t$ is the real modal frame.
The accuracy of the modeless layout completion model is measured based on IoU variant indices, comprising a first IoU variant index and a second IoU variant index, wherein:

the first IoU variant index $\mathrm{IoU}_1$ is:

$$\mathrm{IoU}_1 = \frac{1}{N}\sum_{i=1}^{N} \mathrm{IoU}\left(F(b_i^{m}),\ b_i^{a}\right)$$

the second IoU variant index $\mathrm{IoU}_2$ is:

$$\mathrm{IoU}_2 = \frac{1}{N}\sum_{i=1}^{N} \mathrm{IoU}\left(F(b_i^{a}),\ b_i^{a}\right)$$

where $b_i^{m}$ is the $i$-th bounding box in the original modeless layout, superscript $m$ denoting boxes of the original modeless layout; $b_i^{a}$ is the $i$-th bounding box in the real modal layout, superscript $a$ denoting boxes of the real modal layout; the bounding boxes of the original modeless layout correspond one-to-one to those of the real modal layout; $F$ is the modeless layout completion model; and $N$ is the number of bounding boxes.
Compared with the prior art, the invention has the beneficial effects that:
according to the method, the prediction mode layout is obtained by complementing each modeless frame in the modeless layout one by one, and the scene image is accurately obtained through the generator based on the prediction mode layout.
According to the invention, the classification relation vector and the boundary frame relation vector are respectively obtained by fusing the classification of the to-be-complemented modeless frame and other frames and the characteristics of the boundary frame in the modeless layout diagram in the hidden space, and the classification relation vector and the boundary frame relation vector are fused and then up-sampled, so that the to-be-complemented modeless frame is complemented to obtain the accurate prediction mode frame, and the corresponding object and the position relation of the corresponding object and other objects in the scene diagram can be completely presented based on the prediction mode frame.
Drawings
FIG. 1 is a flowchart of an image generation method based on modeless layout completion according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a training sample set obtained according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an image generating method based on modeless layout completion according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a loss function construction according to an embodiment of the present invention;
FIG. 5 is an effect comparison diagram provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings.
The invention provides an image generation method based on modeless layout completion, as shown in fig. 1, comprising the following steps:
(1) Obtain a training sample set and labels based on real images and the modal and modeless layouts corresponding to the real images.
Frame annotation is performed on a real image to obtain a modeless layout and a modal layout. The modeless layout annotates the object category of each object in the real image and the size and position of its visible range; the modal layout annotates the object category, the sizes of the visible and occluded ranges, and the positions of the visible and occluded ranges. The invention takes the annotation frames in the modal layout as labels; an annotation frame comprises an object category and a bounding box (bbox), and the bounding box comprises a position (the coordinates of the upper-left corner) and a size (height and width).
As shown in fig. 2, the invention first classifies and combines the frames in the modeless layout, specifically as follows: two frames with an overlapping area are combined into a first modeless frame group, and two frames with intersecting edges in the modeless layout diagram are likewise combined into a first modeless frame group; the first modeless frame groups are then traversed, and groups that share a frame are merged to obtain second modeless frame groups, as in the sketch below.
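A minimal sketch of this grouping step, assuming boxes are given as (x, y, w, h) tuples with a top-left origin; the helper names and the union-find formulation are illustrative, not taken from the patent:

    def boxes_connect(b1, b2):
        """True if two (x, y, w, h) boxes overlap or their edges touch."""
        x1, y1, w1, h1 = b1
        x2, y2, w2, h2 = b2
        return not (x1 + w1 < x2 or x2 + w2 < x1 or
                    y1 + h1 < y2 or y2 + h2 < y1)

    def group_boxes(boxes):
        """Pairs that overlap or touch form first groups; groups sharing a
        box are merged transitively (union-find), yielding second groups."""
        parent = list(range(len(boxes)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_connect(boxes[i], boxes[j]):
                    parent[find(i)] = find(j)  # union the two groups

        groups = {}
        for i in range(len(boxes)):
            groups.setdefault(find(i), []).append(i)
        return list(groups.values())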
Then, the invention extracts and scales the obtained second modeless frame groups, specifically as follows: obtain the maximum height h_max, the maximum width w_max, and the minimum abscissa and ordinate (x_min, y_min) over the boxes of a second modeless frame group; expand the bounding box of the group based on these extrema so that a margin is left around it; extract the expanded region to obtain a second modeless frame combination image; and scale this image to a resolution of 256x256. Each modeless frame combination image serves as a training sample, and a plurality of modeless frame combination images constitute the training sample set.
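A sketch of this extraction and scaling step, with integer pixel coordinates assumed; the exact expansion rule behind the "maximum value method" is not stated, so the half-box margin used here is an assumption:

    from PIL import Image

    def crop_group(image, group, out_size=256):
        """Crop the region covered by one second modeless frame group,
        expanded by a margin derived from the group's maximum box height
        and width, then rescale to out_size x out_size. Boxes: (x, y, w, h)."""
        x_min = min(x for x, y, w, h in group)
        y_min = min(y for x, y, w, h in group)
        x_max = max(x + w for x, y, w, h in group)
        y_max = max(y + h for x, y, w, h in group)
        h_max = max(h for x, y, w, h in group)
        w_max = max(w for x, y, w, h in group)

        # Leave a margin around the group, clamped to the image bounds.
        left = max(0, x_min - w_max // 2)
        top = max(0, y_min - h_max // 2)
        right = min(image.width, x_max + w_max // 2)
        bottom = min(image.height, y_max + h_max // 2)
        return image.crop((left, top, right, bottom)).resize((out_size, out_size))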
(2) Construct a training model comprising a category hidden space module (branch_cate), a bounding box hidden space module (branch_modal), and a modal frame derivation module (branch_amodal). Any modeless frame in a training sample is taken as the frame to be completed, and the other frames are taken as mask frames. As shown in fig. 3, branch_cate analyzes the interrelationship between the category C of the frame to be completed and the categories of the mask frames to obtain a category hidden space feature vector; branch_modal analyzes the hidden-space relationship between the bounding box to be completed and the mask bounding boxes and expresses it as a bounding box hidden space feature vector; branch_amodal combines the two feature vectors to derive the likely hidden space feature vector of the modal bounding box, i.e., the predicted modal frame hidden space feature vector, from which the predicted modal frame is derived through a series of upsampling and fully connected layers. Each frame in the training sample is taken in turn as the frame to be completed, with the other frames as mask frames, and is completed through the above steps, completing the training sample.
The category hidden space module (branch_cate) provided by the invention comprises a label embedding layer and a fully connected layer. The object categories of the frame to be completed and of the mask frames are converted into category hidden space features by the label embedding layer, and these features are fully connected by the fully connected layer to obtain a category hidden space feature vector of dimension 512x1, which captures the category relationship between the frame to be completed and the mask frames. A sketch follows.
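A minimal PyTorch sketch of branch_cate, assuming a fixed maximum number of boxes per sample and an embedding width of 64; only the 512-dimensional output is stated in the text, the other sizes are assumptions:

    import torch
    import torch.nn as nn

    class CategoryBranch(nn.Module):
        """Label embedding + fully connected layer producing the 512-d
        category hidden space feature vector."""
        def __init__(self, num_classes, embed_dim=64, max_boxes=8):
            super().__init__()
            self.embed = nn.Embedding(num_classes, embed_dim)
            self.fc = nn.Linear(max_boxes * embed_dim, 512)

        def forward(self, labels):        # labels: (B, max_boxes) int64
            e = self.embed(labels)        # (B, max_boxes, embed_dim)
            return self.fc(e.flatten(1))  # (B, 512)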
The bounding box hidden space module (branch_modal) provided by the invention comprises 5 sequentially connected downsampling submodules; each downsampling submodule comprises a downsampling unit and a max pooling layer connected in sequence, each downsampling unit comprises 2 sequentially connected downsampling subunits, and each downsampling subunit comprises a convolution layer, a regularization layer, and an activation layer in sequence. This finally yields a bounding box hidden space feature vector of dimension 512x16, as sketched below.
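A PyTorch sketch of branch_modal, assuming the boxes are rasterized as 128x128 binary masks so that five 2x poolings leave a 4x4 map, i.e. a 512x16 latent after flattening the spatial positions; the input size, channel widths, and the choice of BatchNorm/ReLU are all assumptions:

    import torch.nn as nn

    class DownBlock(nn.Module):
        """One downsampling submodule: two (conv, norm, activation)
        subunits followed by 2x max pooling."""
        def __init__(self, c_in, c_out):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.Conv2d(c_out, c_out, 3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        def forward(self, x):
            return self.body(x)

    class BBoxBranch(nn.Module):
        """branch_modal: five chained downsampling submodules over
        rasterized box masks (channel 0: box to be completed,
        channel 1: mask boxes)."""
        def __init__(self, c_in=2):
            super().__init__()
            chans = [c_in, 32, 64, 128, 256, 512]
            self.blocks = nn.Sequential(
                *[DownBlock(chans[i], chans[i + 1]) for i in range(5)]
            )

        def forward(self, x):      # x: (B, 2, 128, 128)
            f = self.blocks(x)     # (B, 512, 4, 4)
            return f.flatten(2)    # (B, 512, 16)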
The modal frame derivation module (branch_amodal) provided by the invention comprises 2 fully connected layers and 5 upsampling submodules. The bounding box hidden space feature vector and the category hidden space feature vector are combined through the 2 fully connected layers to obtain a predicted modal frame hidden space feature vector of dimension 512x17, and this vector is upsampled through the 5 upsampling submodules to obtain the predicted modal frame, as sketched below.
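A PyTorch sketch of branch_amodal under the same assumptions as above; the concatenation order, the bridge from the 512x17 latent to a 4x4 spatial map, and the transposed-convolution decoder are all assumptions:

    import torch
    import torch.nn as nn

    class AmodalHead(nn.Module):
        """branch_amodal: two fully connected layers fuse the (512 x 16)
        box latent and the (512 x 1) category latent into the (512 x 17)
        predicted modal frame latent; five upsampling submodules then
        decode it into a 128x128 mask of the predicted modal frame."""
        def __init__(self):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(17, 17), nn.ReLU(inplace=True), nn.Linear(17, 17)
            )
            # Assumed bridge from the 512x17 latent to a 4x4 spatial map.
            self.to_spatial = nn.Linear(17, 16)
            ups = []
            for c in [512, 256, 128, 64, 32]:
                ups += [nn.ConvTranspose2d(c, c // 2, 4, stride=2, padding=1),
                        nn.ReLU(inplace=True)]
            ups += [nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid()]
            self.up = nn.Sequential(*ups)

        def forward(self, z_box, z_cat):
            # z_box: (B, 512, 16); z_cat: (B, 512) -> (B, 512, 1)
            z = torch.cat([z_box, z_cat.unsqueeze(-1)], dim=-1)  # (B, 512, 17)
            z = self.fc(z)                                       # (B, 512, 17)
            z = self.to_spatial(z).view(-1, 512, 4, 4)
            return self.up(z)                                    # (B, 1, 128, 128)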
(3) The training model is trained on the training sample set with the constructed loss function to obtain the modeless layout completion model (ALCN). As shown in fig. 4, the loss function $\mathcal{L}$ is constructed from the predicted modal frames and the corresponding frames in the modal layout:

$$\mathcal{L} = \frac{1}{N_2}\sum_{s=1}^{N_2} L_{CE}\left(F\left(b_s^{am}, c_s^{am}, \{b_o^{mask}, c_o^{mask}\}\right),\ M_t\right) + \lambda \cdot \frac{1}{N_1}\sum_{r=1}^{N_1} L_{CE}\left(F\left(b_r^{m}, c_r^{m}, \{b_o^{mask}, c_o^{mask}\}\right),\ b_r^{m}\right)$$

where $\lambda$ is a hyperparameter, $N_1$ is the number of bounding boxes of modal frames among the frames to be completed, $N_2$ is the number of bounding boxes of modeless frames to be completed, $b_o^{mask}$ is the bounding box of the $o$-th mask frame, $b_s^{am}$ is the bounding box of the $s$-th modeless frame among the frames to be completed, $b_r^{m}$ is the bounding box of the $r$-th modal frame among the frames to be completed, $c_r^{m}$ is the category of the $r$-th modal frame, $c_s^{am}$ is the category of the $s$-th modeless frame, $c_o^{mask}$ is the category of the $o$-th mask frame, $F$ is the training model, $L_{CE}$ is the cross-entropy loss, and $M_t$ is the real modal frame. By regulating the hyperparameter, the model completes the modeless frames among the frames to be completed while suppressing changes to the modal frames.
According to the method, the training model is trained on the training sample set with this loss function to obtain the modeless layout completion model; a modeless layout diagram is input to the modeless layout completion model to obtain a predicted modal layout diagram, and the predicted modal layout diagram is input to a layout-to-image generation model to obtain a scene image.
(4) The invention also provides IoU variant indices for evaluating the completion effect of the modeless layout completion model; the model obtained in step (3) is evaluated through these indices. The accuracy of the modeless layout completion model is measured based on the IoU variant indices, which comprise a first IoU variant index and a second IoU variant index, wherein:

the first IoU variant index $\mathrm{IoU}_1$ is:

$$\mathrm{IoU}_1 = \frac{1}{N}\sum_{i=1}^{N} \mathrm{IoU}\left(F(b_i^{m}),\ b_i^{a}\right)$$

the second IoU variant index $\mathrm{IoU}_2$ is:

$$\mathrm{IoU}_2 = \frac{1}{N}\sum_{i=1}^{N} \mathrm{IoU}\left(F(b_i^{a}),\ b_i^{a}\right)$$

where $b_i^{m}$ is the $i$-th bounding box in the original modeless layout, superscript $m$ denoting the original modeless layout; $b_i^{a}$ is the $i$-th bounding box in the real modal layout, superscript $a$ denoting the real modal layout; the bounding boxes of the original modeless layout correspond one-to-one to those of the real modal layout; $F$ is the modeless layout completion model; and $N$ is the number of bounding boxes. Of these two indices, $\mathrm{IoU}_1$ measures the completion effect of the model at different difficulty levels: the lower the IoU between a modeless layout and the real modal layout, the larger the part the model needs to complete. $\mathrm{IoU}_2$ measures the accuracy of the model: its input is already very close to the real modal layout, so even a small erroneous change causes the index to drop.
(5) Using the modeless layout completion model obtained in step (3), generate the corresponding completed modal layout from an arbitrarily input modeless frame layout diagram, and visualize the occlusion relationships between objects in the scene. The specific steps are as follows:
(5-1) Draw the bounding boxes of the modeless annotation frames to be completed and the category of each bounding box.
(5-2) Input the drawn modeless annotation frames into the modeless layout completion model obtained in step (3) to obtain the completed modal layout diagram; compare the differences between the modeless layout diagram and the modal layout diagram to highlight the occlusion relationships between objects in the scene.
(6) Generate a high-quality scene image from the completed modal layout obtained in step (5). The specific step is: input the completed annotation frames obtained in step (5) into an image generation model to obtain the generated scene image; an end-to-end sketch follows. FIG. 5 shows an example of a set of generated images: from left to right, a modeless layout, a picture generated from the modeless layout, a picture generated from the completed modal layout, and the real picture. As the results show, the quality of the picture generated after the modeless layout is completed into a modal layout with the proposed method is noticeably better.
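An end-to-end inference sketch of steps (5) and (6); completion_model and image_generator stand in for the trained modeless layout completion model and the layout-to-image generation model, and the per-frame calling convention is an assumption:

    def generate_scene(modeless_layout, completion_model, image_generator):
        """Complete each modeless frame against the remaining frames (which
        act as mask frames), then feed the predicted modal layout to the
        layout-to-image generator."""
        modal_layout = []
        for i, (box, label) in enumerate(modeless_layout):
            masks = [fb for j, fb in enumerate(modeless_layout) if j != i]
            modal_layout.append((completion_model(box, label, masks), label))
        return image_generator(modal_layout)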

Claims (7)

1. An image generation method based on modeless layout completion, comprising:
constructing a training sample set: obtaining a real scene image and the modeless layout diagram and modal layout diagram corresponding to the real scene image; combining frames with overlapping areas or intersecting edges in the modeless layout diagram into first modeless frame groups; merging first modeless frame groups that share a frame to obtain second modeless frame groups; sequentially extracting and scaling each second modeless frame group to obtain modeless frame combination images; and taking each modeless frame combination image as a training sample, a plurality of modeless frame combination images constituting the training sample set;
constructing a training model comprising a category hidden space module, a bounding box hidden space module, and a modal frame derivation module: any modeless frame in a training sample is used as the frame to be completed and the other frames are used as mask frames; the object categories of the frame to be completed and the mask frames are converted into category hidden space features through label embedding in the category hidden space module, and the category hidden space features are fully connected to obtain a category hidden space feature vector; the bounding box hidden space module respectively downsamples the bounding boxes of the frame to be completed and the mask frames to obtain their bounding box hidden space feature vectors; the modal frame derivation module combines the bounding box hidden space feature vector and the category hidden space feature vector to obtain a predicted modal frame hidden space feature vector, and upsamples it to obtain a predicted modal frame;
constructing a loss function based on the predicted modal frames and the corresponding frames in the modal layout diagram; training the training model on the training sample set through the loss function to obtain a modeless layout completion model; inputting the modeless layout diagram into the modeless layout completion model to obtain a predicted modal layout diagram; and inputting the predicted modal layout diagram into an image generation model to obtain a scene image;
the loss function $\mathcal{L}$ constructed based on the predicted modal frames and the corresponding frames in the modal layout diagram being:

$$\mathcal{L} = \frac{1}{N_2}\sum_{s=1}^{N_2} L_{CE}\left(F\left(b_s^{am}, c_s^{am}, \{b_o^{mask}, c_o^{mask}\}\right),\ M_t\right) + \lambda \cdot \frac{1}{N_1}\sum_{r=1}^{N_1} L_{CE}\left(F\left(b_r^{m}, c_r^{m}, \{b_o^{mask}, c_o^{mask}\}\right),\ b_r^{m}\right)$$

where $\lambda$ is a hyperparameter, $N_1$ is the number of bounding boxes of modal frames among the frames to be completed, $N_2$ is the number of bounding boxes of modeless frames to be completed, $b_o^{mask}$ is the bounding box of the $o$-th mask frame, $b_s^{am}$ is the bounding box of the $s$-th modeless frame among the frames to be completed, $b_r^{m}$ is the bounding box of the $r$-th modal frame among the frames to be completed, $c_r^{m}$ is the category of the $r$-th modal frame, $c_s^{am}$ is the category of the $s$-th modeless frame, $c_o^{mask}$ is the category of the $o$-th mask frame, $F$ is the training model, $L_{CE}$ is the cross-entropy loss, and $M_t$ is the real modal frame.
2. The image generation method based on modeless layout completion of claim 1, wherein a frame in the modeless layout diagram is used to annotate the category of an object and the size and position of its visible range; and a frame in the modal layout diagram is used to annotate the category of an object, the sizes of its visible and occluded ranges, and the positions of its visible and occluded ranges.
3. The image generation method based on modeless layout completion of claim 1, wherein sequentially extracting and scaling the second modeless frame groups to obtain the modeless frame combination images comprises:
expanding the boundary of a second modeless frame group by a maximum-value method based on the extreme values of height, width, abscissa, and ordinate over the group; extracting the expanded second modeless frame group to obtain a second modeless frame combination image; and scaling the second modeless frame combination image to a given resolution to obtain the modeless frame combination image.
4. The image generation method based on modeless layout completion of claim 1, wherein the category hidden space module comprises a label embedding layer and a fully connected layer; the object categories of the frame to be completed and the mask frames are converted into category hidden space features through the label embedding layer, and the category hidden space features are fully connected through the fully connected layer to obtain the category hidden space feature vector.
5. The image generation method based on modeless layout completion of claim 1, wherein the bounding box hidden space module respectively downsamples the bounding boxes of the frame to be completed and of the mask frames to obtain their bounding box hidden space feature vectors, wherein:
the bounding box hidden space module comprises a plurality of sequentially connected downsampling submodules; each downsampling submodule comprises a downsampling unit and a max pooling layer connected in sequence, each downsampling unit comprises a plurality of sequentially connected downsampling subunits, and each downsampling subunit comprises a convolution layer, a regularization layer, and an activation layer in sequence.
6. The image generation method based on modeless layout completion of claim 1, wherein the modal frame derivation module comprises a plurality of fully connected layers and a plurality of upsampling submodules; the bounding box hidden space feature vector and the category hidden space feature vector are combined through the plurality of fully connected layers to obtain a predicted modal frame hidden space feature vector, and the predicted modal frame hidden space feature vector is upsampled through the plurality of upsampling submodules to obtain the predicted modal frame.
7. The image generation method based on modeless layout completion of claim 1, wherein the accuracy of the modeless layout completion model is measured based on IoU variant indices, the IoU variant indices comprising a first IoU variant index and a second IoU variant index, wherein:

the first IoU variant index $\mathrm{IoU}_1$ is:

$$\mathrm{IoU}_1 = \frac{1}{N}\sum_{i=1}^{N} \mathrm{IoU}\left(F(b_i^{m}),\ b_i^{a}\right)$$

the second IoU variant index $\mathrm{IoU}_2$ is:

$$\mathrm{IoU}_2 = \frac{1}{N}\sum_{i=1}^{N} \mathrm{IoU}\left(F(b_i^{a}),\ b_i^{a}\right)$$

where $b_i^{m}$ is the $i$-th bounding box in the original modeless layout, superscript $m$ denoting the original modeless layout; $b_i^{a}$ is the $i$-th bounding box in the real modal layout, superscript $a$ denoting the real modal layout; the bounding boxes of the original modeless layout correspond one-to-one to those of the real modal layout; $F$ is the modeless layout completion model; and $N$ is the number of bounding boxes.
CN202211612018.8A 2022-12-15 2022-12-15 Image generation method based on modeless layout completion Active CN115661603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211612018.8A CN115661603B (en) 2022-12-15 2022-12-15 Image generation method based on modeless layout completion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211612018.8A CN115661603B (en) 2022-12-15 2022-12-15 Image generation method based on modeless layout completion

Publications (2)

Publication Number Publication Date
CN115661603A CN115661603A (en) 2023-01-31
CN115661603B true CN115661603B (en) 2023-04-25

Family

ID=85023010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211612018.8A Active CN115661603B (en) 2022-12-15 2022-12-15 Image generation method based on modeless layout completion

Country Status (1)

Country Link
CN (1) CN115661603B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851626A (en) * 2019-11-05 2020-02-28 武汉联图时空信息科技有限公司 Layer layout based time-space data visual analysis method and system
CN113196296A (en) * 2018-12-17 2021-07-30 微软技术许可有限责任公司 Detecting objects in a crowd using geometric context
CN114119803A (en) * 2022-01-27 2022-03-01 浙江大学 Scene image generation method based on causal graph

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100287519A1 (en) * 2009-05-11 2010-11-11 Anaglobe Technology, Inc. Method and system for constructing a customized layout figure group
US20220067983A1 (en) * 2020-08-28 2022-03-03 Nvidia Corporation Object image completion
CN114241052B (en) * 2021-12-27 2023-09-08 江苏贝思旺科技有限公司 Method and system for generating new view image of multi-object scene based on layout
CN114187491B (en) * 2022-02-17 2022-05-17 中国科学院微电子研究所 Method and device for detecting shielding object

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113196296A (en) * 2018-12-17 2021-07-30 微软技术许可有限责任公司 Detecting objects in a crowd using geometric context
CN110851626A (en) * 2019-11-05 2020-02-28 武汉联图时空信息科技有限公司 Layer layout based time-space data visual analysis method and system
CN114119803A (en) * 2022-01-27 2022-03-01 浙江大学 Scene image generation method based on causal graph

Also Published As

Publication number Publication date
CN115661603A (en) 2023-01-31

Similar Documents

Publication Publication Date Title
CN108549893B (en) End-to-end identification method for scene text with any shape
JP7206309B2 (en) Image question answering method, device, computer device, medium and program
CN111428586A (en) Three-dimensional human body posture estimation method based on feature fusion and sample enhancement
CN108122239A (en) Use the object detection in the image data of depth segmentation
CN113673425A (en) Multi-view target detection method and system based on Transformer
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN110781850A (en) Semantic segmentation system and method for road recognition, and computer storage medium
CN115063425B (en) Reading knowledge graph-based structured inspection finding generation method and system
CN111626994A (en) Equipment fault defect diagnosis method based on improved U-Net neural network
JP7174812B2 (en) Querying semantic data from unstructured documents
CN114730486B (en) Method and system for generating training data for object detection
CN113657409A (en) Vehicle loss detection method, device, electronic device and storage medium
Zhang et al. Multiple adverse weather conditions adaptation for object detection via causal intervention
CN113239928A (en) Method, apparatus and program product for image difference detection and model training
CN116645592A (en) Crack detection method based on image processing and storage medium
Dong et al. Multiple spatial residual network for object detection
CN114639101A (en) Emulsion droplet identification system, method, computer equipment and storage medium
CN114445620A (en) Target segmentation method for improving Mask R-CNN
CN115661603B (en) Image generation method based on modeless layout completion
CN113191204A (en) Multi-scale blocking pedestrian detection method and system
CN115937520A (en) Point cloud moving target segmentation method based on semantic information guidance
US11804042B1 (en) Prelabeling of bounding boxes in video frames
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
Nouri et al. Global visual saliency: Geometric and colorimetrie saliency fusion and its applications for 3D colored meshes
CN110472728B (en) Target information determining method, target information determining device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant