WO2023056833A1 - Background image generation and image fusion method and apparatus, electronic device, and readable medium - Google Patents

Background image generation and image fusion method and apparatus, electronic device, and readable medium

Info

Publication number
WO2023056833A1
WO2023056833A1 · PCT/CN2022/119181 · CN2022119181W
Authority
WO
WIPO (PCT)
Prior art keywords
image
target image
target
background
frame
Prior art date
Application number
PCT/CN2022/119181
Other languages
English (en)
French (fr)
Inventor
杜宗财
路浩威
侯晓霞
Original Assignee
北京字节跳动网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Publication of WO2023056833A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G06T11/40: Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G06T5/00: Image enhancement or restoration
    • G06T5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/194: Segmentation; Edge detection involving foreground-background segmentation

Definitions

  • Embodiments of the present disclosure relate to the technical field of image processing, for example, to a background image generation, image fusion method, device, electronic device, and readable medium.
  • Image fusion refers to the image processing of the collected related images to maximize the extraction of beneficial information in the image to obtain a comprehensive image. For example, when generating a cover for a video, multiple frames of images in the video can be fused to obtain a high-quality fused image that can reflect key content in the video as the cover. In the process of fusing multiple frames of images into one frame of images, it is usually necessary to generate a unified background image.
  • In the related art, the method for generating the background image mainly smooths out foreground objects (treated as noise) by using a large amount of data, that is, all images are averaged to obtain the background image.
  • This method has strong limitations: it requires every frame of image to correspond to the same viewing angle and needs enough images to ensure a smooth result.
  • However, in practice, multiple frames of images are not necessarily captured at the same viewing angle, and the scene in a video is complex and changeable.
  • As a result, the smoothing effect of the background image generated by this method is poor; in particular, the area where the instance and the background meet is easily distorted or deformed, so the quality of the generated background image cannot be guaranteed.
  • The present disclosure provides a background image generation method, an image fusion method, devices, an electronic device, and a readable medium, so as to generate a high-quality background image.
  • In a first aspect, an embodiment of the present disclosure provides a method for generating a background image, including:
  • performing instance segmentation on each frame of target image in at least two frames of target images to obtain a background segmentation map, with instances removed, corresponding to each frame of target image;
  • for each frame of target image, filling the region where the removed instance is located in the target image according to the background segmentation map of a set image to obtain a filling result of the target image, wherein the set image includes a target image in the at least two frames of target images that is different from the target image;
  • generating a background image according to the filling results of all the target images.
  • In a second aspect, an embodiment of the present disclosure further provides an image fusion method, including: acquiring at least two frames of target images; generating a background image according to the filling results of the regions where the removed instances are located in all the target images; and fusing the instances in all the target images into the background image to obtain a fused image.
  • the embodiment of the present disclosure also provides a background image generation device, including:
  • the segmentation module is configured to perform instance segmentation on each frame of the target image in at least two frames of the target image, and obtain a background segmentation map corresponding to each frame of the target image without instances;
  • the filling module is configured to, for each frame of the target image, fill in the area where the removed instance in the target image is located according to the background segmentation map of the set image, and obtain a filling result of the target image, wherein the set image includes a target image different from the target image in the at least two frames of target images;
  • the generation module is configured to generate a background image according to the filling results of all the target images.
  • the embodiment of the present disclosure also provides a fusion device, including:
  • an acquisition module configured to acquire at least two frames of target images
  • the background image generation module is configured to generate a background image according to the filling results of the regions where the removed instances are located in all the target images;
  • the fusion module is configured to fuse all instances in the target image into the background image to obtain a fusion image.
  • an embodiment of the present disclosure further provides an electronic device, including:
  • a storage device configured to store a program;
  • a processor, where when the program is executed by the processor, the processor implements the background image generation method described in the first aspect or the image fusion method described in the second aspect.
  • The embodiments of the present disclosure further provide a computer-readable medium; the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the background image generation method described in the first aspect or the image fusion method described in the second aspect is implemented.
  • FIG. 1 is a flowchart of a method for generating a background image in Embodiment 1 of the present disclosure;
  • FIG. 2 is a flowchart of a method for generating a background image in Embodiment 2 of the present disclosure;
  • FIG. 3 is a schematic diagram of filling the area where the removed instance is located in the target image in Embodiment 2 of the present disclosure;
  • FIG. 4 is a flowchart of a method for generating a background image in Embodiment 3 of the present disclosure;
  • FIG. 5 is a schematic diagram of a dilated region corresponding to an instance in the target image in Embodiment 3 of the present disclosure;
  • FIG. 6 is a flowchart of obtaining the repair result of each target image according to the filling results of all target images in Embodiment 3 of the present disclosure;
  • FIG. 7 is a schematic diagram of a background image generated according to target images in Embodiment 3 of the present disclosure;
  • FIG. 8 is a flowchart of an image fusion method in Embodiment 4 of the present disclosure;
  • FIG. 9 is a schematic diagram of a fused image in Embodiment 4 of the present disclosure;
  • FIG. 10 is a schematic structural diagram of a background image generation device in Embodiment 5 of the present disclosure;
  • FIG. 11 is a schematic structural diagram of an image fusion device in Embodiment 6 of the present disclosure;
  • FIG. 12 is a schematic diagram of a hardware structure of an electronic device in Embodiment 7 of the present disclosure.
  • The term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • FIG. 1 is a flowchart of a method for generating a background image in Embodiment 1 of the present disclosure.
  • the method is applicable to the situation of extracting the background image based on multiple frames of images, for example, the method obtains the background image by merging the features in the background segmentation images of the multiple frames of images.
  • the method can be executed by a background image generation device, which can be implemented by software and/or hardware, and integrated on the electronic device.
  • the electronic device in this embodiment may be a device with image processing functions such as a computer, a notebook computer, a server, a tablet computer, or a smart phone.
  • the method for generating a background image in Embodiment 1 of the present disclosure includes the following steps:
  • S110 Perform instance segmentation on each frame of the target image in at least two frames of the target image, and obtain a background segmentation map corresponding to each frame of the target image without instances.
  • the target image mainly refers to an image including feature information for generating a background image.
  • a unified background image can be generated by fusing the feature information about the background in all target images.
  • the backgrounds in all target images are for the same or similar scenes, but the viewing angles can be different.
  • the target images may include instances (such as people, vehicles, etc.) and backgrounds, and the positions of the same instance in each target image may be different.
  • the main purpose of instance segmentation is to identify the instance in the target image and separate the instance in the target image from the background, and the remaining part after removing the instance is the background segmentation map.
  • For example, the instance segmentation may be performed using the SOLOv2 algorithm, where SOLOv2 is an improvement based on SOLO (Segmenting Objects by Locations).
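  • As an illustration of this step (not the specific model used in the disclosure), the following sketch obtains a background segmentation map with a generic pre-trained instance segmentation model from torchvision; SOLOv2 or any comparable model could be substituted, and the score threshold is an assumed value.

```python
import numpy as np
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Generic pre-trained instance segmentation model (stand-in for SOLOv2).
model = maskrcnn_resnet50_fpn(pretrained=True).eval()

def background_segmentation(image_rgb: np.ndarray, score_thr: float = 0.5):
    """Return (background_map, instance_mask) for one target image.

    background_map: the image with instance pixels zeroed out (instances removed).
    instance_mask:  boolean mask covering all detected instances.
    """
    tensor = torch.from_numpy(image_rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([tensor])[0]
    instance_mask = np.zeros(image_rgb.shape[:2], dtype=bool)
    for mask, score in zip(pred["masks"], pred["scores"]):
        if score >= score_thr:
            instance_mask |= (mask[0].numpy() > 0.5)
    background_map = image_rgb.copy()
    background_map[instance_mask] = 0  # region to be filled in later steps
    return background_map, instance_mask
```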
  • For each frame of target image, a set image can be used to fill the region where the removed instance is located in the target image (for example, the set image can be all target images except this target image, or part of the target images, or a set number of target images, among all target images except this target image); the region is filled according to the background segmentation maps of the set images, and the filling result of the target image is obtained.
  • the background image can be obtained by synthesizing the filling results of all target images.
  • The region providing the features used to fill the region where the removed instance is located in the target image corresponds to the region where the removed instance is located in the target image.
  • For example, if the region where the removed instance is located in the target image is an area of size A*A in the upper left corner, the features used for filling are taken from the area of size A*A in the upper left corner of the background segmentation map of the set image.
  • The corresponding regions described in the following embodiments refer to the regions in the background segmentation maps of the set images that correspond to the regions where the removed instances are located in the target image.
  • Filling the region where the removed instance is located in the target image according to the background segmentation maps of the set images can be done as follows: the feature information of the background segmentation maps of the set images (mainly the corresponding areas in those background segmentation maps) is averaged, and the averaged result is then used to fill the area where the removed instance is located in the target image.
  • For example, if there are X background segmentation maps of the set images, denoted B_1, B_2, ..., B_X, the features of these background segmentation maps can be averaged to obtain B = Mean(B_1, B_2, ..., B_X), and the features of the corresponding region in the averaged result B can be used to fill the region where the removed instance is located in the target image.
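  • A minimal sketch of this averaging variant, assuming NumPy arrays and a boolean mask per image marking the removed-instance region (all names below are illustrative, not taken from the disclosure):

```python
import numpy as np

def fill_by_average(target_bg, target_hole, set_bgs, set_holes):
    """Fill the removed-instance region of one target image with the average of
    the corresponding regions in the set images' background segmentation maps.

    target_bg:   HxWx3 background segmentation map of the target image
    target_hole: HxW bool mask of the removed instance in the target image
    set_bgs:     list of HxWx3 background segmentation maps B_1 ... B_X
    set_holes:   list of HxW bool masks of the removed instances in the set images
    """
    acc = np.zeros_like(target_bg, dtype=np.float64)
    cnt = np.zeros(target_bg.shape[:2], dtype=np.float64)
    for bg, hole in zip(set_bgs, set_holes):
        valid = ~hole                      # pixels that actually contain background
        acc[valid] += bg[valid]
        cnt[valid] += 1
    filled = target_bg.astype(np.float64).copy()
    usable = target_hole & (cnt > 0)       # only fill where some set image has content
    filled[usable] = acc[usable] / cnt[usable, None]
    return filled.astype(target_bg.dtype)
```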
  • Filling the region where the removed instance is located in the target image according to the background segmentation maps of the set images can also be done as follows: divide the region where the removed instance is located in the target image evenly or randomly into N parts, and then use the feature information of the background segmentation map of each set image (mainly the corresponding regions in those background segmentation maps) to fill one of the parts. For example, if the region where the removed instance is located in the target image is divided into A1 and A2, the features of the corresponding area in background segmentation map 1 of set image 1 can be used to fill A1, and the features of the corresponding area in background segmentation map 2 of set image 2 can be used to fill A2.
  • Since instances are also removed from the background segmentation maps of the set images, in order to ensure effective filling, when assigning the parts of the region where the removed instance is located in the target image to different set images, it is necessary to ensure that the assigned background segmentation map of a set image actually contains features in the corresponding area rather than a completely blank area. For example, ensure that in background segmentation map 1 of set image 1, the area corresponding to A1 contains background content and is not a completely blank area left by a removed instance.
  • Alternatively, the background segmentation maps of the set images can be used one after another to fill the area where the removed instance is located, until the area where the removed instance is located in the target image is completely filled, or all the set images have been used for filling.
  • For example, the features of the corresponding area in background segmentation map 1 of set image 1 are used first. However, since the instance has also been removed from background segmentation map 1 of set image 1, if that removed instance overlaps the corresponding area, some features in the corresponding area are vacant; after the available features are filled into the area where the removed instance is located in the target image, vacancies may remain in that area. In this case, the features of the corresponding area in background segmentation map 2 of set image 2 can be used to continue filling, and so on, until the area where the removed instance is located in the target image is completely filled, or all set images have been used for filling.
  • Using the background segmentation maps of the set images to fill the area where the removed instance is located in the target image can also be done as follows: for the area where the removed instance is located in the target image, average the features of the common area covered by the background segmentation maps of all the set images (mainly the corresponding regions in those background segmentation maps) and fill the averaged result into the area where the removed instance is located in the target image; then use the feature information of the background segmentation maps of all the set images to fill the remaining unfilled areas.
  • For example, suppose the area where the removed instance is located in the target image is an area of size A*A in the upper left corner, and set image 1 to set image N all contain background content in a common area of size A'*A' within that A*A area, where A' is smaller than A. The feature information in the common A'*A' area in the upper left corner of each background segmentation map can be averaged and filled into the corresponding part of the A*A area in the target image; for the remaining part of the A*A area outside the A'*A' area, any of the above filling processes can be used, with the background segmentation maps of all the set images filling it together.
  • the features in the filling results of all target images are fused to generate a background image.
  • the background image can be obtained by averaging the padding results of all target images, so as to fully reuse the background features of all target images.
  • In this embodiment, the process of generating the background image can be divided into two stages. In the first stage, the region where the removed instance is located in each frame of target image is filled using the background segmentation maps of the set images, and the filling result corresponding to each frame of target image is obtained; the filling result can be understood as a rough background image. In the second stage, the background image is generated according to the filling results of all target images, which can be understood as a repair process applied to the rough background images; this stage makes full use of the background characteristics in all target images, so that the resulting background image is more refined.
  • For example, the filling results of all target images can be averaged to obtain the background image; or, in order to make the transition between instances and background smoother, the region where the instance of each target image is located can be dilated, and a second round of filling or averaging operations can then be performed on the dilated regions, so as to fuse all the features of the filling results and obtain a high-quality background image.
  • In the method for generating a background image in this embodiment, the region where the removed instance is located in each frame of target image is filled using the background segmentation maps of the set images, and the background image is generated by synthesizing the filling results of each frame of target image. This fully reuses the features of the background in all target images and makes the transition between instances and background smoother, thereby generating a high-quality background image.
  • FIG. 2 is a flow chart of a method for generating a background image in Embodiment 2 of the present disclosure.
  • On the basis of the above embodiment, this embodiment describes the process of filling the region where the removed instance is located in the target image according to the background segmentation maps of the set images.
  • The region where the removed instance is located in the target image is filled according to the background segmentation maps of the set images to obtain the filling result of the target image, which includes: for each frame of target image, filling the area where the removed instance is located in the target image according to the feature information of the corresponding area in the background segmentation map of each set image, until the filling operation has been completed with the feature information of the corresponding area in the background segmentation map of the last set image, or until the region where the removed instance is located in the target image is completely filled, thereby obtaining the filling result of the target image.
  • the features of all background segmentation maps can be utilized to the maximum extent, and high-quality background maps can be efficiently generated.
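  • One possible reading of this sequential-filling rule is sketched below; the function and variable names are assumptions for illustration, not language from the disclosure.

```python
import numpy as np

def fill_sequentially(target_bg, target_hole, set_bgs, set_holes):
    """Fill the removed-instance region of the target image using the set images'
    background segmentation maps one after another, stopping as soon as the
    region is completely filled or all set images have been used."""
    filled = target_bg.copy()
    remaining = target_hole.copy()          # pixels still waiting for content
    for bg, hole in zip(set_bgs, set_holes):
        usable = remaining & ~hole          # this set image has background content here
        filled[usable] = bg[usable]
        remaining &= ~usable
        if not remaining.any():             # region completely filled: stop early
            break
    return filled, remaining                # `remaining` marks any leftover gaps
```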
  • the method for generating a background image in Embodiment 2 of the present disclosure includes the following steps:
  • S210 Perform instance segmentation on each frame of the target image in at least two frames of the target image, and obtain a background segmentation map corresponding to each frame of the target image without instances.
  • FIG. 3 is a schematic diagram of filling the region where the removed instance is located in the target image in Embodiment 2 of the present disclosure.
  • As shown in FIG. 3, N frames of target images are used, where N is an integer greater than 2. The blank person-shaped area in each target image represents the area where the removed person instance is located, and the position or posture of the person instance may differ between target images.
  • The feature information in the background segmentation map obtained after removing the person instance from target image 1 is represented by a grid texture; that of target image 2 is represented by slashes; that of target image N-1 is represented by a dot texture; and that of target image N is represented by vertical lines.
  • Taking target image 1 as an example, in the background segmentation map of target image 2, the person shape shown by the dotted line is the corresponding area, and the feature information represented by the slashes in this area can be used to fill the area in target image 1 from which the person instance has been removed. However, the person shape shown by the dotted line in the background segmentation map of target image 2 also contains a blank part (because the person instance in target image 2 has also been removed); therefore, using only the feature information of the corresponding area in the background segmentation map of target image 2 cannot completely fill the area in target image 1 from which the person instance has been removed, and the feature information of the corresponding area in the background segmentation map of the next set image can be used to continue filling. Assuming the next set image is target image N-1, the person shape shown by the dotted line in the background segmentation map of target image N-1 is used in the same way, and so on.
  • In the resulting filling result of target image 1, the feature information of the slash part comes from the corresponding area of the background segmentation map of target image 2, the feature information of the dot part comes from the corresponding area of the background segmentation map of target image N-1, and the feature information of the vertical-line part comes from the corresponding area of the background segmentation map of target image N.
  • the padding results of the target images 2 to N can be obtained.
  • the background image can be generated according to the filling results of all target images.
  • If the area in the current target image from which the person instance has been removed is completely filled after using the feature information of the corresponding area of the background segmentation map of the current set image, the filling operation on the current target image can be ended and the filling result of the current target image is obtained, without using the background segmentation maps of subsequent set images; if, after using the feature information of the corresponding area of the background segmentation map of the current set image, the area in the current target image from which the person instance has been removed is still not completely filled, it can be determined whether there is a background segmentation map of a set image that has not yet been used for filling.
  • S240 Determine whether the current set image is the last set image. Based on the judgment result that the current set image is the last set image, execute S250; based on the judgment result that the current set image is not the last set image, continue filling with the background segmentation map of the next set image.
  • If there is still a background segmentation map of a set image that has not been used for filling (that is, the current set image is not the last set image), the background segmentation map of the next set image can be used for filling; otherwise (that is, the current set image is the last set image), the filling operation on the current target image can be ended to obtain the filling result of the current target image.
  • S260 Determine whether the current target image is the last target image. Based on the determination result that the current target image is the last target image, perform S280; based on the determination result that the current target image is not the last target image, perform S270.
  • In the method for generating a background image in this embodiment, the area where the removed instance is located in each target image is filled in sequence according to the feature information of the corresponding area in the background segmentation map of each set image, which can maximize the use of the characteristics of each background segmentation map and efficiently generate a high-quality background image. On this basis, using the background segmentation map of each target image to generate the background image can synthesize the characteristics of the background part of all target images, ensure that the background image is consistent with the backgrounds of all target images, and generate a high-quality background image.
  • FIG. 4 is a flow chart of a method for generating a background image in Embodiment 3 of the present disclosure.
  • On the basis of the above embodiments, this embodiment describes the process of generating the background image according to the filling results of all target images.
  • In this embodiment, the process of generating the background image can be divided into two stages: in the first stage, the region where the removed instance is located in each frame of target image is filled using the background segmentation maps of the set images, and the filling result corresponding to each frame of target image is obtained; in the second stage, the background image is generated according to the filling results of all target images.
  • In this embodiment, generating the background image according to the filling results of all target images includes: performing dilation processing on the region where the instance is located in each target image to obtain a dilated region corresponding to each target image; for each frame of target image, repairing the dilated region corresponding to the target image according to the feature information of the corresponding areas in the filling results of all target images to obtain the repair result of the target image; and generating the background image according to the repair results of all target images.
  • On this basis, the dilated regions are repaired using the filling results of all target images, and the edges of the instances can be smoothed to obtain a background image with higher precision.
  • the method for generating a background image in Embodiment 3 of the present disclosure includes the following steps:
  • S310 Perform instance segmentation on each frame of the target image in at least two frames of the target image, and obtain a background segmentation map corresponding to each frame of the target image without instances.
  • Dilation processing is performed on each instance region in the target image, which can be understood as adding pixels at the edge of the instance so that the overall pixel area of the instance is expanded; in this way, the dilated region includes, as far as possible, the instance edges that are difficult to repair.
  • Adding pixel values can be achieved through convolution templates or convolution kernels.
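  • For example, the dilation could be realized with OpenCV as in the following sketch; the square kernel and its size are assumed choices, since the disclosure only requires that the instance region be expanded.

```python
import cv2
import numpy as np

def dilate_instance_mask(instance_mask: np.ndarray, kernel_size: int = 15) -> np.ndarray:
    """Expand the instance region so that hard-to-repair instance edges fall
    inside the dilated mask used in the second-stage repair."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)  # acts as a convolution template
    dilated = cv2.dilate(instance_mask.astype(np.uint8), kernel, iterations=1)
    return dilated.astype(bool)
```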
  • FIG. 5 is a schematic diagram of an expanded region corresponding to an example in a target image in Embodiment 3 of the present disclosure.
  • As shown in FIG. 5, the person-shaped area outlined by the bold dotted line is the dilated region obtained by dilating the area where the instance is located in the target image. The dilated region should be larger than the area where the original instance is located (the person-shaped areas formed by slashes, dot textures, and vertical lines), and the edges of the original instance should be included in the dilated region.
  • all the padding results obtained in the first stage can be used for inpainting to make the edge of the instance smoother.
  • Using the filling results of all target images to repair the dilated region of a target image may include: averaging the feature information of the regions corresponding to the dilated region in the filling results of all target images (including the target image currently being repaired and the other target images), and filling the averaged result into the dilated region corresponding to the target image, so as to obtain the repair result of the target image.
  • Using the filling results of all target images to repair the dilated region of a target image may also be a filling operation similar to the first stage. For example, the feature information of the areas corresponding to the dilated region in the filling results of the other target images can be used to fill the dilated region again: the features of the corresponding areas in the filling results of the other target images can be averaged before filling, or the dilated region can be divided evenly or randomly into several parts, with each part filled by the features of the corresponding area in the filling result of one of the other target images. On this basis, the repair result of the target image can be obtained.
  • an inpainting result with the highest image quality can be selected from all inpainting results of target images as a background image according to requirements.
  • generating the background image according to the inpainting results of all target images includes: averaging the inpainting results of all target images to obtain the background image.
  • the edge of the instance can be smoothed by making full use of the feature information of other target images.
  • In an embodiment, repairing the dilated region corresponding to each target image to obtain the repair result of the target image includes: in each iteration, for each frame of target image, averaging the feature information of the areas corresponding to the dilated region in the filling results of all target images and filling the averaged result into the dilated region corresponding to the target image, so as to obtain the repair result of the target image in this iteration; entering the next iteration based on the judgment result that the repair result of the target image in this iteration does not meet the set condition; and stopping the iteration based on the judgment result that the repair result of the target image in this iteration meets the set condition, with the repair result of the target image in this iteration being used as the repair result of the target image.
  • the repair operation in the second stage can be iteratively executed multiple times until the set condition is met.
  • For example, the set condition is that the repair result obtained for any target image in this iteration is sufficiently close to the repair result obtained in the previous iteration; if the feature difference between the two iterations' repair results is within the allowable range, the iteration can be stopped. At this point, the repair result corresponding to each target image has fully integrated the feature information in all the filling results, the edge transitions are smooth, the accuracy is higher, and a higher-quality repair result can be obtained.
  • FIG. 6 is a flow chart of obtaining the inpainting result of each target image according to the padding results of all target images in Embodiment 3 of the present disclosure. As shown in Figure 6, the inpainting results of each target image are obtained according to the filling results of all target images, including:
  • the setting conditions include: the characteristic difference between the inpainting result of the target image in this iteration process and the corresponding inpainting result in the previous iteration process is within an allowable range.
  • the setting condition may also be that the number of iterations reaches a specified number of times, or the duration of iterations reaches a specified duration, and the like.
  • S440 Determine whether the current target image is the last target image. Based on the determination result that the current target image is the last target image, perform S460; based on the determination result that the current target image is not the last target image, perform S450.
  • If the repair result of the current target image in this iteration meets the set condition, the iteration can be stopped, and the repair results can be used as the final repair results. If the error is still large, the iteration does not stop: if there is still a filling result of a target image that has not been repaired in this iteration, the next target image can be selected as the current target image and its filling result repaired; if the current target image is the last target image, that is, the filling result of every target image has been repaired in this iteration, then this iteration is completed and the next iteration begins.
  • It should be noted that in the process of iteratively repairing the filling results of all target images, the number of repair passes may differ between target images. For example, suppose the filling results of 10 frames of target images are repaired sequentially in each iteration. When the third frame of target image is repaired in the second iteration, if the error between its repair result obtained in the second iteration and the repair result obtained in the first iteration is already small, the iteration can be stopped. In this case, the repair results of the target images in frames 1-3 have actually been repaired over two iterations, while the filling results of the target images in frames 4-10 have actually been repaired in only one iteration.
  • the filling result obtained in the first stage is actually a rough background image.
  • the repair operation in the second stage can improve the accuracy of filling, and the incorrect pixel values in the dilated area will be gradually repaired by the correct pixel values.
  • the correct pixel values of the background part outside the instance will not change with the iterations, ensuring that the generated background image fully integrates the feature information of all target images, and the edge processing effect is better, and the transition between the instance and the background is more natural.
  • In an embodiment, B_{i,k} represents the filling result obtained by filling the region of the removed instance in the i-th frame of target image with the background segmentation map of the k-th frame of target image, and F_{i,k} represents the filling operation performed with the background segmentation map of the k-th frame of target image. The masks of the dilated regions corresponding to the instances in each frame of target image are denoted as M_1, M_2, ..., M_N, and the filling results of each frame of target image are respectively denoted as B_1, B_2, ..., B_N. Following the repair operation described above, the repair result of the i-th frame of target image can be obtained according to a formula of the form: B_i = Mean(B_1, B_2, ..., B_N) ⊙ M_i + B_i ⊙ (1 - M_i), where Mean(·) represents the matrix average function and ⊙ denotes element-wise multiplication with the mask.
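  • A rough sketch of this second-stage iterative repair, assuming the filling results and dilated-region masks are NumPy arrays; the convergence tolerance and iteration cap are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def repair_iteratively(fill_results, dilated_masks, tol=1.0, max_iter=10):
    """Second-stage repair: repeatedly overwrite each dilated region with the
    average of all current filling results until the results stop changing.

    fill_results:  list of N HxWx3 arrays B_1 ... B_N (first-stage output)
    dilated_masks: list of N HxW bool masks M_1 ... M_N
    """
    results = [b.astype(np.float64) for b in fill_results]
    for _ in range(max_iter):
        converged = True
        for i, mask in enumerate(dilated_masks):
            mean_all = np.mean(results, axis=0)      # Mean(B_1, ..., B_N)
            updated = results[i].copy()
            updated[mask] = mean_all[mask]           # repair only the dilated region
            if np.abs(updated - results[i]).max() > tol:
                converged = False                    # still changing: keep iterating
            results[i] = updated
        if converged:
            break
    background = np.mean(results, axis=0)            # e.g., average all repair results
    return background, results
```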
  • In an embodiment, after obtaining the background segmentation map, with the instance removed, corresponding to each frame of target image, the method also includes: selecting one frame of target image as a reference frame, and determining, according to a feature point matching algorithm, the affine transformation matrices between all the target images except the reference frame and the reference frame; and aligning, according to the affine transformation matrices, the background segmentation maps of all target images except the reference frame with the background segmentation map of the reference frame.
  • In practice, the backgrounds of the target images are not completely aligned due to different shooting angles, jitter, errors, and the like; if the background image is generated directly from the background segmentation maps of all target images, local distortion, deformation, or blurring may occur, which affects the accuracy and visual effect of the background image.
  • one frame of target image can be selected as a reference frame, and the background segmentation maps of all other target images are aligned with the reference frame.
  • the reference frame may be the target image with the highest image quality, the first target image, the last frame of the target image, or the target image in the middle.
  • the affine transformation matrix between all target images and the reference frame is determined.
  • The affine transformation matrix is used to describe the transformation relationship of the matched feature points from the target image to the reference frame; an affine transformation includes linear transformations and translations.
  • the feature point matching algorithm may be a scale-invariant feature transform (Scale-invariant Feature Transform, SIFT) algorithm.
  • the key feature points of the background part of each target image are first extracted, and these key feature points will not disappear due to factors such as illumination, scale, rotation, etc., and then, according to the feature vector of each key point, the target image and The key points in the reference frame are compared in pairs, and several pairs of feature points that match each other between the target image and the reference frame are found, so as to establish the corresponding relationship between the feature points and obtain the affine transformation matrix.
  • In some cases (for example, when sufficient matching feature points cannot be found), that frame of target image may also be discarded.
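  • A hedged sketch of this registration step using OpenCV (SIFT matching plus a RANSAC-estimated affine transform); the ratio test, the minimum match count, and returning None for unalignable frames are assumptions, not requirements of the disclosure.

```python
import cv2
import numpy as np

def align_to_reference(target_bg: np.ndarray, reference_bg: np.ndarray):
    """Estimate the affine transform from a target background map to the reference
    frame via SIFT feature matching, then warp the target map accordingly.
    Returns None if too few matches are found (the frame could then be dropped)."""
    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(cv2.cvtColor(target_bg, cv2.COLOR_BGR2GRAY), None)
    kp_r, des_r = sift.detectAndCompute(cv2.cvtColor(reference_bg, cv2.COLOR_BGR2GRAY), None)
    matches = cv2.BFMatcher().knnMatch(des_t, des_r, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test
    if len(good) < 4:                      # assumed minimum; not specified in the disclosure
        return None
    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_r[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    affine, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    h, w = reference_bg.shape[:2]
    return cv2.warpAffine(target_bg, affine, (w, h))
```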
  • FIG. 7 is a schematic diagram of a background image generated according to a target image in Embodiment 3 of the present disclosure. As shown in Figure 7, after registering (aligning) multiple frames of target images, remove the instances, use the feature information of the background part, and go through a two-stage algorithm (namely, filling and repairing operations), to obtain a high-quality background image. It can fully preserve the characteristics of the background part in each original target image, and the smoothing effect of the edge of the processing instance is better.
  • The method for generating a background image in this embodiment improves the accuracy and image quality of the generated background image by selecting one frame of target image as a reference frame and aligning the background segmentation maps of all other target images with the reference frame. In the first stage, rough background images of all target images are obtained; in the second stage, the dilated regions of the instances are iteratively repaired to fuse all the features of the filling results, so that the generated background image fully reuses the feature information of the background segmentation maps of all target images, the processing of the instance edges is smoother, the transition between instances and background is more natural, and the quality of the background image is improved.
  • FIG. 8 is a flowchart of an image fusion method in Embodiment 4 of the present disclosure.
  • This method can be applied to the situation of fusing multiple frames of images into one image, for example, generating a unified background image based on multiple frames of images, and merging instances in each frame of images into the generated background image.
  • the application scenario of the method may be to extract multiple frames of images from a video, and generate a fused image based on the extracted multiple frames of images as the cover of the video; it may also be to generate a fused image based on a group of images, As the logo or folder icon of the group of images, or a thumbnail that can reflect the main content of the group of images, etc. can be obtained.
  • the method can be executed by an image fusion device, which can be implemented by software and/or hardware, and integrated on electronic equipment.
  • the electronic device in this embodiment may be a device with image processing functions such as a computer, a notebook computer, a server, a tablet computer, or a smart phone. It should be noted that for technical details not exhaustively described in this embodiment, reference may be made to any of the foregoing embodiments.
  • As shown in FIG. 8, the image fusion method in Embodiment 4 of the present disclosure includes the following steps:
  • the target image mainly refers to an image containing background features, and a unified background image can be extracted by fusing the background features in all target images.
  • the background in all target images is for the same scene, but the viewing angles can vary.
  • the target image can be read from an electronic device, or downloaded from a database, can be a multi-frame image taken continuously, or a multi-frame image extracted from a video, etc.
  • acquiring at least two frames of target images includes: identifying action sequence frames in the video based on an action recognition algorithm, and using the action sequence frames as target images.
  • an effective action sequence frame can be identified from a video, and instances in each action sequence frame (taking a character as an example) can express a complete action or behavior in a coherent chronological order.
  • These action sequence frames can be used as target images.
  • a human body pose recognition (Open-pose) algorithm is used to estimate the pose of a character instance in a video. Exemplarily, first extract the position coordinates of the human body joint points in each frame image of the video, and calculate the distance variation matrix of the human body joint points between two adjacent frames accordingly; then segment the video, and use the corresponding The distance variation matrix generates video features; finally, the trained classifier is used to classify the video features.
  • If the video features corresponding to a video segment belong to a feature sequence of an action or behavior in the preset behavior library, then each frame corresponding to that video segment is an action sequence frame.
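  • One plausible reading of the distance variation matrix between two adjacent frames is sketched below, assuming (J, 2) arrays of joint coordinates such as those produced by Open-pose; the pairwise-distance interpretation is an assumption rather than a definition given in the disclosure.

```python
import numpy as np

def distance_variation_matrix(joints_prev: np.ndarray, joints_curr: np.ndarray) -> np.ndarray:
    """Change of pairwise joint distances between two adjacent frames.

    joints_*: (J, 2) arrays of joint coordinates for J human-body joints.
    Returns a (J, J) matrix whose (a, b) entry is the change of the distance
    between joints a and b from the previous frame to the current frame.
    """
    def pairwise(joints):
        diff = joints[:, None, :] - joints[None, :, :]
        return np.linalg.norm(diff, axis=-1)
    return pairwise(joints_curr) - pairwise(joints_prev)
```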
  • In another example, an instance segmentation algorithm is used to extract the outline of the person in each key frame to express the pose, key features of the pose are extracted through a clustering algorithm, and, on the basis of these key features, the Dynamic Time Warping (DTW) algorithm is used to complete the action recognition, and so on.
  • the action recognition algorithm can be implemented through the Temporal Shift Module (TSM) or Temporal Segment Networks (TSN) model, which is trained on the Kinetics-400 dataset and can be used to recognize 400 kinds of actions , which can meet the needs of identifying and displaying the action of the instance in the cover.
  • the degree of background difference between each action sequence frame may be judged, and if the background difference degree is within the allowable range, image fusion is performed on each action sequence frame.
  • obtaining at least two frames of target images includes: determining the similarity between key frames in the video based on a pre-trained network; dividing the key frames into multiple groups according to the similarity; One of the grouped keyframes serves as the target image.
  • The key frames mainly refer to frames that can reflect the key content or scene changes of the video; for example, frames containing the main characters of the video, frames belonging to highlight or classic clips, frames with obvious scene changes, and frames containing key actions of the characters can all be used as key frames.
  • For example, the pre-trained network may be a Visual Geometry Group (VGG) network, and the VGG19 network is one of the structures of the VGG network.
  • the angle between the two vectors can represent their similarity.
  • Suppose the feature vector of frame i is F_i and the feature vector of frame j is F_j; the similarity can then be expressed as sim(i, j) = <F_i, F_j> / (||F_i|| · ||F_j||), where <·,·> represents the inner product operation and ||·|| represents the norm of the vector.
  • the images in the video can be divided into several groups according to the similarity, and the group with the largest number of frames is selected as the target image to be fused.
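  • The similarity computation and a simple grouping strategy could look like the following sketch; the greedy grouping and the similarity threshold are illustrative assumptions, with the VGG feature vectors assumed to be precomputed.

```python
import numpy as np

def cosine_similarity(f_i: np.ndarray, f_j: np.ndarray) -> float:
    """<F_i, F_j> / (||F_i|| * ||F_j||): cosine of the angle between the vectors."""
    return float(np.dot(f_i, f_j) / (np.linalg.norm(f_i) * np.linalg.norm(f_j) + 1e-12))

def group_key_frames(features, threshold: float = 0.85):
    """Greedily assign each key frame to the first group whose representative frame
    is similar enough; the threshold value is an illustrative assumption."""
    groups = []                            # each group is a list of frame indices
    for idx, feat in enumerate(features):
        for group in groups:
            if cosine_similarity(features[group[0]], feat) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    # The group with the most key frames supplies the target images to be fused.
    return max(groups, key=len) if groups else []
```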
  • the region where the removed instance is located in each frame of the target image may be filled.
  • the texture feature of the background part after removing the instance can be used to fill the area where the removed instance is located, so as to complete the background restoration of the frame of target image and obtain the filling result of the target image;
  • Alternatively, for each frame of target image, a set image can be used (it can be all target images except this target image, or part of the target images, or a set number of target images among all target images except this target image), and the feature information of the background segmentation maps of the set images is used to fill the region of the removed instance in the target image; that is, the features in the background segmentation maps of the set images are migrated and fused into the region where the instance is removed from the target image, and the filling result of the target image is obtained.
  • the features in the filling results of all target images can be fused to generate a background image.
  • the background image is generated by synthesizing the filling results of each frame of the target image, fully reusing the features of all target images, and generating a high-quality background image.
  • the instances in each target image are separated from the background, and multiple backgrounds can be used to generate a unified background image; all instances can be fused in the background image to obtain a fused image.
  • instances can be cropped out from each target image and added to a background map generated from the padding results of all target images.
  • a single static image can be used to display instances and backgrounds in multiple frames of target images, which effectively reduces computing resources and storage space occupation.
  • operations such as cropping, scaling, rotating, and splicing can also be performed on the instance to be added.
  • For example, the instances in the target images can be arranged in the background image sequentially according to their time order (for example, from left to right, or from right to left), and the arrangement position of each instance in the background image can be kept consistent with its relative position in the original target image, so that the fused image is visually closer to how the instance appears in the original target image; alternatively, the instances in the target images can also be arranged freely in the background image.
  • the image fusion method can be used to extract the background image of any video based on temporal redundant information, and fuse instances in multiple target images into the background image.
  • the process can include:
  • Frame extraction: extract multiple frames of images from the video according to a set interval (such as every 20 frames), and select key frames according to an image quality algorithm;
  • Scene clustering: perform clustering according to the inter-frame similarity of the key frames, and use the key frames in the class (i.e., group) containing the largest number of key frames as the target images;
  • Instance segmentation: separate the instances in each target image from the background;
  • Image registration: align the background segmentation maps of all target images according to the affine transformation matrices;
  • Two-stage algorithm: fill and repair the regions where instances are removed in all target images to obtain the background image;
  • Instance fusion: add the instances in all target images to the background image to obtain a fused image.
  • the degree of fusion between the instances in each of the target images and the background image decreases sequentially according to the time sequence of each of the target images.
  • FIG. 9 is a schematic diagram of a fused image in Embodiment 4 of the present disclosure.
  • the five person instances in the fused image may come from five target images, and the five target images may come from a video, which expresses a skateboard jumping action.
  • the instances in each target image can be arranged to the appropriate position in the background image.
  • If the five target images in the video were used directly to express the actions of the person instance, they would need to be made into a dynamic image, which requires a large amount of computation and takes up a lot of space. With the image fusion method in this embodiment, the fused image can effectively fuse the feature information of multiple target images and display rich image content with limited resources.
  • In FIG. 9, the five person instances in the fused image complete a skateboard jump from right to left, from take-off through being airborne to landing. The further to the left a person instance is, the later its timing, with the leftmost person instance corresponding to the last target image; and the further to the left the person instance, the lower its degree of fusion with the background image, which can also be understood as lower transparency. In this way, the fused image reflects the time sequence of the instances and has the effect of visual persistence, making the displayed action or behavior more specific, clearer, and more vivid.
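  • One way to realize this time-ordered fusion degree is simple alpha compositing, as in the sketch below; the alpha schedule is an illustrative assumption rather than a value taken from the disclosure.

```python
import numpy as np

def fuse_instances(background: np.ndarray, instance_images, instance_masks, alphas=None):
    """Paste each instance into the background image in time order.

    instance_images: list of HxWx3 target images (earliest first)
    instance_masks:  list of HxW bool masks of each instance, in background coordinates
    alphas:          per-instance opacity; later instances get higher opacity, so earlier
                     instances appear more strongly fused into the background
    """
    n = len(instance_images)
    if alphas is None:
        alphas = np.linspace(0.4, 1.0, n)          # assumed schedule
    fused = background.astype(np.float64).copy()
    for img, mask, alpha in zip(instance_images, instance_masks, alphas):
        fused[mask] = alpha * img[mask] + (1 - alpha) * fused[mask]
    return fused.astype(background.dtype)
```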
  • the method for generating the background image according to the padding results of all regions where the removed instances are located in the target image is determined according to any of the above-mentioned embodiments.
  • The image fusion method in this embodiment can display the rich features of multiple frames of target images by using the fused image; the background image is generated by synthesizing the filling results of each frame of target image, which fully reuses the features of all target images, generates a high-quality background image, and can also improve the quality of the fused image.
  • FIG. 10 is a schematic structural diagram of a background image generation device in Embodiment 5 of the present disclosure. Please refer to the foregoing embodiments for details that are not exhaustive in this embodiment.
  • the device includes:
  • the segmentation module 610 is configured to perform instance segmentation on each frame of the target image in at least two frames of the target image, and obtain a background segmentation map corresponding to each frame of the target image without instances;
  • the filling module 620 is configured to, for each frame of the target image, fill in the area where the removed instance in the target image is located according to the background segmentation map of the set image, and obtain a filling result of the target image, wherein the set The image includes a target image different from the target image in the at least two frames of target images;
  • the generation module 630 is configured to generate a background image according to the filling results of all the target images.
  • The background image generation device of this embodiment fills the region where the removed instance is located in each frame of target image according to the background segmentation maps and synthesizes the filling results of each frame of target image to generate a background image, which fully reuses the characteristics of the background in all target images, makes the transition between instances and background smoother, and produces a high-quality background image.
  • the filling module 620 is set to:
  • the generating module 630 includes:
  • the expansion unit is configured to perform expansion processing on the region where the instance in each of the target images is located, to obtain an expanded region corresponding to each of the target images;
  • the repairing unit is configured to, for each frame of the target image, repair the dilated region corresponding to the target image according to the feature information of the corresponding region in the filling results of all the target images, and obtain the repair result of the target image;
  • a generation unit configured to generate the background image according to the restoration results of all the target images.
  • the repair unit is set as:
  • the generation unit is configured to average the restoration results of all the target images to obtain the background image.
  • the repair unit is set as:
  • the feature information of the area corresponding to the expansion area in the filling results of all the target images is averaged, and the averaged result is filled to the corresponding area of the target image Inflate the region to obtain the repair result of the target image in this iteration process;
  • the iteration is stopped, and the repair result of the target image in this iteration process is taken as the repair result of the target image.
  • the set conditions include:
  • the characteristic difference between the restoration result of the target image in this iteration process and the corresponding restoration result in the previous iteration process is within the allowable range.
  • the device also includes:
  • the calculation module is configured to select a frame of target image as a reference frame after obtaining the background segmentation map corresponding to each frame of target image, and determine all target images except the reference frame according to the feature point matching algorithm. an affine transformation matrix between the reference frames;
  • the alignment module is configured to align the background segmentation maps of all target images except the reference frame with the background segmentation maps of the reference frame according to the affine transformation matrix.
  • the above-mentioned background image generation device can execute the background image generation method provided by any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • FIG. 11 is a schematic structural diagram of an image fusion device in Embodiment 6 of the present disclosure. Please refer to the foregoing embodiments for details that are not exhaustive in this embodiment. As shown in Figure 11, the device includes:
  • An acquisition module 710 configured to acquire at least two frames of target images
  • the background image generation module 720 is configured to generate a background image according to the filling results of the regions where the removed instances are located in all the target images;
  • the fusion module 730 is configured to fuse all instances in the target image into the background image to obtain a fusion image.
  • The image fusion device of this embodiment can display the rich features of multiple frames of target images by using the fused image; the background image is generated by synthesizing the filling results of each frame of target image, which fully reuses the features of each target image, generates a high-quality background image, and also improves the quality of the fused image.
  • acquiring at least two frames of target images includes: identifying action sequence frames in the video based on an action recognition algorithm, and using the action sequence frames as the target images.
  • obtaining at least two frames of target images includes:
  • a key frame in a group with the largest number of key frames is used as the target image.
  • the degree of fusion between the instances in each of the target images and the background image decreases sequentially according to the time sequence of each of the target images.
  • the method for generating the background image according to the filling results of the regions where the removed instances are located in all the target images is determined according to any of the above embodiments.
  • the above image fusion device can execute the image fusion method provided by any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • FIG. 12 is a schematic diagram of a hardware structure of an electronic device in Embodiment 7 of the present disclosure.
  • FIG. 12 shows a schematic structural diagram of an electronic device 800 suitable for implementing the embodiments of the present disclosure.
  • the electronic device 800 in the embodiment of the present disclosure includes a computer, a notebook computer, a server, a tablet computer, or a smart phone, etc., which have an image processing function.
  • the electronic device 800 shown in FIG. 12 is merely an example.
  • As shown in FIG. 12, the electronic device 800 may include at least one processing device (such as a central processing unit or a graphics processing unit) 801, which may execute various appropriate actions and processes according to a program stored in a read-only memory (Read Only Memory, ROM) 802 or a program loaded from a storage device 808 into a random access memory (Random Access Memory, RAM) 803.
  • At least one processing device 801 implements the background image generation and image fusion methods provided in the present disclosure.
  • In the RAM 803, various programs and data necessary for the operation of the electronic device 800 are also stored.
  • the processing device 801, the ROM 802, and the RAM 803 are connected to each other through a bus 805.
  • An input/output (Input/Output, I/O) interface 804 is also connected to the bus 805 .
  • an input device 806 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; including, for example, a liquid crystal display (Liquid Crystal Display, LCD) , an output device 807 such as a speaker, a vibrator, etc.; a storage device 808 including, for example, a magnetic tape, a hard disk, etc., which is configured to store at least one program; and a communication device 809.
  • the communication means 809 may allow the electronic device 800 to communicate with other devices wirelessly or by wire to exchange data. While FIG. 12 shows electronic device 800 having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
  • the computer program may be downloaded and installed from a network via communication means 809, or from storage means 808, or from ROM 802.
  • when the computer program is executed by the processing device 801, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or a combination of the above two.
  • a computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above.
  • examples of computer-readable storage media may include: an electrical connection having at least one lead, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be a tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be a computer-readable medium other than a computer-readable storage medium, and the computer-readable signal medium may transmit, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by an appropriate medium, including: electric wire, optical cable, radio frequency (Radio Frequency, RF), etc., or a suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as the Hyper Text Transfer Protocol (HTTP), and can interconnect with digital data communication in any form or medium (for example, a communication network).
  • examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (for example, the Internet), a peer-to-peer network (for example, an Ad hoc peer-to-peer network), and any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries at least one program, and when the above-mentioned at least one program is executed by the electronic device, the electronic device is caused to: perform instance segmentation on each frame of target image in at least two frames of target images to obtain a background segmentation map, with instances removed, corresponding to each frame of target image; for each frame of target image, fill the region where the removed instance is located in the target image according to the background segmentation map of a set image to obtain a filling result of the target image, wherein the set image includes a target image, among the at least two frames of target images, that is different from the target image; and generate a background image according to the filling results of all the target images.
  • alternatively, the electronic device is caused to: acquire at least two frames of target images; generate a background image according to filling results of the regions where the removed instances are located in all the target images; and fuse the instances in all the target images into the background image to obtain a fused image.
  • computer program code for carrying out the operations of the present disclosure may be written in at least one programming language or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code that contains at least one executable instruction for implementing the specified logical function.
  • it should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified functions or operations , or may be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. It should be noted that the name of a unit does not constitute a limitation of the unit itself in some cases.
  • the functions described herein above may be performed, at least in part, by at least one hardware logic component; for example, exemplary types of hardware logic components that may be used include Field-Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Parts (ASSPs), Systems on Chip (SOCs), and Complex Programmable Logic Devices (CPLDs).
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may comprise an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Examples of machine-readable storage media may include an electrical connection based on at least one wire, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • Example 1 provides a method for generating a background image, including:
  • performing instance segmentation on each frame of target image in at least two frames of target images to obtain a background segmentation map, with instances removed, corresponding to each frame of target image;
  • for each frame of target image, filling the region where the removed instance is located in the target image according to the background segmentation map of a set image to obtain a filling result of the target image, wherein the set image includes a target image, among the at least two frames of target images, that is different from the target image;
  • a background image is generated according to the filling results of all the target images.
  • Example 2 According to the method described in Example 1, for each frame of target image, filling the region where the removed instance is located in the target image according to the background segmentation map of the set image to obtain the filling result of the target image includes:
  • for each frame of target image, filling the region where the removed instance is located in the target image sequentially according to feature information of the corresponding region in the background segmentation map of each set image, until the filling operation according to the feature information of the corresponding region in the background segmentation map of the last set image is completed, or until the region where the removed instance is located in the target image is completely filled, to obtain the filling result of the target image.
  • Example 3 According to the method described in Example 1, the background image is generated according to the filling results of all the target images, including:
  • performing dilation processing on the region where the instance is located in each of the target images to obtain a dilated region corresponding to each of the target images;
  • for each frame of target image, repairing the dilated region corresponding to the target image according to feature information of the corresponding regions in the filling results of all the target images to obtain a repair result of the target image;
  • generating the background image according to the repair results of all the target images.
  • Example 4 According to the method described in Example 3, repairing the dilated region corresponding to the target image according to the feature information of the corresponding regions in the filling results of all the target images to obtain the repair result of the target image includes:
  • averaging the feature information of the regions corresponding to the dilated region in the filling results of all the target images, and filling the averaged result into the dilated region corresponding to the target image to obtain the repair result of the target image.
  • Example 5 According to the method described in Example 3, the background image is generated according to the repair results of all the target images, including:
  • the restoration results of all the target images are averaged to obtain the background image.
  • Example 6 According to the method described in Example 3, for each frame of target image, repairing the dilated region corresponding to the target image according to the feature information of the corresponding regions in the filling results of all the target images to obtain the repair result of the target image includes:
  • in each iteration, for each frame of target image, averaging the feature information of the regions corresponding to the dilated region in the filling results of all the target images, and filling the averaged result into the dilated region corresponding to the target image to obtain a repair result of the target image in this iteration;
  • entering the next iteration based on a judgment result that the repair result of the target image in this iteration does not satisfy a set condition;
  • stopping the iteration based on a judgment result that the repair result of the target image in this iteration satisfies the set condition, and using the repair result of the target image in this iteration as the repair result of the target image.
  • Example 7 According to the method described in Example 6, the setting conditions include:
  • the characteristic difference between the restoration result of the target image in this iteration process and the corresponding restoration result in the previous iteration process is within the allowable range.
  • Example 8 According to the method described in Example 1, after obtaining the background segmentation map of the removal instance corresponding to each frame of the target image, it also includes:
  • selecting one frame of target image as a reference frame, and determining affine transformation matrices between the reference frame and all the target images other than the reference frame according to a feature point matching algorithm;
  • aligning, according to the affine transformation matrices, the background segmentation maps of all the target images other than the reference frame with the background segmentation map of the reference frame.
  • Example 9 provides an image fusion method, including:
  • acquiring at least two frames of target images;
  • generating a background image according to filling results of the regions where the removed instances are located in all the target images;
  • fusing the instances in all the target images into the background image to obtain a fused image.
  • Example 10 According to the method described in Example 9, the acquisition of at least two frames of target images includes:
  • An action sequence frame in the video is identified based on an action recognition algorithm, and the action sequence frame is used as the target image.
  • Example 11 According to the method described in Example 9, the acquisition of at least two frames of target images includes:
  • determining a similarity between key frames in a video based on a pre-trained network;
  • dividing the key frames into multiple groups according to the similarity;
  • using a key frame in the group with the largest number of key frames as the target image.
  • Example 12 According to the method described in Example 9, the degree of fusion between the instances in each of the target images and the background image decreases sequentially according to the time sequence of each of the target images.
  • Example 13 According to the method described in Example 9, the method of generating the background image according to the filling results of the regions where all the removed instances in the target image are located is determined according to any one of Examples 1-8.
  • Example 14 provides a device for generating a background image, including:
  • the segmentation module is configured to perform instance segmentation on each frame of the target image in at least two frames of the target image, and obtain a background segmentation map corresponding to each frame of the target image without instances;
  • the filling module is configured to, for each frame of the target image, fill in the area where the removed instance in the target image is located according to the background segmentation map of the set image, and obtain a filling result of the target image, wherein the set image including a target image different from the target image in the at least two frames of target images;
  • the generation module is configured to generate a background image according to the filling results of all the target images.
  • Example 15 provides an image fusion device, including:
  • an acquisition module configured to acquire at least two frames of target images
  • the background image generation module is configured to generate a background image according to the filling results of the regions where the removed instances are located in all the target images;
  • the fusion module is configured to fuse all instances in the target image into the background image to obtain a fusion image.
  • Example 16 provides an electronic device, comprising:
  • a processor;
  • a storage device configured to store a program;
  • wherein, when the program is executed by the processor, the processor implements the method for generating a background image as described in any one of Examples 1-8, or the image fusion method as described in any one of Examples 9-13.
  • Example 17 provides a computer-readable medium on which a computer program is stored, and when the computer program is executed by a processor, the method for generating a background image as described in any one of Examples 1-8, or the image fusion method as described in any one of Examples 9-13, is implemented.

Abstract

The present disclosure discloses a background image generation method, an image fusion method, an apparatus, an electronic device, and a readable medium. The background image generation method includes: performing instance segmentation on each frame of target image in at least two frames of target images to obtain a background segmentation map, with instances removed, corresponding to each frame of target image; for each frame of target image, filling the region where the removed instance is located in the target image according to the background segmentation map of a set image to obtain a filling result of the target image, wherein the set image includes a target image, among the at least two frames of target images, that is different from the target image; and generating a background image according to the filling results of all the target images.

Description

背景图生成、图像融合方法、装置、电子设备及可读介质
本公开要求在2021年10月9日提交中国专利局、申请号为202111175973.5的中国专利申请的优先权,该申请的全部内容通过引用结合在本公开中。
技术领域
本公开实施例涉及图像处理技术领域,例如涉及一种背景图生成、图像融合方法、装置、电子设备及可读介质。
背景技术
图像融合是指对采集到的相关图像进行图像处理,以最大限度的提取图像中的有利信息从而得到一张综合性的图像。例如,在为一个视频生成封面时,可以对视频中的多帧图像进行融合,以得到高质量的、能够反映视频中关键内容的融合图像作为封面。在将多帧图像融合为一帧图像的过程中,通常需要生成一个统一的背景图。
相关技术中,生成背景图的方法主要是通过大量的数据来平滑前景目标(噪声点),即,对所有图像取平均得到背景图。这种方法的局限性较强,需要各帧图像都对应于相同的视角,并且需要有足够多的图像才能够保证平滑的效果。然而,在实际应用中,多帧图像不一定在相同视角,视频中的场景复杂多变,采用该方法生成的背景图平滑效果差,尤其是在实例与背景交接的区域,容易失真或变形,无法保证生成的背景图的质量。
发明内容
本公开提供了一种背景图生成、图像融合方法、装置、电子设备及可读介质,以生成高质量的背景图。
第一方面,本公开实施例提供一种背景图生成方法,包括:
对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图;
对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,其中,所述设定图像包括所述至少两帧目标图像中与所述目标图像不同的目标图像;
根据所有所述目标图像的填补结果生成背景图。
第二方面,本公开实施例还提供了一种图像融合方法,包括:
获取至少两帧目标图像;
根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背 景图;
将所有所述目标图像中的实例融合在所述背景图中,得到融合图像。
第三方面,本公开实施例还提供了一种背景图生成装置,包括:
分割模块,被设置为对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图;
填补模块,被设置为对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,其中,所述设定图像包括所述至少两帧目标图像中与所述目标图像不同的目标图像;
生成模块,被设置为根据所有所述目标图像的填补结果生成背景图。
第四方面,本公开实施例还提供了一种融合装置,包括:
获取模块,被设置为获取至少两帧目标图像;
背景图生成模块,被设置为根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图;
融合模块,被设置为将所有所述目标图像中的实例融合在所述背景图中,得到融合图像。
第五方面,本公开实施例还提供了一种电子设备,包括:
处理器;
存储装置,被设置为存储程序;
在所述程序被所述处理器执行时,所述处理器实现如第一方面所述的背景图生成方法或如第二方面所述的图像融合方法。
第六方面,本公开实施例还提供了一种计算机可读介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如第一方面所述的背景图生成方法或如第二方面所述的图像融合方法。
附图说明
图1是本公开实施例一中的背景图生成方法的流程图;
图2是本公开实施例二中的背景图生成方法的流程图;
图3是本公开实施例二中的对目标图像中被去除的实例所在的区域进行填补的示意图;
图4是本公开实施例三中的背景图生成方法的流程图;
图5是本公开实施例三中的目标图像中的实例对应的膨胀区域的示意图;
图6是本公开实施例三中的根据所有目标图像的填补结果得到每个目标图 像的修复结果的流程图;
图7是本公开实施例三中的根据目标图像生成的背景图的示意图;
图8是本公开实施例四中的图像融合方法的流程图;
图9是本公开实施例四中的融合图像的示意图;
图10是本公开实施例五中的背景图生成装置的结构示意图;
图11是本公开实施例六中的背景图生成装置的结构示意图;
图12是本公开实施例七中的电子设备的硬件结构示意图。
具体实施方式
下面将参照附图描述本公开的实施例。虽然附图中显示了本公开的某些实施例,然而应当理解的是,本公开可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例,相反提供这些实施例是为了更加透彻和完整地理解本公开。应当理解的是,本公开的附图及实施例仅用于示例性作用。
应当理解,本公开的方法实施方式中记载的各个步骤可以按照不同的顺序执行,和/或并行执行。此外,方法实施方式可以包括附加的步骤和/或省略执行示出的步骤。
本文使用的术语“包括”及其变形是开放性包括,即“包括但不限于”。术语“基于”是“至少部分地基于”。术语“一个实施例”表示“至少一个实施例”;术语“另一实施例”表示“至少一个另外的实施例”;术语“一些实施例”表示“至少一些实施例”。其他术语的相关定义将在下文描述中给出。
需要注意,本公开中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。
本公开实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
下述实施例中,每个实施例中同时提供了可选特征和示例,不应将每个编号的实施例仅视为一个技术方案。
实施例一
图1是本公开实施例一中的背景图生成方法的流程图。该方法可适用于根据多帧图像提取背景图的情况,例如,该方法通过将多帧图像的背景分割图中的特征互相融合,得到背景图。该方法可以由背景图生成装置来执行,该装置可由软件和/或硬件实现,并集成在电子设备上。本实施例中的电子设备可以是计算机、笔记本电脑、服务器、平板电脑或智能手机等具有图像处理功能的设备。
如图1所示,本公开实施例一中的背景图生成方法,包括如下步骤:
S110、对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图。
本实施例中,目标图像主要指包含用于生成背景图的特征信息的图像。将所有目标图像中关于背景的特征信息融合在一起,可以生成一张统一的背景图。所有目标图像中的背景是针对相同或相近场景的,但视角可以有差别。此外,目标图像中可包括实例(如人物、车辆等)和背景,每个目标图像中同一实例的位置可以不同。
实例分割的主要目的是,识别目标图像中的实例,并将目标图像中的实例与背景分离,去除实例后剩余的部分即为背景分割图。可选的,采用基于位置和尺寸单独分离实例(Seperate Object instances by Location and sizes,SOLO)算法对目标图像进行实例分割,例如可以通过SOLOv2(SOLOv2是基于SOLO的改进)算法按照位置和尺寸分割实例,具有较高的精度,并且兼具实时性,能够提高生成背景图的效率。
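As a minimal illustration of this step, the sketch below assumes per-frame instance masks are already available from whatever instance segmentation model is used (the embodiment mentions SOLOv2); the background segmentation map is then simply the frame with the instance pixels blanked out. The function name and the convention of marking removed pixels with zeros are assumptions for illustration only.

```python
import numpy as np

def background_segmentation_map(frame: np.ndarray, instance_mask: np.ndarray) -> np.ndarray:
    """frame: HxWx3 image; instance_mask: HxW boolean mask, True where an instance is.
    Returns the frame with the instance region emptied (the background segmentation map)."""
    background = frame.copy()
    background[instance_mask.astype(bool)] = 0  # removed-instance region is left blank
    return background
```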
S120、对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,其中,所述设定图像包括所述至少两帧目标图像中与所述目标图像不同的目标图像。
本实施例中,对于任意一帧目标图像,其中被去除的实例所在的区域是没有特征的,可以使用设定图像(如设定图像可以是除该目标图像以外所有的目标图像,也可以是除该目标图像以外所有的目标图像中的部分目标图像或设定数量的目标图像)的背景分割图填补该目标图像中被去除实例的区域,得到该目标图像的填补结果。在填补过程中,设定图像的背景分割图中的特征被迁移并且融合到目标图像被去除的实例所在的区域中。在此基础上,可以综合所有目标图像的填补结果得到背景图。
可以理解的是,设定图像的背景分割图中,用于填补目标图像中被去除的实例所在的区域的特征所在的区域,与目标图像中被去除的实例所在的区域相对应。例如,目标图像中被去除的实例所在的区域为左上角尺寸为A*A的区域,则可以使用设定图像的背景分割图中左上角尺寸为A*A的区域内的特征信息来填补目标图像中被去除的实例所在的区域。需要说明的是,以下实施例中所述的相应区域,除有特别限定的以外,均是指设定图像的背景分割图中,与目标图像中被去除的实例所在的区域相应的区域。
可选的,根据设定图像的背景分割图填补目标图像中被去除的实例所在的区域,可以是:对设定图像的背景分割图(主要是设定图像的背景分割图中的相应区域)的特征信息取平均,然后用取平均的结果填补目标图像中被去除的实例所在的区域。例如,设定图像的背景分割图有X个(X大于或等于1),记为B 1,B 2,…,B X,对设定图像的背景分割图的特征取平均可以表示为:
$$B=\frac{1}{X}\sum_{i=1}^{X}B_i$$
该取平均的结果B中相应区域的特征即可用于填补目标图像中被去除的实例所在的区域。
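The averaging strategy just described might be sketched as follows. Averaging only over pixels where a set image actually contains background content is an implementation choice added here so that the holes left by the set images' own removed instances do not pull the average toward zero; it is not prescribed by the text.

```python
import numpy as np

def fill_by_average(target_bg, target_mask, set_bgs, set_masks):
    """target_bg: background map of the frame being filled; target_mask: HxW bool mask of its
    removed instance; set_bgs / set_masks: background maps and instance masks of the set images."""
    stack = np.stack(set_bgs, axis=0).astype(np.float64)          # (X, H, W, 3)
    valid = (~np.stack(set_masks, axis=0)).astype(np.float64)     # 1 where background content exists
    count = np.maximum(valid.sum(axis=0), 1.0)                    # avoid dividing by zero
    mean_bg = (stack * valid[..., None]).sum(axis=0) / count[..., None]
    filled = target_bg.astype(np.float64)
    filled[target_mask] = mean_bg[target_mask]                    # fill only the removed region
    return filled.astype(target_bg.dtype)
```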
根据设定图像的背景分割图填补目标图像中被去除的实例所在的区域,还可以是:将目标图像中被去除的实例所在的区域平均或者随机分为N份;然后分别利用每个设定图像的背景分割图(主要是设定图像的背景分割图中的相应区域)的特征信息,填补其中的一份。例如,将目标图像中被去除的实例所在的区域分为A1和A2,则可以利用设定图像1的背景分割图1在相应区域的特征填补A1,利用设定图像2的背景分割图2在相应区域的特征填补A2。需要注意的是,由于设定图像的背景分割图也是被去除了实例的,为了保证填补的有效性,在将目标图像中被去除的实例所在的区域分给不同的设定图像的过程中,需要保证所分到的设定图像的背景分割图在相应区域内是有特征的,而不是完全空白的区域。例如,保证设定图像1的背景分割图1中,A1的相应区域是有背景部分的内容的,而不能完全是被去除实例的空白区域。
根据设定图像的背景分割图填补目标图像中被去除的实例所在的区域,还可以是:依次用每个设定图像的背景分割图(主要是设定图像的背景分割图中的相应区域)的特征信息填补目标图像中被去除的实例所在的区域,直至目标图像中被去除的实例所在的区域被完全填补,或者,所有的设定图像都已被用于填补。例如,对于目标图像中被去除的实例所在的区域,先利用设定图像1的背景分割图1相应区域的特征进行填补,但由于设定图像1的背景分割图1也是被去除了实例的,如果被去除的实例位于该相应区域内,则该相应区域内有一部分特征是空缺的,因此将该相应区域的特征填补至目标图像中被去除的实例所在的区域后,目标图像中被去除的实例所在的区域中仍然存在空缺,这种情况下,可再利用设定图像2的背景分割图2相应区域的特征进行填补,以此类推,直至目标图像中被去除的实例所在的区域被完全填补,或者,所有的设定图像都已被用于填补。
根据设定图像的背景分割图填补目标图像中被去除的实例所在的区域,还可以是:对于目标图像中被去除的实例所在的区域,将所有设定图像的背景分割图(主要是设定图像的背景分割图中的相应区域)的公共区域的特征取平均并填补目标图像中被去除的实例所在的区域,然后对于剩余的未被填补的区域,再利用所有设定图像的背景分割图的特征信息填补。例如,目标图像中被去除的实例所在的区域为左上角尺寸为A*A的区域,设定图像1至设定图像N在左上角尺寸为A*A的区域内都包含尺寸为A’*A’的公共区域,A’小于A,则可以对所有设定图像左上角尺寸为A’*A’的公共区域内的特征信息取平均并填补至目标图像中的A*A区域内,对于目标图像中A*A区域中除去A’*A’的区域剩余的部分,可参见上述任意的填补过程利用所有设定图像的背景分割图共同填补。
S130、根据所有所述目标图像的填补结果生成背景图。
本实施例中,将所有目标图像的填补结果中的特征融合,生成背景图。例如,可以对所有目标图像的填补结果取平均得到背景图,以充分复用所有目标图像中背景的特征。
可选的,生成背景图的过程可以分为两个阶段,在第一阶段中,对于每帧目标图像中被去除实例的区域,都可以采用设定图像的背景分割图来填补,得到每帧目标图像对应的填补结果,填补结果可以理解为一种粗略的背景图;在第二阶段中,根据所有目标图像的填补结果生成背景图,该阶段可以理解为对粗略背景图的修复过程,可优化所有目标图像中背景的特征,得到的背景图更为精细。例如,可以对所有目标图像的填补结果取平均,得到背景图;或者,为了使实例与背景的分割处更平滑,还可以对每个目标图像的实例所在的区域进行膨胀处理,然后针对膨胀后的区域,再进行第二轮填补或者取平均等操作,从而融合所有填补结果的特征,得到高质量的背景图。
本实施例中的背景图生成方法,对于每帧目标图像被去除的实例所在的区域,都使用设定图像的背景分割图进行填补,并综合各帧目标图像的填补结果生成背景图,充分复用了所有目标图像中背景的特征,使实例与背景的分割处更平滑,从而生成高质量的背景图。
实施例二
图2是本公开实施例二中的背景图生成方法的流程图。本实施例二在上述实施例的基础上,对根据设定图像的背景分割图对目标图像被去除的实例所在的区域进行填补的过程进行说明。未在本实施例中详尽描述的技术特征可参见上述任意实施例。
本实施例中,对于每帧目标图像,根据设定图像的背景分割图填补目标图像中被去除的实例所在的区域,得到目标图像的填补结果,包括:对于每帧目标图像,依次根据每个设定图像的背景分割图中相应区域的特征信息填补目标图像中被去除的实例所在的区域,直至根据最后一个设定图像的背景分割图中相应区域的特征信息的填补操作完成,或者,直至目标图像中被去除的实例所在的区域被完全填补,得到目标图像的填补结果。在此基础上,可以最大限度的利用所有背景分割图的特征,并且高效地生成高质量的背景图。
如图2所示,本公开实施例二中的背景图生成方法,包括如下步骤:
S210、对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图。
S220、对于当前目标图像,根据当前设定图像的背景分割图中相应区域的特征信息填补当前目标图像中被去除的实例所在的区域。
图3是本公开实施例二中的对目标图像中被去除的实例所在的区域进行填补的示意图。如图3所示,假设共有N(N为大于2的整数)个目标图像,每 个目标图像中的空白人物形状的区域表示去除的人物实例所在的区域,该人物实例在不同的目标图像中的位置或动作可能不同。目标图像1中去除人物实例后的背景分割图中的特征信息用网格表示;目标图像2中去除人物实例后的背景分割图中的特征信息用斜线表示;目标图像N-1中去除人物实例后的背景分割图中的特征信息用点状纹理表示;目标图像N中去除人物实例后的背景分割图中的特征信息用竖线表示。
以对目标图像1(即当前目标图像)中去除人物实例的区域进行填补为例,在目标图像2(即当前设定图像)的背景分割图中,虚线所示的人物形状即为相应区域,此区域内的斜线所表示的特征信息可以用来填补目标图像1中去除人物实例后的区域,但显然,目标图像2的背景分割图中虚线所示的人物形状中也包含了一部分空白(是由于目标图像2中的人物实例也被去除造成的),因此,只利用目标图像2的背景分割图中相应区域内的特征信息并不能完全填补目标图像1中去除人物实例后的区域,则可以继续采用下一个目标图像的背景分割图中相应区域的特征信息来填补;假设下一个设定图像为目标图像N-1,则目标图像N-1的背景分割图中虚线所示的人物形状内的点状纹理所表示的特征,可用来继续填补目标图像1中去除人物实例后的区域;但仍不能完全填补,因此还需要利用目标图像N的背景分割图中虚线所示的人物形状内的竖线所表示的特征信息,继续填补目标图像1中去除人物实例后的区域,至此可得到目标图像1的填补结果。在填补结果中,斜线部分的特征信息来自于目标图像2的背景分割图的相应区域,点状部分的特征信息来自于目标图像N-1的背景分割图的相应区域,竖线部分的特征信息来自于目标图像N的背景分割图的相应区域。
基于类似的原理,可以得到目标图像2至N的填补结果。在此基础上可以根据所有目标图像的填补结果生成背景图。
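A compact sketch of the sequential filling illustrated above is given below (a hypothetical helper, not code from the patent): the removed-instance region of frame i is filled frame by frame from the other background segmentation maps until nothing remains to fill or every set image has been used.

```python
import numpy as np

def sequential_fill(bg_maps, masks, i):
    """bg_maps: list of HxWx3 background segmentation maps; masks: list of HxW bool instance
    masks; i: index of the frame being filled. Returns the fill result and the leftover hole."""
    filled = bg_maps[i].copy()
    remaining = masks[i].copy()              # True where the removed instance is still empty
    for k in range(len(bg_maps)):
        if k == i:
            continue
        if not remaining.any():              # hole completely filled, stop early
            break
        usable = remaining & ~masks[k]       # pixels that frame k can actually supply
        filled[usable] = bg_maps[k][usable]
        remaining &= ~usable
    return filled, remaining
```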
S230、判断当前目标图像中被去除的实例所在的区域是否被完全填补,基于当前目标图像中被去除的实例所在的区域被完全填补的判断结果,执行S250;基于当前目标图像中被去除的实例所在的区域没有被完全填补的判断结果,执行S240。
本实施例中,如果利用当前设定图像的背景分割图的相应区域的特征信息进行填补后,当前目标图像中去除人物实例后的区域被完全填补,则可结束对当前目标图像的填补操作,得到当前目标图像的填补结果,而无需再采用后续的设定图像的背景分割图进行填补;如果利用当前设定图像的背景分割图的相应区域的特征信息进行填补后,当前目标图像中去除人物实例后的区域还未被完全填补,这种情况下,可以确定是否还有设定图像的背景分割图还未被用于填补。
S240、判断当前设定图像是否为最后一个设定图像,基于当前设定图像为最后一个设定图像的判断结果,执行S250;基于当前设定图像不为最后一个设 定图像的判断结果,执行S290。
本实施例中,如果还有设定图像的背景分割图未被用于填补(即当前设定图像不是最后一个设定图像),则可以继续采用下一个设定图像的背景分割图进行填补,直至采用最后一个设定图像的背景分割图相应区域的特征信息完成填补后,此时无论是否能够完全填补,都可以结束对当前目标图像的填补操作,得到当前目标图像的填补结果。
S250、得到所述目标图像的填补结果。
S260、判断当前目标图像是否为最后一个目标图像,基于当前目标图像为最后一个目标图像的判断结果,执行S280;基于当前目标图像不为最后一个目标图像的判断结果,执行S270。
S270、将下一个目标图像作为当前目标图像。
S280、根据所有目标图像的填补结果生成背景图。
S290、将下一个设定图像作为当前设定图像。
本实施例中的背景图生成方法,通过对每帧目标图像进行实例分割,对于每帧目标图像,依次根据每个设定图像的背景分割图中相应区域的特征信息填补目标图像中被去除的实例所在的区域,可以最大限度的利用每个背景分割图的特征,并且高效地生成高质量的背景图;在此基础上,利用每个目标图像的背景分割图生成背景图,可综合所有目标图像的背景部分的特征,保证背景图与所有目标图像的背景的一致性,生成高质量的背景图。
实施例三
图4是本公开实施例三中的背景图生成方法的流程图。本实施例三在上述实施例的基础上,对根据所有目标图像的填补结果生成背景图的过程进行说明。未在本实施例中详尽描述的技术特征可参见上述任意实施例。
本实施例中,生成背景图的过程可以分为两个阶段,在第一阶段中,对于每帧目标图像中被去除实例的区域,都可以采用设定图像的背景分割图来填补,得到每帧目标图像对应的填补结果;在第二阶段中,根据所有目标图像的填补结果生成背景图。
本实施例中,根据所有目标图像的填补结果生成背景图,包括:对每个目标图像中的实例所在的区域进行膨胀处理,得到每个目标图像对应的膨胀区域;对于每帧目标图像,根据所有目标图像的填补结果中相应区域的特征信息修复目标图像对应的膨胀区域,得到目标图像的修复结果;根据所有目标图像的修复结果生成背景图。在此基础上,对所有目标图像的填补结果进行修复,可以对实例边缘作平滑处理,得到精度更高的背景图。
如图4所示,本公开实施例三中的背景图生成方法,包括如下步骤:
S310、对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图。
S320、对于每帧目标图像,根据设定图像的背景分割图填补目标图像中被去除的实例所在的区域,得到目标图像的填补结果。
S330、对每个目标图像中的实例所在的区域进行膨胀处理,得到每个目标图像对应的膨胀区域。
本实施例中,对每个目标图像中的实例进行膨胀处理,可以理解为在实例的边缘添加像素值,使得实例整体的像素区域扩张,以使膨胀区域尽可能包含不易修复的实例边缘。添加像素值可以通过卷积模板或卷积核实现。
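One possible way to obtain the dilated region with OpenCV is sketched below; the 15-pixel square kernel is an arbitrary assumption, and in practice the kernel would be sized so that the uncertain instance contour falls inside the dilated area.

```python
import cv2
import numpy as np

def dilate_instance_mask(mask: np.ndarray, kernel_size: int = 15) -> np.ndarray:
    """Expand the instance mask so that contour pixels the segmentation may have missed
    fall inside the region that the second-stage repair will overwrite."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    dilated = cv2.dilate(mask.astype(np.uint8), kernel, iterations=1)
    return dilated.astype(bool)
```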
图5是本公开实施例三中的目标图像中的实例对应的膨胀区域的示意图。如图5所示,加粗虚线所示的人物形状的区域即为对目标图像中的实例所在的区域进行膨胀处理后得到的膨胀区域,该膨胀区域应尽可能大于原实例所在的区域(斜线、点状纹理以及竖线所构成的人物形状的区域),原实例的边缘应包含在膨胀区域内。在此基础上,对于每个目标图像对应的膨胀区域,都可以利用在第一阶段中得到的所有填补结果进行修复,使实例边缘更平滑。
S340、对于每帧目标图像,根据所有目标图像的填补结果中相应区域的特征信息修复目标图像对应的膨胀区域,得到目标图像的修复结果。
可选的,本实施例中,对于每帧目标图像,利用所有目标图像的填补结果对其膨胀区域进行修复,包括:对所有目标图像(包括当前被修复的目标图像以及其他目标图像)的填补结果中与膨胀区域对应的区域的特征信息取平均,将取平均的结果填补至目标图像对应的膨胀区域,从而得到该目标图像的修复结果。
在一些实施例中,对于每帧目标图像,利用所有目标图像的填补结果对其膨胀区域进行修复,还可以是类似第一阶段的填补操作,例如,使用其他目标图像的填补结果中与该膨胀区域相应区域的特征信息再次填补该膨胀区域,例如可以是将其他目标图像的填补结果中与该膨胀区域相应区域的特征取平均后再次填补,也可以是将该膨胀区域平均或者随机分为若干份,然后分别利用其他的每个目标图像的填补结果中与该膨胀区域相应区域的特征,分别填补其中的一份等。在此基础上,可以得到目标图像的修复结果。
S350、根据所有目标图像的修复结果生成背景图。
例如,可以根据需求从所有目标图像的修复结果中选取一张图像质量最高的修复结果作为背景图。
可选的,本实施例中,根据所有目标图像的修复结果生成背景图,包括:对所有目标图像的修复结果取平均,得到背景图。在此基础上,对实例的边缘,可充分利用其他目标图像的特征信息进行平滑处理。
在一实施例中,对于每帧目标图像,根据所有目标图像的填补结果中相应 区域的特征信息修复目标图像对应的膨胀区域,得到目标图像的修复结果,包括:在每次迭代过程中,对于每帧目标图像,对所有目标图像的填补结果中与膨胀区域对应的区域的特征信息取平均,将取平均的结果填补至目标图像对应的膨胀区域,得到目标图像在本次迭代过程中的修复结果;基于目标图像在本次迭代过程中的修复结果不满足设定条件的判断结果,进入下一次迭代过程;基于目标图像在本次迭代过程中的修复结果满足设定条件的判断结果,停止迭代,将目标图像在本次迭代过程中的修复结果作为目标图像的修复结果。
本实施例中,第二阶段的修复操作可以迭代执行多次,直至满足设定条件,例如,设定条件为:任意一个目标图像的填补结果在本次迭代中得到的修复结果与在上一次迭代的修复结果的特征差异在允许范围内,则可以停止迭代,此时每个目标图像对应的修复结果已经充分融合了所有填补结果中的特征信息,且边缘过渡平滑,精度更高,可以得到更高质量的修复结果。
图6是本公开实施例三中的根据所有目标图像的填补结果得到每个目标图像的修复结果的流程图。如图6所示,根据所有目标图像的填补结果得到每个目标图像的修复结果,包括:
S410、对每个目标图像中的实例所在的区域进行膨胀处理,得到每个目标图像对应的膨胀区域。
S420、对于当前目标图像,对所有目标图像的填补结果中与膨胀区域对应的区域的特征信息取平均,将取平均的结果填补至当前目标图像对应的膨胀区域,得到当前目标图像在本次迭代过程中的修复结果。
S430、判断当前目标图像在本次迭代过程中的修复结果是否满足设定条件,基于当前目标图像在本次迭代过程中的修复结果满足设定条件的判断结果,执行S460,得到每个目标图像最终的修复结果;基于当前目标图像在本次迭代过程中的修复结果不满足设定条件的判断结果,执行S440。
可选的,设定条件包括:目标图像在本次迭代过程中的修复结果与在上一次迭代过程对应的修复结果的特征差异在允许范围内。
在一些实施例中,设定条件也可以为迭代次数达到指定次数,或者迭代时长达到指定时长等。
S440、判断当前目标图像是否为最后一个目标图像,基于当前目标图像为最后一个目标图像的判断结果,执行S460;基于当前目标图像不为最后一个目标图像的判断结果,执行S450。
S450、将下一个目标图像作为当前目标图像。
S460、进入下一次迭代。
本实施例中,如果当前目标图像在本次迭代得到的修复结果与上一次迭代得到的修复结果的误差较小,则可以停止迭代,此时,每个目标图像在最后一次被修复时得到的修复结果,都可以作为最终的修复结果;如果误差较大,则 不停止迭代,并且,如果还有目标图像的填补结果没有在本次迭代过程中被修复,则可以继续选择下一个目标图像作为当前目标图像,对其填补结果进行修复;如果当前目标图像为最后一个目标图像,即本次迭代过程中每个目标图像的填补结果都已被修复,则本次迭代过程完成,进入下一次迭代。
示例性的,对所有目标图像的填补结果迭代修复的过程包括:
假设共有N个目标图像,对应N个填补结果;
在第1次迭代中,对于第一阶段得到的目标图像j的填补结果B j(1≤j≤N)中的膨胀区域,对所有目标图像的填补结果B 1、B 2,...,B N中与该膨胀区域相应区域的特征信息取平均并填补至该膨胀区域,以修复B j中的膨胀区域,得到目标图像j在第1次迭代中的修复结果B j1
然后进入第2次迭代,同样的,对于目标图像j的修复结果B j1中的膨胀区域,对B 1、B 2,...,B N中与该膨胀区域相应区域的特征信息取平均并填补至该膨胀区域内,以修复B j1中的膨胀区域,得到目标图像j在第2次迭代中的修复结果B j2
以此类推,直至在一次迭代过程中,任意一个目标图像的修复结果与上一次迭代的修复结果的特征差异在允许范围内,则停止迭代,然后,对此时所有目标图像的修复结果取平均,得到背景图。
需要说明的是,在停止修复操作时,每个目标图像被修复的次数可能不同。例如,在第1次迭代过程中,对10帧目标图像的填补结果依次进行修复,在第2次迭代过程中,修复到第3帧目标图像时,第3帧目标图像在第2次迭代得到的修复结果与其在第1次迭代得到的修复结果的误差已经较小,则可以停止迭代,此时,第1-3帧目标图像的修复结果其实经过两次迭代修复,而第4-10帧目标图像的填补结果其实经过一次迭代修复。
在一些实施例中,也不排除在满足设定条件时,需要等待所有目标图像的填补结果都在本次迭代过程中被修复之后,再停止修复操作。这种情况下,在停止修复操作时,所有目标图像被修复的次数是相同的。
需要说明的是,在第一阶段中得到的填补结果其实是粗略背景图,第二阶段中的修复操作可以提高填补的精度,膨胀区域内不正确的像素值会被正确的像素值逐渐修复,而在实例以外背景部分的正确像素值并不会随着迭代而改变,保证生成的背景图充分综合了所有目标图像的特征信息,且边缘的处理效果更好,实例与背景的过渡更自然。
本实施例提供的两阶段算法(填补和修复)的原理如下:
假设有N帧目标图像,记为I 1,I 2,…,I N,各帧目标图像中的实例的掩码分别记为M 1,M 2,…,M N,则通过如下公式可得到第i(i=1,2,…,N)帧目标图像所对应的粗略的填补结果B i
$$B_{i,k}=B_{i,k-1}+F_{i,k-1}\odot\big((1-M_k)\odot I_k\big),\qquad F_{i,k}=F_{i,k-1}\odot M_k\quad(k\neq i),$$
$$B_{i,0}=(1-M_i)\odot I_i,\qquad F_{i,0}=M_i,\qquad B_i=B_{i,N}$$
其中,B i,k表示用第k帧目标图像的背景分割图填补第i帧目标图像中去除的实例的区域得到的填补结果,F i,k表示用第k帧目标图像的背景分割图填补第i帧目标图像中去除的实例的区域后,该区域中还剩余的区域(即还需要继续利用其他帧的背景分割图进行填补的区域),其中,k不等于i,k=1,2,…,N,
$\odot$表示逐元素矩阵乘法运算。
由于实例分割算法在实例轮廓边缘往往不够准确,因此还需要精细修复。将各帧目标图像中的实例对应的膨胀区域的掩码分别记为
$\tilde{M}_1,\tilde{M}_2,\ldots,\tilde{M}_N$,
各帧目标图像的填补结果分别记为B 1,B 2,...,B N,则可以按照如下公式得到第i帧目标图像的修复结果:
$$B_{im}=(1-\tilde{M}_i)\odot B_{i(m-1)}+\tilde{M}_i\odot\frac{1}{N}\sum_{k=1}^{N}B_{k(m-1)},\qquad B_{i0}=B_i$$
当第m次迭代满足如下设定条件,结束迭代,停止修复操作:
$$\mathrm{Mean}\big(\big|B_{im}-B_{i(m-1)}\big|\big)\le\varepsilon,\qquad \varepsilon\text{为允许的特征差异阈值}$$
其中,Mean(·)表示求矩阵平均值函数。
迭代结束后,将所有目标图像的修复结果取平均,得到最终的背景图:
$$B_{\mathrm{bg}}=\frac{1}{N}\sum_{i=1}^{N}B_{im}$$
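Turning the reconstructed repair formulas above into code, a rough sketch of the second-stage iteration might look like the following; the convergence threshold `eps`, the `max_iters` safeguard, and stopping as soon as one frame's change falls within the allowed range (following the walkthrough above) are assumptions.

```python
import numpy as np

def iterative_repair(fill_results, dilated_masks, eps=1.0, max_iters=20):
    """fill_results: list of first-stage fill results (HxWx3); dilated_masks: list of HxW bool
    masks of the dilated instance regions. Returns the averaged final background image."""
    results = [b.astype(np.float64) for b in fill_results]
    stop = False
    for _ in range(max_iters):
        for i, mask in enumerate(dilated_masks):
            mean_map = np.mean(np.stack(results, axis=0), axis=0)   # average of current results
            updated = results[i].copy()
            updated[mask] = mean_map[mask]                          # overwrite only the dilated region
            if np.mean(np.abs(updated - results[i])) <= eps:
                stop = True        # one frame barely changed, so the iteration can end
            results[i] = updated
            if stop:
                break
        if stop:
            break
    return np.mean(np.stack(results, axis=0), axis=0)               # final background image
```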
可选的,在得到每帧目标图像对应的去除实例的背景分割图之后,还包括:选取一帧目标图像作为参考帧,根据特征点匹配算法确定除所述参考帧以外的所有所述目标图像与所述参考帧之间的仿射变换矩阵;根据所述仿射变换矩阵,将除所述参考帧以外的所有所述目标图像的背景分割图与所述参考帧的背景分割图对齐。
本实施例中,所有目标图像虽然是对于同一场景的,但由于拍摄角度不同、存在抖动或误差等,每个目标图像的背景并不是完全对齐的,直接根据所有目标图像的背景分割图生成背景图,会存在局部的失真、变形或模糊等,影响背景图的准确性和视觉效果。为此,在根据所有目标图像的背景分割图生成背景图之前,可以选取一帧目标图像作为参考帧,并使其他所有目标图像的背景分割图与该参考帧对齐。示例性的,参考帧可以是图像质量最高的目标图像、首个目标图像、最后一帧目标图像或者位于中间的目标图像等。
例如,根据特征点匹配算法确定所有目标图像与参考帧之间的仿射变换矩阵,仿射变换矩阵用于描述相匹配的特征点由目标图像到参考帧中的变换关系,仿射变换包括线性变换和平移变换。特征点匹配算法可以是尺度不变特征变换(Scale-invariant Feature Transform,SIFT)算法。示例性的,首先提取每个目标图像的背景部分关键的特征点,这些关键的特征点不会因光照、尺度、旋转等因素而消失,然后,根据每个关键点的特征向量对目标图像与参考帧中的关键点进行两两比较,找出目标图像与参考帧之间相互匹配的若干对特征点,从而 建立特征点的对应关系,得到仿射变换矩阵。可选的,如果一帧目标图像中,可供配准的关键的特征点数量少于设定阈值,也可以抛弃该帧目标图像。
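A sketch of the SIFT-plus-affine registration described here, using standard OpenCV calls, is shown below; Lowe's ratio of 0.75, the minimum of 10 good matches (the text only says "fewer than a set threshold"), and RANSAC for the affine estimate are assumptions.

```python
import cv2
import numpy as np

def align_to_reference(image, reference):
    """Estimate an affine transform from `image` to `reference` via SIFT matching and warp
    `image` into the reference view; returns None when too few key points can be matched."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = sift.detectAndCompute(cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY), None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [p[0] for p in matches if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < 10:                                   # too few matches: discard this frame
        return None
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    M, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    if M is None:
        return None
    h, w = reference.shape[:2]
    return cv2.warpAffine(image, M, (w, h))
```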
图7是本公开实施例三中的根据目标图像生成的背景图的示意图。如图7所示,将多帧目标图像配准(对齐)后,去除其中的实例,利用背景部分的特征信息,经过两阶段算法(即填补和修复操作),可得到高质量的背景图,其能充分保留原每个目标图像中背景部分的特征,且处理实例边缘的平滑效果较好。
本实施例中的背景图生成方法,通过选取一帧目标图像作为参考帧,并使其他所有目标图像的背景分割图与该参考帧对齐,提高生成背景图的准确性和图像质量;通过在第一阶段中得到所有目标图像的粗略背景图,在第二阶段中对实例的膨胀区域进行迭代修复以融合所有填补结果的特征,使得生成的背景图充分复用了所有目标图像的背景分割图的特征信息,且对实例边缘的处理效果更平滑,实例与背景的过渡更自然,提高背景图的质量。
实施例四
图8是本公开实施例四中的图像融合方法的流程图。该方法可适用于将多帧图像融合为一张图像的情况,例如,根据多帧图像生成统一的背景图,并将各帧图像中的实例融合在生成的背景图中。该方法的应用场景可以是,从视频中提取多帧图像,并根据提取的多帧图像生成一张融合图像,作为该视频的封面;还可以是,根据一组图像生成一张融合图像,作为这组图像的标识或文件夹图标,或者可得到能够反映这组图像主要内容的缩略图等。该方法可以由图像融合装置来执行,该装置可由软件和/或硬件实现,并集成在电子设备上。本实施例中的电子设备可以是计算机、笔记本电脑、服务器、平板电脑或智能手机等具有图像处理功能的设备。需要说明的是,未在本实施例中详尽描述的技术细节可参见上述任意实施例。
如图8所示,本公开实施例一中的背景图生成方法,包括如下步骤:
S510、获取至少两帧目标图像。
本实施例中,目标图像主要指包含背景特征的图像,将所有目标图像中的背景特征融合在一起,可以提取一张统一的背景图。所有目标图像中的背景是针对同一场景的,但视角可以有差别。此外,目标图像中可以有实例部分和背景部分,每个目标图像中同一实例的位置可以不同。目标图像可以是从电子设备中读取的,或者是从数据库中下载的,可以是连拍的多帧图像,也可以是从视频中提取的多帧图像等。
可选的,获取至少两帧目标图像包括:基于动作识别算法识别视频中的动作序列帧,将动作序列帧作为目标图像。
在一实施例,利用动作识别算法,可以从视频中识别有效的动作序列帧, 每个动作序列帧中的实例(以人物为例)按照时序顺序连贯起来可以表达出一个完整的动作或行为,这些动作序列帧可作为目标图像。例如,利用人体姿态识别(Open-pose)算法对视频中的人物实例进行姿态估计。示例性的,首先提取视频的各帧图像中人体关节点的位置坐标,据此计算相邻两帧之间人体关节点的距离变化量矩阵;然后对视频进行分段,利用每段视频对应的距离变化量矩阵生成视频特征;最后利用训练好的分类器对视频特征进行分类,如果能够识别到一段视频对应的视频特征属于预设行为库中的动作或行为的特征序列,则这段视频对应的各帧即为动作序列帧。又如,利用实例分割算法提取各关键帧中人物的轮廓并进行姿势表达,通过聚类算法提取姿势的关键特征,基于这些关键特征,利用动态时间规整(Dynamic Time Warping,DTW)算法完成动作识别等。再如,动作识别算法可以通过时间移位模块(Temporal Shift Module,TSM)或时间分段网络(Temporal Segment Networks,TSN)模型实现,该模型基于Kinetics-400数据集训练,可用于识别400种动作,能够满足识别并在封面中展示实例的动作的需求。
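For the joint-distance variant of action recognition mentioned in this paragraph, the per-step displacement matrix can be computed as below, assuming the joint coordinates are already produced by some pose estimator (the text mentions Open-pose); segmenting the video and classifying the resulting features are left to whichever recognizer is used.

```python
import numpy as np

def joint_displacement_matrices(keypoints):
    """keypoints: (T, J, 2) array of body-joint coordinates over T frames.
    Returns a (T-1, J) matrix of per-joint displacement between adjacent frames."""
    kp = np.asarray(keypoints, dtype=np.float64)
    return np.linalg.norm(kp[1:] - kp[:-1], axis=-1)
```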
可选的,在识别到有效的动作序列帧的情况下,可判断各动作序列帧之间的背景差异程度,如背景差异程度在允许范围内,则对各动作序列帧进行图像融合。
可选的,获取至少两帧目标图像包括:基于预训练的网络确定视频中的关键帧之间的相似度;根据所述相似度将所述关键帧划分为多个分组;将关键帧数量最多的一个分组中的关键帧作为所述目标图像。
本实施例中,关键帧主要指能够反映视频关键内容或者场景变化的帧,例如包含视频中的主要人物的帧、属于精彩片段或经典片段的帧、场景发生明显变化的帧以及包含人物关键动作的帧等,都可以作为关键帧。从视频中提取出的关键帧至少为两个,在此基础上,可对各关键帧进行分组,并将关键帧数量最多的一个分组中的关键帧作为目标图像,用于图像融合。
示例性的:按照每间隔设定帧数(如20帧)选一帧的方式,从视频中选取一定数量的帧,减少数据量,并利用图像评估算法从中抽取关键帧;然后,可以根据各关键帧的帧间相似度,例如色调、场景内容或包含的实例是否相同等进行聚类;最后,可以基于预训练的卷积神经网络在特征空间做相似度度量,例如,采用计算机视觉几何组(Visual Geometry Group,VGG)网络,例如可以是VGG19网络(VGG19网络是VGG网络的其中一种结构),各帧图像输入VGG19后可以得到1000维的向量,两个向量的夹角可以代表它们的相似度。假设第i帧的特征向量为F i,假设第j帧的特征向量为F j,则相似度表示为:
$$\mathrm{sim}(F_i,F_j)=\frac{\langle F_i,F_j\rangle}{\lVert F_i\rVert\,\lVert F_j\rVert}$$
其中,<·>表示内积运算,||·||表示向量的模长。根据相似度可以将视频中的图像分为几个分组,选择帧数量最多的一个分组作为待融合的目标图像。
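The key-frame grouping step can be sketched with a pre-trained VGG19 from torchvision and the cosine similarity given above; the 0.8 similarity threshold and the greedy one-pass grouping are assumptions, since the text only requires grouping by pairwise similarity and keeping the largest group.

```python
import numpy as np
import torch
from torchvision import models, transforms

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(frame_rgb):
    """Embed an HxWx3 RGB frame into the 1000-dimensional VGG19 output vector."""
    with torch.no_grad():
        return vgg(preprocess(frame_rgb).unsqueeze(0)).squeeze(0).numpy()

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def largest_similar_group(frames, threshold=0.8):
    """Greedily group frames whose similarity to a group's first member exceeds the
    threshold, then return the indices of the largest group as the target images."""
    groups = []
    for idx, frame in enumerate(frames):
        vec = embed(frame)
        for group in groups:
            if cosine(vec, group["center"]) >= threshold:
                group["indices"].append(idx)
                break
        else:
            groups.append({"center": vec, "indices": [idx]})
    return max(groups, key=lambda g: len(g["indices"]))["indices"] if groups else []
```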
S520、根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图。
本实施例中,每帧目标图像中被去除的实例所在的区域可以被填补。例如,对于一帧目标图像,可以利用其去除实例后的背景部分的纹理特征,对去除的实例所在的区域进行填补,以完成对该帧目标图像的背景修复,得到目标图像的填补结果;又如,对于一帧目标图像,可以利用设定图像(可以是除该目标图像以外所有的目标图像,也可以是除该目标图像以外所有的目标图像中的部分目标图像或设定数量的目标图像)的背景分割图的特征信息填补该目标图像中被去除实例的区域,即,将设定图像的背景分割图中的特征迁移并且融合到该目标图像被去除的实例所在的区域中,得到目标图像的填补结果。在此基础上,可以将所有目标图像的填补结果中的特征融合,生成背景图。在此过程中,综合各帧目标图像的填补结果生成背景图,充分复用了所有目标图像的特征,生成高质量的背景图。
S530、将所有所述目标图像中的实例融合在所述背景图中,得到融合图像。
本实施例中,每个目标图像中的实例与背景分离,多个背景可用于生成统一的背景图;所有实例可以融合在该背景图中,得到一张融合图像。
例如,可以从每个目标图像中抠出实例,并将这些实例添加到背景图中,该背景图根据所有目标图像的填补结果生成。这种情况下,利用静态的单张图像即可展示出多帧目标图像中的实例与背景,有效减少了计算资源和存储空间占用。在此过程中,还可以对要添加的实例进行裁剪、缩放、旋转以及拼接等操作。
可选的,每个目标图像中的实例还可以按照时序顺序依次(例如从左到右,或者从右到左等)排布在背景图中,而且还可以使每个实例在该背景图中的排布位置,与其在原目标图像中的中的相对位置保持一致,使得融合图像在视觉上更贴近原目标图像中实例的形态;或者,也可以使每个目标图像中的实例在背景图中自由排布。
以生成视频封面的应用场景为例,采用图像融合方法,可以基于时序冗余信息提取任意视频的背景图,并将多个目标图像中的实例融合在该背景图中。该过程可以包括:
抽帧:从视频中按照每间隔设定帧数(如20帧)抽取多帧图像,并根据图像质量算法选取关键帧;
场景聚类:根据关键帧的帧间相似度进行聚类,将包含的关键帧数量最多的一类(即一个分组)中的关键帧作为目标图像;
实例分割:将每个目标图像中的实例与背景分离;
图像配准:根据仿射变换矩阵,将所有目标图像的背景分割图对齐;
两阶段算法:对所有目标图像中去除实例的区域进行填补和修复,得到背 景图;
实例融合:将所有目标图像中的实例添加至背景图中,得到融合图像。
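Wiring the sketches above together for the video-cover scenario might look roughly like this; `segment_instances` is a hypothetical placeholder for the instance segmentation step, `fuse_instances` refers to the fusion sketch given after the next paragraph, and frame sampling, registration to a reference frame, and error handling are omitted for brevity.

```python
def build_cover(candidate_frames):
    """candidate_frames: frames already extracted from the video (key-frame selection by image
    quality is assumed to have happened upstream). Returns the fused cover image."""
    indices = largest_similar_group(candidate_frames)              # scene clustering: largest group
    targets = [candidate_frames[i] for i in indices]
    masks = [segment_instances(f) for f in targets]                # hypothetical segmentation helper
    bg_maps = [background_segmentation_map(f, m) for f, m in zip(targets, masks)]
    fills = [sequential_fill(bg_maps, masks, i)[0] for i in range(len(targets))]
    dilated = [dilate_instance_mask(m) for m in masks]
    background = iterative_repair(fills, dilated)                  # two-stage background image
    return fuse_instances(background, targets, masks)              # instances composited onto it
```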
可选的,每个所述目标图像中的实例与所述背景图的融合度按照每个所述目标图像的时序顺序依次降低。
图9是本公开实施例四中的融合图像的示意图。如图9所示,融合图像中的五个人物实例可来源于五个目标图像,五个目标图像可来源于一段视频,其表达的是一个滑板跳跃的动作。为了使根据目标图像生成的融合图像更贴近原视频内容,在得到统一的背景图后,可以将每个目标图像中的实例排布到该背景图中的合适位置。可以理解的是,通常情况下,利用视频中的五个目标图像表达出人物实例的动作,需要将这些目标图像做成动态图像,计算量大、占用空间也大,而本实施例的图像融合方法,利用融合图像,可有效融合多个目标图像的特征信息,利用有限的资源展示丰富的图像内容。
此外,融合图像中的五个人物实例从右到左,完成了一次从起跳、腾空到落地的滑板跳跃的动作,越左侧的人物实例的时序越靠后,最左侧的人物实例对应于最后一个目标图像,而越左侧的人物实例,其与背景图的融合度越低,也可以理解为透明度越低。在此基础上,在通过静态的融合图像展示多个动作序列帧的实例的同时,也能够体现出每个实例的时序先后,具有视觉暂留的效果,使得所展示的动作或行为更具体更生动。
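One way to realize the decreasing degree of fusion is to blend each instance with an opacity that grows with its temporal order, so the latest instance is the most opaque (lowest fusion with the background, as described above); the `min_alpha` floor and the linear schedule are assumptions.

```python
import numpy as np

def fuse_instances(background, frames, instance_masks, min_alpha=0.4):
    """Paste each frame's instance onto the shared background at its original position,
    with earlier instances blended more strongly into the background than later ones."""
    canvas = background.astype(np.float64).copy()
    n = len(frames)
    for t, (frame, mask) in enumerate(zip(frames, instance_masks)):
        alpha = min_alpha + (1.0 - min_alpha) * (t + 1) / n   # later frame -> more opaque
        m = mask.astype(bool)
        canvas[m] = alpha * frame[m].astype(np.float64) + (1.0 - alpha) * canvas[m]
    return np.clip(canvas, 0, 255).astype(np.uint8)
```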
可选的,根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图的方法根据上述任意实施例确定。
本实施例中的图像融合方法,利用融合图像可展示出多帧目标图像的丰富的特征,此外,通过综合各帧目标图像的填补结果生成背景图,充分复用了所有目标图像的特征,生成高质量的背景图,也可以提高融合图像的质量。
实施例五
图10是本公开实施例五中的背景图生成装置的结构示意图。本实施例尚未详尽的内容请参考上述实施例。
如图10所示,该装置包括:
分割模块610,被设置为对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图;
填补模块620,被设置为对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,其中,所述设定图像包括所述至少两帧目标图像中与所述目标图像不同的目标图像;
生成模块630,被设置为根据所有所述目标图像的填补结果生成背景图。
本实施例的背景图生成装置,对于每帧目标图像的背景分割图都进行填补,并综合各帧目标图像的填补结果生成背景图,充分复用了所有目标图像中背景的特征,使实例与背景的分割处更平滑,从而生成高质量的背景图。
在上述基础上,填补模块620,是被设置为:
对于每帧目标图像,依次根据每个设定图像的背景分割图中相应区域的特征信息填补所述目标图像中被去除的实例所在的区域,直至根据最后一个设定图像的背景分割图中相应区域的特征信息的填补操作完成,或者,直至所述目标图像中被去除的实例所在的区域被完全填补,得到所述目标图像的填补结果。
在上述基础上,生成模块630,包括:
膨胀单元,被设置为对每个所述目标图像中的实例所在的区域进行膨胀处理,得到每个所述目标图像对应的膨胀区域;
修复单元,被设置为对于每帧目标图像,根据所有所述目标图像的填补结果中相应区域的特征信息修复所述目标图像对应的膨胀区域,得到所述目标图像的修复结果;
生成单元,被设置为根据所有所述目标图像的修复结果生成所述背景图。
在上述基础上,修复单元,是被设置为:
对所有所述目标图像的填补结果中与所述膨胀区域对应的区域的特征信息取平均,将取平均的结果填补至所述目标图像对应的膨胀区域,得到所述目标图像的修复结果。
在上述基础上,生成单元,是被设置为:对所有所述目标图像的修复结果取平均,得到所述背景图。
在上述基础上,修复单元,是被设置为:
在每次迭代过程中,对于每帧目标图像,对所有所述目标图像的填补结果中与所述膨胀区域对应的区域的特征信息取平均,将取平均的结果填补至所述目标图像对应的膨胀区域,得到所述目标图像在本次迭代过程中的修复结果;
若不满足设定条件,则进入下一次迭代过程;
若满足设定条件,则停止迭代,将所述目标图像在本次迭代过程中的修复结果作为所述目标图像的修复结果。
在上述基础上,所述设定条件包括:
所述目标图像在本次迭代过程中的修复结果与在上一次迭代过程对应的修复结果的特征差异在允许范围内。
在上述基础上,该装置还包括:
计算模块,被设置为在得到每帧目标图像对应的去除实例的背景分割图之后,选取一帧目标图像作为参考帧,根据特征点匹配算法确定除所述参考帧以 外的所有所述目标图像与所述参考帧之间的仿射变换矩阵;
对齐模块,被设置为根据所述仿射变换矩阵,将除所述参考帧以外的所有所述目标图像的背景分割图与所述参考帧的背景分割图对齐。
上述背景图生成装置可执行本公开任意实施例所提供的背景图生成方法,具备执行方法相应的功能模块和有益效果。
实施例六
图11是本公开实施例六中的背景图生成装置的结构示意图。本实施例尚未详尽的内容请参考上述实施例。如图11所示,该装置包括:
获取模块710,被设置为获取至少两帧目标图像;
图像融合模块720,被设置为根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图;
融合模块730,被设置为将所有所述目标图像中的实例融合在所述背景图中,得到融合图像。
本实施例的图像融合装置,利用融合图像可展示出多帧目标图像的丰富的特征,此外,通过综合各帧目标图像的填补结果生成背景图,充分复用了每个目标图像的特征,生成高质量的背景图,也提高了融合图像的质量。
在上述基础上,获取至少两帧目标图像包括:基于动作识别算法识别视频中的动作序列帧,将所述动作序列帧作为所述目标图像。
在上述基础上,获取至少两帧目标图像包括:
基于预训练的网络确定视频中的关键帧之间的相似度;
根据所述相似度将所述关键帧划分为多个分组;
将关键帧数量最多的一个分组中的关键帧作为所述目标图像。
在上述基础上,每个所述目标图像中的实例与所述背景图的融合度按照每个所述目标图像的时序顺序依次降低。
在上述基础上,根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图的方法根据上述任意实施例确定。
上述图像融合装置可执行本公开任意实施例所提供的图像融合方法,具备执行方法相应的功能模块和有益效果。
实施例七
图12是本公开实施例七中的电子设备的硬件结构示意图。图12示出了适于用来实现本公开实施例的电子设备800的结构示意图。本公开实施例中的电 子设备800包括计算机、笔记本电脑、服务器、平板电脑或智能手机等具有图像处理功能的设备。图12示出的电子设备800仅仅是一个示例。
如图12所示,电子设备800可以包括至少一个处理装置(例如中央处理器、图形处理器等)801,其可以根据存储在只读存储器(Read Only Memory,ROM)802中的程序或者从存储装置808加载到随机访问存储器(Random Access Memory,RAM)803中的程序而执行各种适当的动作和处理。至少一个处理装置801实现如本公开提供的背景图生成、图像融合方法。在RAM 803中,还存储有电子设备800操作所需的各种程序和数据。处理装置801、ROM 802以及RAM 803通过总线805彼此相连。输入/输出(Input/Output,I/O)接口804也连接至总线805。
通常,以下装置可以连接至I/O接口804:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置806;包括例如液晶显示器(Liquid Crystal Display,LCD)、扬声器、振动器等的输出装置807;包括例如磁带、硬盘等的存储装置808,存储装置808被设置为存储至少一个程序;以及通信装置809。通信装置809可以允许电子设备800与其他设备进行无线或有线通信以交换数据。虽然图12示出了具有各种装置的电子设备800,但是应理解的是,并不要求实施或具备全部示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置809从网络上被下载和安装,或者从存储装置808被安装,或者从ROM 802被安装。在该计算机程序被处理装置801执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的组合。计算机可读存储介质例可以是电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者以上的合适组合。计算机可读存储介质的示例可以包括:具有至少一个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(如电子可编程只读存储器(Electronic Programable Read Only Memory,EPROM)或闪存)、光纤、便携式紧凑磁盘只读存储器(Compact Disc-Read Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的合适的组合。在本公开中,计算机可读存储介质可以是包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括电磁信号、光信号或上述的合适的组合。计算机 可读信号介质还可以是计算机可读存储介质以外的计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用适当的介质传输,包括:电线、光缆、射频(Radio Frequency,RF)等,或者上述的合适的组合。
在一些实施方式中,客户端、服务器可以利用诸如超文本传输协议(Hyper Text Transfer Protocol,HTTP)之类的当前已知或未来研发的网络协议进行通信,并且可以与其他形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(Local Area Network,LAN),广域网(Wide Area Network,WAN),网际网(例如,互联网)以及端对端网络(例如,Ad hoc端对端网络),以及当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有至少一个程序,当上述至少一个程序被该电子设备执行时,使得该电子设备:对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图;对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,其中,所述设定图像包括所述至少两帧目标图像中与所述目标图像不同的目标图像;根据所有所述目标图像的填补结果生成背景图。或者,使得该电子设备:获取至少两帧目标图像;根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图;将所有所述目标图像中的实例融合在所述背景图中,得到融合图像。
可以以至少一种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括面向对象的程序设计语言诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言-诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络包括局域网(LAN)或广域网(WAN)连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含至少一个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行, 它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。需要注意,单元的名称在一些情况下并不构成对该单元本身的限定。
本文中以上描述的功能可以至少部分地由至少一个硬件逻辑部件来执行。例如,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(Field-Programmable Gate Array,FPGA)、专用集成电路(Application Specific Integrated Circuit,ASIC)、专用标准产品(Application Specific Standard Parts,ASSP)、片上系统(System on Chip,SOC)、复杂可编程逻辑设备(Complex Programmable Logic Device,CPLD)等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的合适组合。机器可读存储介质的示例可以包括基于至少一个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的合适组合。
根据本公开的一个或多个实施例,示例1提供了一种背景图生成方法,包括:
对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图;
对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,其中,所述设定图像包括所述至少两帧目标图像中与所述目标图像不同的目标图像;
根据所有所述目标图像的填补结果生成背景图。
示例2根据示例1所述的方法,所述对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,包括:
对于每帧目标图像,依次根据每个设定图像的背景分割图中相应区域的特征信息填补所述目标图像中被去除的实例所在的区域,直至根据最后一个设定图像的背景分割图中相应区域的特征信息的填补操作完成,或者,直至所述目 标图像中被去除的实例所在的区域被完全填补,得到所述目标图像的填补结果。
示例3根据示例1所述的方法,所述根据所有所述目标图像的填补结果生成背景图,包括:
对每个所述目标图像中的实例所在的区域进行膨胀处理,得到每个所述目标图像对应的膨胀区域;
对于每帧目标图像,根据所有所述目标图像的填补结果中相应区域的特征信息修复所述目标图像对应的膨胀区域,得到所述目标图像的修复结果;
根据所有所述目标图像的修复结果生成所述背景图。
示例4根据示例3所述的方法,所述根据所有所述目标图像的填补结果中相应区域的特征信息修复所述目标图像对应的膨胀区域,得到所述目标图像的修复结果,包括:
对所有所述目标图像的填补结果中与所述膨胀区域对应的区域的特征信息取平均,将取平均的结果填补至所述目标图像对应的膨胀区域,得到所述目标图像的修复结果。
示例5根据示例3所述的方法,所述根据所有所述目标图像的修复结果生成所述背景图,包括:
对所有所述目标图像的修复结果取平均,得到所述背景图。
示例6根据示例3所述的方法,所述对于每帧目标图像,根据所有所述目标图像的填补结果中相应区域的特征信息修复所述目标图像对应的膨胀区域,得到所述目标图像的修复结果,包括:
在每次迭代过程中,对于每帧目标图像,对所有所述目标图像的填补结果中与所述膨胀区域对应的区域的特征信息取平均,将取平均的结果填补至所述目标图像对应的膨胀区域,得到所述目标图像在本次迭代过程中的修复结果;
基于所述目标图像在本次迭代过程中的修复结果不满足设定条件的判断结果,进入下一次迭代过程;
基于所述目标图像在本次迭代过程中的修复结果满足所述设定条件的判断结果,停止迭代,将所述目标图像在本次迭代过程中的修复结果作为所述目标图像的修复结果。
示例7根据示例6所述的方法,所述设定条件包括:
所述目标图像在本次迭代过程中的修复结果与在上一次迭代过程对应的修复结果的特征差异在允许范围内。
示例8根据示例1所述的方法,在所述得到每帧目标图像对应的去除实例的背景分割图之后,还包括:
选取一帧目标图像作为参考帧,根据特征点匹配算法确定除所述参考帧以 外的所有所述目标图像与所述参考帧之间的仿射变换矩阵;
根据所述仿射变换矩阵,将除所述参考帧以外的所有所述目标图像的背景分割图与所述参考帧的背景分割图对齐。
根据本公开的一个或多个实施例,示例9提供了一种图像融合方法,包括:
获取至少两帧目标图像;
根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图;
将所有所述目标图像中的实例融合在所述背景图中,得到融合图像。
示例10根据示例9所述的方法,所述获取至少两帧目标图像包括:
基于动作识别算法识别视频中的动作序列帧,将所述动作序列帧作为所述目标图像。
示例11根据示例9所述的方法,所述获取至少两帧目标图像包括:
基于预训练的网络确定视频中的关键帧之间的相似度;
根据所述相似度将所述关键帧划分为多个分组;
将关键帧数量最多的一个分组中的关键帧作为所述目标图像。
示例12根据示例9所述的方法,每个所述目标图像中的实例与所述背景图的融合度按照每个所述目标图像的时序顺序依次降低。
示例13根据示例9所述的方法,根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图的方法根据示例1-8任一项确定。
根据本公开的一个或多个实施例,示例14提供了一种背景图生成装置,包括:
分割模块,被设置为对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图;
填补模块,被设置为对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,其中,所述设定图像包括所述至少两帧目标图像中与所述目标图像不同的目标图像;
生成模块,被设置为根据所有所述目标图像的填补结果生成背景图。
根据本公开的一个或多个实施例,示例15提供了一种图像融合装置,包括:
获取模块,被设置为获取至少两帧目标图像;
背景图生成模块,被设置为根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图;
融合模块,被设置为将所有所述目标图像中的实例融合在所述背景图中, 得到融合图像。
根据本公开的一个或多个实施例,示例16提供了一种电子设备,包括:
处理器;
存储装置,被设置为存储程序;
在所述序被所述处理器执行时,所述处理器实现如示例1-8中任一项所述的背景图生成方法,或如示例9-13中任一项所述的图像融合方法。
根据本公开的一个或多个实施例,示例17提供了一种计算机可读介质,所述计算机可读介质上存储有计算机程序,所述计算机程序被处理器执行时实现如示例1-8中任一项所述的背景图生成方法,或如示例9-13中任一项所述的图像融合方法。

Claims (17)

  1. 一种背景图生成方法,包括:
    对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图;
    对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,其中,所述设定图像包括所述至少两帧目标图像中与所述目标图像不同的目标图像;
    根据所有所述目标图像的填补结果生成背景图。
  2. 根据权利要求1所述的方法,其中,所述对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,包括:
    对于每帧目标图像,依次根据每个设定图像的背景分割图中相应区域的特征信息填补所述目标图像中被去除的实例所在的区域,直至根据最后一个设定图像的背景分割图中相应区域的特征信息的填补操作完成,或者,直至所述目标图像中被去除的实例所在的区域被完全填补,得到所述目标图像的填补结果。
  3. 根据权利要求1所述的方法,其中,所述根据所有所述目标图像的填补结果生成背景图,包括:
    对每个所述目标图像中的实例所在的区域进行膨胀处理,得到每个所述目标图像对应的膨胀区域;
    对于每帧目标图像,根据所有所述目标图像的填补结果中相应区域的特征信息修复所述目标图像对应的膨胀区域,得到所述目标图像的修复结果;
    根据所有所述目标图像的修复结果生成所述背景图。
  4. 根据权利要求3所述的方法,其中,所述根据所有所述目标图像的填补结果中相应区域的特征信息修复所述目标图像对应的膨胀区域,得到所述目标图像的修复结果,包括:
    对所有所述目标图像的填补结果中与所述膨胀区域对应的区域的特征信息取平均,将取平均的结果填补至所述目标图像对应的膨胀区域,得到所述目标图像的修复结果。
  5. 根据权利要求3所述的方法,其中,所述根据所有所述目标图像的修复结果生成所述背景图,包括:
    对所有所述目标图像的修复结果取平均,得到所述背景图。
  6. 根据权利要求3所述的方法,其中,所述对于每帧目标图像,根据所有所述目标图像的填补结果中相应区域的特征信息修复所述目标图像对应的膨胀区域,得到所述目标图像的修复结果,包括:
    在每次迭代过程中,对于每帧目标图像,对所有所述目标图像的填补结果 中与所述膨胀区域对应的区域的特征信息取平均,将取平均的结果填补至所述目标图像对应的膨胀区域,得到所述目标图像在本次迭代过程中的修复结果;
    基于所述目标图像在本次迭代过程中的修复结果不满足设定条件的判断结果,进入下一次迭代过程;
    基于所述目标图像在本次迭代过程中的修复结果满足所述设定条件的判断结果,停止迭代,将所述目标图像在本次迭代过程中的修复结果作为所述目标图像的修复结果。
  7. 根据权利要求6所述的方法,其中,所述设定条件包括:
    所述目标图像在本次迭代过程中的修复结果与在上一次迭代过程对应的修复结果的特征差异在允许范围内。
  8. 根据权利要求1所述的方法,在所述得到每帧目标图像对应的去除实例的背景分割图之后,所述方法还包括:
    选取一帧目标图像作为参考帧,根据特征点匹配算法确定除所述参考帧以外的所有所述目标图像与所述参考帧之间的仿射变换矩阵;
    根据所述仿射变换矩阵,将除所述参考帧以外的所有所述目标图像的背景分割图与所述参考帧的背景分割图对齐。
  9. 一种图像融合方法,包括:
    获取至少两帧目标图像;
    根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图;
    将所有所述目标图像中的实例融合在所述背景图中,得到融合图像。
  10. 根据权利要求9所述的方法,其中,所述获取至少两帧目标图像包括:
    基于动作识别算法识别视频中的动作序列帧,将所述动作序列帧作为所述目标图像。
  11. 根据权利要求9所述的方法,其中,所述获取至少两帧目标图像包括:
    基于预训练的网络确定视频中的关键帧之间的相似度;
    根据所述相似度将所述关键帧划分为多个分组;
    将关键帧数量最多的一个分组中的关键帧作为所述目标图像。
  12. 根据权利要求9所述的方法,其中,每个所述目标图像中的实例与所述背景图的融合度按照每个所述目标图像的时序顺序依次降低。
  13. 根据权利要求9所述的方法,其中,根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图的方法根据权利要求1-8任一项确定。
  14. 一种背景图生成装置,包括:
    分割模块,被设置为对至少两帧目标图像中的每帧目标图像进行实例分割,得到每帧目标图像对应的去除实例的背景分割图;
    填补模块,被设置为对于每帧目标图像,根据设定图像的背景分割图填补所述目标图像中被去除的实例所在的区域,得到所述目标图像的填补结果,其中,所述设定图像包括所述至少两帧目标图像中与所述目标图像不同的目标图像;
    生成模块,被设置为根据所有所述目标图像的填补结果生成背景图。
  15. 一种图像融合装置,包括:
    获取模块,被设置为获取至少两帧目标图像;
    背景图生成模块,被设置为根据对于所有所述目标图像中被去除的实例所在的区域的填补结果生成背景图;
    融合模块,被设置为将所有所述目标图像中的实例融合在所述背景图中,得到融合图像。
  16. 一种电子设备,包括:
    处理器;
    存储装置,被设置为存储程序;
    在所述程序被所述处理器执行时,所述处理器实现如权利要求1-8中任一项所述的背景图生成方法,或如权利要求9-13中任一项所述的图像融合方法。
  17. 一种计算机可读介质,所述计算机可读介质上存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1-8中任一项所述的背景图生成方法,或如权利要求9-13中任一项所述的图像融合方法。
PCT/CN2022/119181 2021-10-09 2022-09-16 背景图生成、图像融合方法、装置、电子设备及可读介质 WO2023056833A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111175973.5A CN115965647A (zh) 2021-10-09 2021-10-09 背景图生成、图像融合方法、装置、电子设备及可读介质
CN202111175973.5 2021-10-09

Publications (1)

Publication Number Publication Date
WO2023056833A1 true WO2023056833A1 (zh) 2023-04-13

Family

ID=85803906

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/119181 WO2023056833A1 (zh) 2021-10-09 2022-09-16 背景图生成、图像融合方法、装置、电子设备及可读介质

Country Status (2)

Country Link
CN (1) CN115965647A (zh)
WO (1) WO2023056833A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833464A (zh) * 2012-07-24 2012-12-19 常州泰宇信息科技有限公司 智能视频监控用结构化背景重建方法
CN108229344A (zh) * 2017-12-19 2018-06-29 深圳市商汤科技有限公司 图像处理方法和装置、电子设备、计算机程序和存储介质
CN109583509A (zh) * 2018-12-12 2019-04-05 南京旷云科技有限公司 数据生成方法、装置及电子设备
CN110569878A (zh) * 2019-08-08 2019-12-13 上海汇付数据服务有限公司 一种基于卷积神经网络的照片背景相似度聚类方法及计算机

Also Published As

Publication number Publication date
CN115965647A (zh) 2023-04-14

Similar Documents

Publication Publication Date Title
CN110176027B (zh) 视频目标跟踪方法、装置、设备及存储介质
US10810435B2 (en) Segmenting objects in video sequences
WO2020125495A1 (zh) 一种全景分割方法、装置及设备
CN108446698B (zh) 在图像中检测文本的方法、装置、介质及电子设备
CN103578116B (zh) 用于跟踪对象的设备和方法
JP2020507850A (ja) 画像内の物体の姿の確定方法、装置、設備及び記憶媒体
JP2021508123A (ja) リモートセンシング画像認識方法、装置、記憶媒体及び電子機器
CN114550177B (zh) 图像处理的方法、文本识别方法及装置
TW201447775A (zh) 資訊識別方法、設備和系統
JP2013122763A (ja) 映像処理装置及び映像処理方法
CN111669502B (zh) 目标对象显示方法、装置及电子设备
CN110210480B (zh) 文字识别方法、装置、电子设备和计算机可读存储介质
CN110349161B (zh) 图像分割方法、装置、电子设备、及存储介质
WO2023082453A1 (zh) 一种图像处理方法及装置
WO2022227218A1 (zh) 药名识别方法、装置、计算机设备和存储介质
WO2020125062A1 (zh) 一种图像融合方法及相关装置
CN110427915B (zh) 用于输出信息的方法和装置
CN110211195B (zh) 生成图像集合的方法、装置、电子设备和计算机可读存储介质
WO2023056835A1 (zh) 视频封面生成方法、装置、电子设备及可读介质
CN112084920B (zh) 提取热词的方法、装置、电子设备及介质
CN111275824A (zh) 用于交互式增强现实的表面重建
US10891740B2 (en) Moving object tracking apparatus, moving object tracking method, and computer program product
CN110619656A (zh) 基于双目摄像头的人脸检测跟踪方法、装置及电子设备
JP2023526899A (ja) 画像修復モデルを生成するための方法、デバイス、媒体及びプログラム製品
CN115731341A (zh) 三维人头重建方法、装置、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22877853

Country of ref document: EP

Kind code of ref document: A1