CN117336620B - Adaptive video stitching method and system based on deep learning - Google Patents

Adaptive video stitching method and system based on deep learning

Info

Publication number
CN117336620B
Authority
CN
China
Prior art keywords
image
suture line
artifact
synthetic
pixel point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311576584.2A
Other languages
Chinese (zh)
Other versions
CN117336620A (en)
Inventor
刘卫华
周舟
陈虹旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Smart Yunzhou Technology Co ltd
Original Assignee
Beijing Smart Yunzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Smart Yunzhou Technology Co ltd filed Critical Beijing Smart Yunzhou Technology Co ltd
Priority to CN202311576584.2A priority Critical patent/CN117336620B/en
Publication of CN117336620A publication Critical patent/CN117336620A/en
Application granted granted Critical
Publication of CN117336620B publication Critical patent/CN117336620B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95Computational photography systems, e.g. light-field imaging systems
    • H04N23/951Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4023Decimation- or insertion-based scaling, e.g. pixel or line decimation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4038Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80Camera processing pipelines; Components thereof
    • H04N23/81Camera processing pipelines; Components thereof for suppressing or minimising disturbance in the image signal generation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80Camera processing pipelines; Components thereof
    • H04N23/82Camera processing pipelines; Components thereof for controlling camera response irrespective of the scene brightness, e.g. gamma correction

Abstract

The invention provides an adaptive video stitching method and system based on deep learning, relating to the technical field of video stitching and comprising the following steps: acquiring an initial video, selecting pixel points to be matched of a current image in the initial video according to the initial video, matching the pixel points to be matched, in combination with a preset background filtering algorithm, with a Gaussian distribution mode acquired based on a Gaussian mixture model, and determining a foreground image; judging, according to the foreground image, whether the current image can use the optimal suture line of the previous frame of image, if not, determining the optimal suture line corresponding to the current image through a dynamic programming algorithm, and if so, taking the optimal suture line of the previous frame of image as the optimal suture line of the current image; and determining an overlapping area of two adjacent frames of pictures according to the optimal suture line, judging whether synthetic artifacts exist according to the overlapping area, generating a synthetic artifact set if they exist, and eliminating the synthetic artifacts according to the synthetic artifact set by a multi-target synthetic artifact identification method to obtain a spliced image.

Description

Adaptive video stitching method and system based on deep learning
Technical Field
The invention relates to the technical field of video stitching, in particular to a self-adaptive video stitching method and system based on deep learning.
Background
The video splicing technology has important theoretical research significance and plays an important role in various application fields such as virtual reality, security monitoring, intelligent driving, video conference, unmanned aerial vehicle aerial photography and the like. Video stitching techniques are commonly used to synthesize two or more videos captured by cameras of different poses. It can reduce the requirement for video acquisition equipment and achieve a larger field of view. Although the research history of image and video stitching is long, the existing video stitching methods do not perform perfectly.
In the related art, CN111193920A discloses a stereoscopic video picture splicing method and system based on a deep learning network. The method comprises: an extraction step of extracting matching points in two-dimensional video pictures shot from the same viewpoint; a splicing step of splicing the two-dimensional video pictures with a GAN network, based on the matching points, to generate a panoramic video picture; a judging step of inputting the panoramic video picture into the GAN network for discrimination to generate a panoramic video picture of a first viewpoint; a repeating step of extracting matching points in two-dimensional video pictures shot from another viewpoint and repeating the splicing step and the judging step to generate a panoramic video picture of a second viewpoint; and a combining step of combining the panoramic video picture of the first viewpoint and the panoramic video picture of the second viewpoint to generate a panoramic stereoscopic video.
CN116721019A discloses a multi-camera video image stitching method based on deep learning, comprising: S1, constructing an alignment model and a stitching model; S2, acquiring a training set and importing it into the alignment model and the stitching model for training and optimization; S3, obtaining video frames shot by K cameras at the same moment; S4, stitching video frame k and video frame k+1 to obtain a spliced image; S5, judging whether k+1 is equal to K, and if so, outputting the spliced image as the final image; otherwise, letting k=k+1, taking the spliced image as video frame k, and returning to S4. The video frames of each camera are read, the pictures are input into the alignment model for alignment, and the aligned results are then input into the stitching model for stitching. A self-attention mechanism is introduced into the alignment model to improve the efficiency and precision of feature extraction in the reference image and the target image, and a self-attention mechanism is introduced into the stitching model to improve the efficiency and precision of feature detection in the reference image and the target image.
In summary, although the prior art can splice video and images, the foreground and the background in the images are not accurately distinguished, which may cause a large error in splicing video and images, so a method is needed to solve the defects in the prior art.
Disclosure of Invention
The embodiment of the invention provides an adaptive video stitching method and system based on deep learning, which solve at least some of the problems in the prior art.
In a first aspect of the embodiments of the present invention, a deep learning-based adaptive video stitching method includes:
acquiring an initial video, selecting a pixel point to be matched of a current image in the initial video according to the initial video, combining a preset background filtering algorithm, matching the pixel point to be matched with a Gaussian distribution mode acquired based on a Gaussian mixture model, and determining a foreground image;
judging whether the current image can use the optimal suture line of the previous frame of image according to the foreground image, if not, determining the optimal suture line corresponding to the current image through a dynamic programming algorithm, and if so, taking the optimal suture line of the previous frame of image as the optimal suture line of the current image;
and determining an overlapping area of two adjacent frames of pictures according to the optimal suture line, judging whether synthetic artifact exists according to the overlapping area, if so, generating a synthetic artifact set, and eliminating the synthetic artifact according to the synthetic artifact set by a multi-target synthetic artifact identification method to obtain a spliced image.
In an alternative embodiment of the present invention,
acquiring an initial video, selecting a pixel point to be matched of a current image in the initial video according to the initial video, matching the pixel point to be matched with a Gaussian distribution mode acquired based on a Gaussian mixture model by combining a preset background filtering algorithm, and determining a foreground image comprises:
for any pixel point of a current image in a video, determining a probability density function of the pixel point through a mixed Gaussian model, and matching all Gaussian distribution modes of the pixel point and the current moment according to the probability density function to obtain a matching result;
if the matching is successful, the pixel point is marked as a point in the background;
if the matching fails, the pixel point is marked as a point in the foreground;
if at least one Gaussian distribution pattern is successfully matched with the pixel points, updating the Gaussian distribution pattern which is not successfully matched;
and marking the area where the moving object is located in the current image according to the matching result, and performing expansion and corrosion operation on the area where the moving object is located in the current image to obtain a foreground image.
In an alternative embodiment of the present invention,
Updating the non-successfully matched Gaussian distribution pattern is shown in the following formula:
wherein $\mu_t$ denotes the mean at time $t$, $x_t$ denotes a random pixel point, $\sigma_t^2$ denotes the variance of the Gaussian distribution at time $t$, $\sigma_{t-1}^2$ denotes the variance of the Gaussian distribution at time $t-1$, $\rho$ denotes the learning rate, which controls the degree of influence of new observed data on the Gaussian distribution mode, $\alpha$ denotes the distribution weight, and $G()$ denotes the Gaussian density function.
In an alternative embodiment of the present invention,
judging whether the current image can use the optimal suture line of the previous frame of image according to the foreground image, if not, determining the optimal suture line corresponding to the current image through a dynamic programming algorithm, and if so, taking the optimal suture line of the previous frame of image as the optimal suture line of the current image comprises:
acquiring the foreground image, obtaining a binary image marking the foreground according to a preset background segmentation algorithm, and determining, according to the binary image, whether any pixel point on the optimal suture line of the previous frame image is located in the foreground image;
if so, determining the optimal suture line of the current image by a preset dynamic programming algorithm along the optimal suture line of the previous frame image;
And if the pixel point is not positioned in the foreground image, taking the optimal suture line of the previous frame image as the optimal suture line of the current image.
In an alternative embodiment of the present invention,
determining the optimal suture line of the current image through a preset dynamic programming algorithm comprises the following steps:
determining a random suture line from a plurality of suture lines which are obtained in advance, connecting a current pixel point, and taking the current pixel point as a starting point of the random suture line;
calculating the similarity value of the pixel points of each column of the first row in the current image, and expanding downwards to calculate the accumulated similarity measurement value of the random suture line;
comparing, for the current pixel point of each row in the current image, the accumulated similarity measurement values of the three pixel points in the adjacent upper row, namely the upper-left, directly above and upper-right pixel points, and connecting the pixel point with the minimum accumulated similarity measurement value with the current pixel point;
determining the propagation direction reaching the current pixel point, calculating the accumulated similarity measurement value corresponding to all the pixel points in the current image, finding out the minimum value in the accumulated similarity measurement value, and obtaining the corresponding pixel point as the end point of the random suture line;
and backtracking is carried out according to the propagation direction of the current pixel point until the current image reaches the first row, and the starting point and the ending point of the random suture line are combined to determine the optimal suture line.
In an alternative embodiment, the dynamic programming algorithm is represented by the following formula:
wherein $mask$ denotes the binarized foreground image obtained by background filtering of the video image, $l$ denotes the detection range of each row, $E(x,y)$ is the state function of the dynamic programming and denotes the optimal value at position $(x,y)$, $E(k,y-1)$ denotes the optimal value at position $k$ in the previous row, and $s(x,y)$ denotes the local cost at position $(x,y)$.
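The dynamic programming search described above can be sketched as follows. This is a minimal illustration and not the patented formula: it assumes the recurrence E(x,y) = s(x,y) + min over |k-x| <= l of E(k,y-1), and it assumes that the binarized foreground mask enters as a large additive penalty on foreground pixels. The names `find_seam` and `foreground_penalty` are hypothetical. Arrays are indexed as `[row y, column x]`.

```python
import numpy as np

def find_seam(cost, mask=None, l=1, foreground_penalty=1e6):
    """Return one column index per row for a minimum-cost top-to-bottom seam.

    cost: 2-D array of local costs s(x, y), indexed [y, x].
    mask: optional boolean foreground image; foreground pixels are penalised
          (an assumption about how the mask is used).
    l:    detection range, i.e. how far the seam may shift between rows.
    """
    s = cost.astype(float).copy()
    if mask is not None:
        s += foreground_penalty * mask.astype(float)
    rows, cols = s.shape
    E = np.zeros_like(s)                      # accumulated cost E(x, y)
    parent = np.zeros((rows, cols), dtype=int)  # propagation direction
    E[0] = s[0]
    for y in range(1, rows):
        for x in range(cols):
            lo, hi = max(0, x - l), min(cols, x + l + 1)
            k = lo + int(np.argmin(E[y - 1, lo:hi]))  # best pixel in upper row
            parent[y, x] = k
            E[y, x] = s[y, x] + E[y - 1, k]
    # The end point is the minimum accumulated cost in the last row;
    # backtracking along the stored directions recovers the seam.
    seam = np.empty(rows, dtype=int)
    seam[-1] = int(np.argmin(E[-1]))
    for y in range(rows - 1, 0, -1):
        seam[y - 1] = parent[y, seam[y]]
    return seam

# Usage: a zero-cost column attracts the seam; a foreground pixel repels it.
cost = np.ones((4, 5))
cost[:, 2] = 0.0
seam_plain = find_seam(cost)
fg = np.zeros((4, 5), dtype=bool)
fg[1, 2] = True
seam_masked = find_seam(cost, mask=fg)
```

With `l=1` the search compares the upper-left, directly above and upper-right pixels, which matches the three-neighbour comparison in the steps above.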
In an alternative embodiment of the present invention,
determining an overlapping area of two adjacent frames of pictures according to the optimal suture line, judging whether synthetic artifacts exist according to the overlapping area, generating a synthetic artifact set if the synthetic artifacts exist, eliminating the synthetic artifacts through a multi-target synthetic artifact identification method according to the synthetic artifact set, and obtaining a spliced image comprises the following steps:
dividing an overlapping region of two adjacent frames of pictures according to a preset grid division strategy, calculating pixel differences of adjacent pixels in the overlapping region, determining whether a target object exists according to the pixel differences in combination with a preset judging threshold, if the pixel differences are larger than the judging threshold, considering that the target object exists in the overlapping region, marking the target object in the overlapping region as an artifact object, and generating a synthetic artifact set;
According to the synthetic artifact set, performing spatial mapping on each artifact object of the synthetic artifact set, and mapping each artifact object into the space of two adjacent frames of images respectively;
calculating the spatial distance of the same artifact object in the space of two adjacent frames of images after mapping, judging by combining with a preset distance threshold, and if the spatial distance is larger than the distance threshold, marking the object as a synthetic artifact;
and based on an image interpolation technology, taking a pixel point adjacent to the synthetic artifact as a replacement area, interpolating the pixel point adjacent to the synthetic artifact into the synthetic artifact area, and eliminating the synthetic artifact to obtain the spliced image.
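The distance-threshold judgement and the neighbour-based replacement in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions: each artifact object is represented by a single mapped position per frame, the spatial distance is Euclidean, and interpolation is a plain average of valid 4-neighbours. The function names are hypothetical.

```python
import numpy as np

def is_synthetic_artifact(pos_a, pos_b, distance_threshold):
    """Flag an object whose mapped positions in the two frames are too far apart."""
    diff = np.asarray(pos_a, dtype=float) - np.asarray(pos_b, dtype=float)
    return float(np.linalg.norm(diff)) > distance_threshold

def fill_from_neighbours(image, artifact_mask):
    """Replace artifact pixels by the mean of their non-artifact 4-neighbours.

    Pixels are filled layer by layer from the border of the artifact region
    inwards, so larger regions are also covered.
    """
    out = image.astype(float).copy()
    todo = artifact_mask.copy()
    h, w = out.shape
    while todo.any():
        filled = []
        for y, x in zip(*np.nonzero(todo)):
            vals = [out[ny, nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w and not todo[ny, nx]]
            if vals:
                filled.append((y, x, sum(vals) / len(vals)))
        if not filled:
            break  # the mask covers the whole image; nothing to interpolate from
        for y, x, v in filled:
            out[y, x] = v
            todo[y, x] = False
    return out

# Usage: one bright outlier pixel in a flat region is replaced by its surroundings.
image = np.full((5, 5), 10.0)
image[2, 2] = 255.0
artifact = np.zeros((5, 5), dtype=bool)
artifact[2, 2] = True
repaired = fill_from_neighbours(image, artifact)
far_apart = is_synthetic_artifact((10, 10), (13, 14), 4.0)
close_enough = is_synthetic_artifact((10, 10), (13, 14), 6.0)
```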
In a second aspect of an embodiment of the present invention, there is provided an adaptive video stitching system based on deep learning, including:
a first unit, configured to acquire an initial video, select a pixel point to be matched of a current image in the initial video according to the initial video, match the pixel point to be matched, in combination with a preset background filtering algorithm, with a Gaussian distribution mode acquired based on a Gaussian mixture model, and determine a foreground image;
a second unit, configured to judge, according to the foreground image, whether the current image can use the optimal suture line of the previous frame of image, to determine the optimal suture line corresponding to the current image through a dynamic programming algorithm if not, and to take the optimal suture line of the previous frame of image as the optimal suture line of the current image if so;
and a third unit, configured to determine the overlapping area of two adjacent frames of pictures according to the optimal suture line, judge whether synthetic artifacts exist according to the overlapping area, generate a synthetic artifact set if they exist, and eliminate the synthetic artifacts according to the synthetic artifact set by a multi-target synthetic artifact identification method to obtain a spliced image.
In a third aspect of an embodiment of the present invention,
there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fourth aspect of an embodiment of the present invention,
there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
According to the invention, adaptive video stitching of different scenes is realized through the dynamic programming algorithm and the judgment of the optimal suture line, so that the stitching effect is more coherent and natural. The multi-target synthetic artifact identification method effectively detects and eliminates artifacts caused by stitching, which improves image quality. Background filtering and matching based on the Gaussian mixture model realize accurate matching of foreground pixels and reduce matching errors. The process of judging the optimal suture line helps maintain the continuity of video stitching and avoids inconsistent suture lines between adjacent frames. In summary, adaptive video stitching is realized, problems that may arise in the stitching process are handled, and the quality and effect of video stitching are improved.
Drawings
Fig. 1 is a flow chart of an adaptive video stitching method based on deep learning according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an adaptive video stitching system based on deep learning according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flow chart of an adaptive video stitching method based on deep learning according to an embodiment of the present invention, as shown in fig. 1, where the method includes:
S1, acquiring an initial video, selecting pixel points to be matched of a current image in the initial video according to the initial video, and matching the pixel points to be matched with a Gaussian distribution mode acquired based on a Gaussian mixture model by combining a preset background filtering algorithm to determine a foreground image;
The pixel points to be matched are the pixels that need to be matched during image processing; they are generally regarded as foreground because they may represent a moving object, a change in the scene, or another object of interest. The background filtering algorithm is an image processing technique that separates the foreground and background parts of an image, so that background information is removed and the foreground stands out. The foreground image is the image obtained after background filtering and contains only the foreground part, that is, the object or scene of interest.
In an alternative embodiment of the present invention,
acquiring an initial video, selecting a pixel point to be matched of a current image in the initial video according to the initial video, matching the pixel point to be matched with a Gaussian distribution mode acquired based on a Gaussian mixture model by combining a preset background filtering algorithm, and determining a foreground image comprises:
For any pixel point of a current image in a video, determining a probability density function of the pixel point through a mixed Gaussian model, and matching all Gaussian distribution modes of the pixel point and the current moment according to the probability density function to obtain a matching result;
if the matching is successful, the pixel point is marked as a point in the background;
if the matching fails, the pixel point is marked as a point in the foreground;
if at least one Gaussian distribution pattern is successfully matched with the pixel points, updating the Gaussian distribution pattern which is not successfully matched;
and marking the area where the moving object is located in the current image according to the matching result, and performing expansion and corrosion operation on the area where the moving object is located in the current image to obtain a foreground image.
The parameters of the Gaussian mixture model are initialized, including the mean, variance and weight of each Gaussian distribution, and the pixel values of the current image are read. For a given pixel point of the current image, the probability density function of each Gaussian distribution in the mixture model is calculated first. The calculated densities are then summed with the weights of the mixture model to obtain the overall probability density of the current pixel point under the current mixture model. Finally, according to the probability of the pixel point under each Gaussian distribution, the Gaussian distribution with the largest probability is selected, that is, the pixel point is matched to the Gaussian distribution with the largest probability.
For example, in order to determine whether the matching is successful, a comparison threshold needs to be set, the calculated probability density function value is compared with the set comparison threshold, if the comparison threshold is exceeded, the gaussian distribution is considered to be successfully matched with the pixel points, and if the comparison threshold is not exceeded, the gaussian distribution is considered to be failed in matching;
if at least one Gaussian distribution model is successfully matched, judging the pixel point as a point in the background; otherwise, judging the pixel as a point in the foreground, and updating the mean value, the variance and the weight according to the new foreground pixel point for the Gaussian distribution mode which is not successfully matched;
and marking the area where the moving object in the current image is located according to the matching result, wherein the area comprises pixel points in the foreground, and performing morphological operations such as expansion, corrosion and the like on the moving object area in the foreground image so as to remove noise and better divide the foreground area, thereby finally obtaining the foreground image.
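The matching procedure described above can be sketched for a single grayscale pixel as follows. This is an illustration under stated assumptions, not the patented implementation: the pixel value is one-dimensional, the comparison threshold is applied to each component density, and the numeric threshold and model parameters are hypothetical values chosen for the example.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance, evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def classify_pixel(x, means, variances, weights, density_threshold):
    """Match one pixel value against every Gaussian distribution of the mixture.

    Returns (is_background, best_index, mixture_density):
      is_background   True if at least one component density exceeds the threshold
      best_index      index of the component with the largest density
      mixture_density weighted sum of all component densities
    """
    densities = gaussian_pdf(x, means, variances)
    mixture_density = float(np.sum(weights * densities))
    best_index = int(np.argmax(densities))
    is_background = bool((densities > density_threshold).any())
    return is_background, best_index, mixture_density

# Hypothetical three-component model for one pixel location.
means = np.array([50.0, 120.0, 200.0])
variances = np.array([25.0, 36.0, 49.0])
weights = np.array([0.5, 0.3, 0.2])

near_bg, near_idx, _ = classify_pixel(52.0, means, variances, weights, 0.01)
far_bg, _, far_density = classify_pixel(85.0, means, variances, weights, 0.01)
```

A value close to a modelled mean (52 against 50) is labelled background, while a value far from every mode (85) fails all matches and is labelled foreground.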
Dilation is a morphological operation whose goal is to expand the target area in the image: for each foreground pixel, its surrounding pixels are also marked as foreground. Erosion is a morphological operation whose goal is to shrink the target area in the image: a foreground pixel remains foreground only if all of its surrounding pixels are foreground.
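The two operations can be illustrated with a small NumPy sketch. It assumes a 3x3 square neighbourhood and treats everything outside the image as background; both choices are assumptions made for this example, and the function names are hypothetical.

```python
import numpy as np

def _neighbourhood_stack(mask):
    """Stack the 3x3 neighbourhood of every pixel along a new first axis."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def dilate(mask):
    """A pixel becomes foreground if any pixel in its neighbourhood is foreground."""
    return _neighbourhood_stack(mask).any(axis=0)

def erode(mask):
    """A pixel stays foreground only if its whole neighbourhood is foreground."""
    return _neighbourhood_stack(mask).all(axis=0)

# Dilation followed by erosion fills a one-pixel hole inside a 3x3 block.
block = np.zeros((5, 5), dtype=bool)
block[1:4, 1:4] = True
block[2, 2] = False
closed = erode(dilate(block))

# Erosion followed by dilation removes an isolated noise pixel.
noise = np.zeros((5, 5), dtype=bool)
noise[2, 2] = True
opened = dilate(erode(noise))
```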
In this embodiment, the Gaussian mixture model allows rapid background modeling and foreground detection on the image, and the Gaussian distribution model is updated according to whether matching succeeds, so that the model can adapt to scene changes. This adaptivity suits the stitching requirements of different environments and different videos. Matching with the probability density function makes it possible to judge more accurately whether a pixel point belongs to the background or the foreground, which reduces misjudgment and improves the detection accuracy for moving objects. Updating the Gaussian distribution model that was not matched makes the background model more robust and able to adapt to environmental changes over a long time, which improves the stability of the whole algorithm.
In summary, the embodiment realizes accurate detection and clear extraction of the foreground of the moving object in the video through the combination of the Gaussian model, the probability density function and the morphological operation, and provides a high-quality foreground image for video stitching.
In an alternative embodiment, the update of the non-matching successful gaussian distribution pattern is as follows:
wherein $\mu_t$ denotes the mean at time $t$, $x_t$ denotes a random pixel point, $\sigma_t^2$ denotes the variance of the Gaussian distribution at time $t$, $\sigma_{t-1}^2$ denotes the variance of the Gaussian distribution at time $t-1$, $\rho$ denotes the learning rate, which controls the degree of influence of new observed data on the Gaussian distribution mode, $\alpha$ denotes the distribution weight, and $G()$ denotes the Gaussian density function.
The learning rate is a parameter that controls how strongly new observed data influence the update of a Gaussian distribution mode in the Gaussian mixture model; in the formula it adjusts the weight of the current observed value in the update of the model parameters. The distribution weight is the contribution of each Gaussian distribution to the overall mixture model; in the formula it represents the relative importance of each Gaussian distribution.
In the function, the learning rate lets the model adapt to new observed data instead of completely discarding the Gaussian distribution model that was not matched. This adaptivity helps the system follow scene changes, especially during long video stitching sessions. Using random pixel points in the update reduces, to a certain extent, the influence of observation errors on the model update, and taking randomness into account helps reduce the interference of local noise. Because each frame may contain different moving objects and scene changes, a dynamically evolving Gaussian distribution model fits the actual situation better. Taken together, the function helps improve the robustness, adaptability and accuracy of the model, making it better suited to complex and changing video scenes.
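A minimal sketch of such an update is given below. The recurrence shown is the form commonly used for Gaussian mixture background models and is an assumption here, chosen because it uses exactly the symbols defined above: rho = alpha * G(x_t; mu_{t-1}, sigma_{t-1}^2), mu_t = (1 - rho) * mu_{t-1} + rho * x_t, and sigma_t^2 = (1 - rho) * sigma_{t-1}^2 + rho * (x_t - mu_t)^2. The function names and numeric values are hypothetical.

```python
import math

def gaussian_density(x, mean, var):
    """G(): density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def update_component(x_t, mean_prev, var_prev, alpha):
    """Update the mean and variance of one Gaussian distribution with pixel x_t.

    Assumed recurrence (not taken from the source text):
      rho    = alpha * G(x_t; mean_prev, var_prev)
      mean_t = (1 - rho) * mean_prev + rho * x_t
      var_t  = (1 - rho) * var_prev  + rho * (x_t - mean_t)**2
    """
    rho = alpha * gaussian_density(x_t, mean_prev, var_prev)
    mean_t = (1.0 - rho) * mean_prev + rho * x_t
    var_t = (1.0 - rho) * var_prev + rho * (x_t - mean_t) ** 2
    return mean_t, var_t, rho

# Usage: the mean moves a small step towards the new observation.
mean_t, var_t, rho = update_component(x_t=110.0, mean_prev=100.0,
                                      var_prev=100.0, alpha=0.5)
same_mean, _, _ = update_component(x_t=100.0, mean_prev=100.0,
                                   var_prev=100.0, alpha=0.5)
```

Because rho scales with the density of the observation, values far from the current mean change the distribution only slightly, which is one way to limit the effect of noisy pixels.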
S2, judging whether the current image can use the optimal suture line of the previous frame of image according to the foreground image, if not, determining the optimal suture line corresponding to the current image through a dynamic programming algorithm, and if so, taking the optimal suture line of the previous frame of image as the optimal suture line of the current image;
the optimal seam is the boundary found between two adjacent images that is most suitable for stitching, and the choice of this line is important to maintain the naturalness and consistency of the overall stitching effect.
In an alternative embodiment of the present invention,
judging whether the current image can use the optimal suture line of the previous frame of image according to the foreground image, if not, determining the optimal suture line corresponding to the current image through a dynamic programming algorithm, and if so, taking the optimal suture line of the previous frame of image as the optimal suture line of the current image comprises:
acquiring the foreground image, obtaining a binary image marking the foreground according to a preset background segmentation algorithm, and determining, according to the binary image, whether any pixel point on the optimal suture line of the previous frame image is located in the foreground image;
If so, determining the optimal suture line of the current image by a preset dynamic programming algorithm along the optimal suture line of the previous frame image;
and if the pixel point is not positioned in the foreground image, taking the optimal suture line of the previous frame image as the optimal suture line of the current image.
The background segmentation algorithm is a method for extracting foreground objects from an image or a video and is generally used to separate a moving object from a static or near-static background. The dynamic programming algorithm is a method for solving optimization problems with the optimal substructure property: it decomposes the problem into overlapping sub-problems, solves the sub-problems recursively, and combines their solutions to obtain the solution of the original problem.
Acquiring a foreground image in a current image, dividing the foreground image into binary images by using a preset background dividing algorithm through a threshold value, analyzing the binary images, and judging whether a pixel point is positioned in the foreground image by checking whether a foreground pixel exists in the binary images;
if the pixel points are located in the foreground image, the fact that the optimal suture line of the previous frame image cannot be used is indicated, a preset dynamic programming algorithm is used for optimizing based on factors such as cost and consistency among the pixel points, and the optimal suture line is determined for the current image;
If no pixel point is located in the foreground image, the optimal suture line of the previous frame image can be used, and finally the optimal suture line of the current image is obtained according to the method.
In this embodiment, background segmentation is performed by using a preset deep learning model, so that a foreground image can be extracted more accurately, and if there are pixels in the foreground image, that is, there is a foreground region in the binary image after background segmentation, it is indicated that the optimal seam of the previous frame image may be affected by the foreground. The optimal suture line of the current image is recalculated by using a dynamic programming algorithm, so that the error transmission can be reduced, the overall splicing effect is improved, and the system can select whether to use the optimal suture line of the previous frame image according to the actual situation by judging the foreground image. The mechanism improves the robustness of the algorithm, so that the method can obtain better effects in various scenes, and in combination, the embodiment combines the background segmentation and dynamic programming algorithm of deep learning, so that the video stitching system can better cope with foreground changes in complex scenes, and the adaptability and the stability of the stitching effect are improved.
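The reuse decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the seam is assumed to be stored as one column index per row, the binary image is assumed to mark foreground with nonzero values, and the names `seam_hits_foreground`, `choose_seam` and the `recompute` callback are hypothetical.

```python
import numpy as np

def seam_hits_foreground(prev_seam, fg_mask):
    """Return True if any pixel of the previous seam lies on the foreground.

    prev_seam[y] is the (assumed) column of the seam in row y;
    fg_mask is a binary image where nonzero means foreground.
    """
    rows = np.arange(len(prev_seam))
    return bool(np.any(fg_mask[rows, prev_seam] != 0))

def choose_seam(prev_seam, fg_mask, recompute):
    """Reuse the previous seam when it avoids the foreground, else recompute.

    recompute is a hypothetical zero-argument callback that runs the
    dynamic programming search and returns a new seam.
    """
    if prev_seam is None or seam_hits_foreground(prev_seam, fg_mask):
        return recompute()
    return prev_seam
```

Reusing the seam whenever it stays clear of the foreground avoids running the search on every frame; the search is only paid for when a moving object crosses the seam.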
In an alternative embodiment, determining the optimal suture line of the current image through a preset dynamic programming algorithm comprises:
Determining a random suture line from a plurality of suture lines which are obtained in advance, connecting a current pixel point, and taking the current pixel point as a starting point of the random suture line;
calculating the similarity value of the pixel points of each column of the first row in the current image, and expanding downwards to calculate the accumulated similarity measurement value of the random suture line;
comparing, for the current pixel point of each row in the current image, the accumulated similarity measurement values of the three pixel points at the upper left, directly above and upper right in the adjacent upper row, and connecting the pixel point with the minimum accumulated similarity measurement value with the current pixel point;
determining the propagation direction reaching the current pixel point, calculating the accumulated similarity measurement values corresponding to all the pixel points in the current image, finding the minimum of the accumulated similarity measurement values, and taking the corresponding pixel point as the end point of the random suture line;
and backtracking according to the propagation direction of the current pixel point until the first row of the current image is reached, and determining the optimal suture line by combining the start point and the end point of the random suture line.
Before the image stitching task starts, a plurality of suture lines are obtained in advance, and one of them is randomly selected as the current random suture line, with the current pixel point as its starting point. The similarity values between the pixel points of each column of the first row in the current image and the starting point are calculated through the structural similarity index, and, starting from the starting point, the accumulated similarity measurement value of each pixel point along the random suture line is calculated by expanding downwards;
For the current pixel point of each row, the accumulated similarity measurement values of the three pixel points at the upper left, directly above and upper right in the adjacent upper row are compared, the pixel point with the smallest accumulated similarity measurement value is selected, and the current pixel point is connected with it;
according to the connected pixel points, the propagation direction (left, up or right) is determined and the accumulated similarity measurement value of each pixel point in the propagation direction is calculated. The pixel point with the smallest accumulated similarity measurement value is taken as the end point of the random suture line; the connected pixel points are then traced back along the propagation direction until the first row of the current image is reached, and the optimal suture line is determined by combining the start point and the end point of the random suture line.
Illustratively, assume two images whose first rows of pixel values are [50, 60, 70, 80, 90] for image 1 and [55, 65, 75, 85, 95] for image 2. One suture line is randomly selected from the previously acquired suture lines as the initial random suture line, for example (1, 1) -> (2, 2) -> (3, 3) -> (4, 4) -> (5, 5), where (i, j) denotes the pixel point in the i-th row and j-th column. For each row, the similarity measurement values of the adjacent pixel points are calculated: for the first row, the similarity of (1, 1) with (1, 2) and with (2, 2); for the second row, the similarity of (2, 2) with (2, 3) and with (3, 3); and so on. The values calculated in each row are accumulated, so that the accumulated similarity measurement value of the suture line is the sum of the similarity measurement values from (1, 1) to (5, 5). The measurement values of adjacent pixel points are then compared to determine the propagation direction, and the pixel point with the minimum value is taken as the end point. Finally, the path is traced back to the first row, and the optimal suture line is determined by combining the start point and the end point of the random suture line.
The accumulated similarity measure is an index used in image processing to measure the similarity between two regions (or pixels). In video stitching, it is typically used to evaluate the similarity in color, brightness or other characteristics of the overlapping regions of two frames of images.
In this embodiment, randomly selecting the initial random suture line from a plurality of previously acquired suture lines introduces a certain randomness, which helps the system maintain diversity among different frames. This matters for complex and dynamic video scenes, because several reasonable suture lines may exist among different frames. By comparing, for each row of the current image, the accumulated similarity measurement values of the three pixel points in the adjacent upper row and connecting to the one with the minimum value, the connection direction is determined and paths of higher similarity are preferred, which reduces the probability of connecting to an erroneous region and improves the accuracy of the suture line. In summary, by combining feature extraction with randomized suture line selection, the embodiment improves the adaptability of the system to different scenes and content changes and helps generate a more adaptive and natural video stitching result.
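The three-neighbour comparison and backtracking described above follow the general pattern of a dynamic programming seam search, which can be sketched as below. This is a generic illustration, not the patented procedure: the per-pixel `cost` array (lower meaning a better place to cut) stands in for the similarity measurement, the random initial suture line is omitted, and the function name `find_seam` is hypothetical.

```python
import numpy as np

def find_seam(cost):
    """Minimum accumulated-cost top-to-bottom seam through a 2-D cost array.

    Returns (seam, total), where seam[y] is the chosen column in row y.
    """
    h, w = cost.shape
    acc = cost.astype(np.float64).copy()       # accumulated measurement values
    step = np.zeros((h, w), dtype=np.int64)    # propagation direction: -1, 0, +1
    cols = np.arange(w)
    for y in range(1, h):
        prev = acc[y - 1]
        # Candidates in the row above: upper left, directly above, upper right.
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        cand = np.stack([left, prev, right])
        best = np.argmin(cand, axis=0)
        step[y] = best - 1
        acc[y] += cand[best, cols]
    # End point: smallest accumulated value in the last row.
    seam = np.empty(h, dtype=np.int64)
    seam[-1] = int(np.argmin(acc[-1]))
    # Backtrack along the stored propagation directions to the first row.
    for y in range(h - 1, 0, -1):
        seam[y - 1] = seam[y] + step[y, seam[y]]
    return seam, float(acc[-1, seam[-1]])
```

Storing the propagation direction per pixel is what makes the backtracking step a simple walk from the end point up to the first row.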
In an alternative embodiment, the dynamic programming algorithm is represented by the following formula:
wherein mask is the binarized foreground image obtained by filtering the background of the video image; l is the detection range in each row; E(x,y), the state function of the dynamic programming, denotes the optimal value at position (x,y); E(k,y-1) denotes the optimal value at position k in the previous row; and s(x,y) denotes the local cost at position (x,y).
In this function, the optimal value of the current position is calculated from the local cost and the optimal value of the previous row, so that the system can adaptively determine the optimal path according to both local and global information. In video stitching the scene may become complex, for example through illumination changes or occlusion. By taking the optimal value of the previous row into account, the dynamic programming can cope with these complications to a certain extent, keeps the selected path optimal as a whole, and effectively reduces error propagation. In sum, by combining local and global information the function improves the ability of the system to solve the video stitching problem, improves the accuracy and robustness with which the features learned by the deep learning model are used to calculate the cost and the optimal path, and makes the algorithm more suitable for complex and dynamic video scenes.
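The symbols defined for this embodiment are consistent with a standard seam-finding recurrence. As a hedged sketch only, one form that matches those definitions is given below; the search window |k - x| <= l, the first-row initialisation, and the omission of mask (which might, for example, restrict or penalise positions on foreground pixels) are assumptions and should not be read as the patented formula.

```latex
E(x, y) = s(x, y) + \min_{k \,:\, |k - x| \le l} E(k, y - 1),
\qquad E(x, 1) = s(x, 1)
```

Under this reading, the optimal value at each position is its own local cost plus the best optimal value reachable in the previous row within the detection range l.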
S3, determining an overlapping area of two adjacent frames of pictures according to the optimal suture line, judging whether synthetic artifacts exist according to the overlapping area, if so, generating a synthetic artifact set, and eliminating the synthetic artifacts according to the synthetic artifact set by a multi-target synthetic artifact identification method to obtain a spliced image.
The synthetic artifact refers to an artificially introduced visual artifact generated during image synthesis or stitching, and is usually caused by some deficiency or error in the image processing algorithm or the synthesis process. The set of synthetic artifacts is a set of visual artifacts created during image stitching.
In an alternative embodiment of the present invention,
determining an overlapping area of two adjacent frames of pictures according to the optimal suture line, judging whether synthetic artifacts exist according to the overlapping area, if so, generating a synthetic artifact set, eliminating the synthetic artifacts through a multi-target synthetic artifact identification method according to the synthetic artifact set, and obtaining a spliced image comprises the following steps:
dividing an overlapping region of two adjacent frames of pictures according to a preset grid division strategy, calculating pixel differences of adjacent pixels in the overlapping region, determining whether a target object exists according to the pixel differences in combination with a preset judging threshold, if the pixel differences are larger than the judging threshold, considering that the target object exists in the overlapping region, marking the target object in the overlapping region as an artifact object, and generating a synthetic artifact set;
According to the synthetic artifact set, performing spatial mapping on each artifact object of the synthetic artifact set, and mapping each artifact object into the space of two adjacent frames of images respectively;
calculating the spatial distance of the same artifact object in the space of two adjacent frames of images after mapping, judging by combining with a preset distance threshold, and if the spatial distance is larger than the distance threshold, marking the object as a synthetic artifact;
and based on an image interpolation technology, taking a pixel point adjacent to the synthetic artifact as a replacement area, interpolating the pixel point adjacent to the synthetic artifact into the synthetic artifact area, and eliminating the synthetic artifact to obtain the spliced image.
The space of two adjacent frames of images refers to the space of the whole scene formed by the two adjacent images involved in image stitching. The two frames of images may be consecutive video frames, images adjacent in time, or images of adjacent locations in the scene. An artifact is a visual flaw generated during image stitching and typically appears as an abnormal image portion in the overlapping region.
The overlapping area of two adjacent frames of pictures is divided into a grid by uniform division in the horizontal and vertical directions, each grid unit being a pixel block. Within each grid unit the pixels are compared, and the pixel difference between adjacent pixels is calculated as a gray-level difference. A determination threshold is preset for deciding whether the pixel difference is large enough to indicate a target object. The pixel difference calculated for each grid unit is compared with the determination threshold; if it is larger than the threshold, a target object is considered to exist in that grid unit, and the grid unit is marked. For the marked grid units, information on the target object, such as its position, size and degree of pixel difference, can be recorded. The target objects in the overlapping area are marked as artifact objects and put into a set, which is recorded as the synthetic artifact set.
Each artifact object in the synthetic artifact set is traversed, and its position, size and other related information in the original image are acquired. Using this information, each artifact object is mapped into the space of the two adjacent frames of images by a spatial mapping method such as affine transformation, so that its position and size remain consistent in the two frames, and the mapped object is superimposed at the corresponding position in the two adjacent frames of images by image superposition.
For each artifact object, its position information in the two mapped adjacent frames of images is acquired, and the spatial distance of the object between the two frames is calculated as a Euclidean distance. A distance threshold is preset for deciding whether the spatial distance of the same object in the two mapped frames is small enough. For each object the spatial distance is compared with the distance threshold; if the spatial distance is larger than the threshold, the artifact object is considered likely to be a synthetic artifact and is marked. After the distance judgment, the marked objects are added to the synthetic artifact set, so that this set contains all artifact objects determined to be synthetic artifacts.
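The grid-based detection step can be sketched as follows. This is one possible reading under stated assumptions: the two overlap regions are assumed to be aligned grayscale arrays of equal shape, the per-cell pixel difference is taken as the mean absolute gray-level difference between them, and the cell size, the threshold value and the name `detect_artifact_cells` are illustrative choices not given in the source.

```python
import numpy as np

def detect_artifact_cells(gray_a, gray_b, cell=16, threshold=25.0):
    """Return a list of grid cells whose mean gray difference exceeds threshold.

    gray_a and gray_b are the (assumed aligned) overlap regions of the two
    frames. Each returned record keeps position, size and difference degree.
    """
    h, w = gray_a.shape
    diff = np.abs(gray_a.astype(np.float64) - gray_b.astype(np.float64))
    artifacts = []
    for y0 in range(0, h, cell):
        for x0 in range(0, w, cell):
            block = diff[y0:y0 + cell, x0:x0 + cell]
            score = float(block.mean())
            if score > threshold:
                artifacts.append({
                    "x": x0, "y": y0,
                    "w": block.shape[1], "h": block.shape[0],
                    "diff": score,
                })
    return artifacts
```

Recording position, size and difference per cell mirrors the information the description says can be kept for each marked grid unit.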
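The mapping and distance test can be sketched as below. It is a simplified illustration, not the patented method: each artifact object is assumed to be a dict with `x`, `y`, `w`, `h` keys, only its centre is mapped, the two 2x3 affine matrices `affine_a` and `affine_b` are assumed to be known from registration, and the function names are hypothetical.

```python
import numpy as np

def map_point(affine, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    x, y = point
    return np.asarray(affine, dtype=np.float64) @ np.array([x, y, 1.0])

def flag_synthetic(artifacts, affine_a, affine_b, dist_threshold):
    """Keep artifact objects whose mapped positions differ by more than the threshold."""
    flagged = []
    for obj in artifacts:
        centre = (obj["x"] + obj["w"] / 2.0, obj["y"] + obj["h"] / 2.0)
        pos_a = map_point(affine_a, centre)
        pos_b = map_point(affine_b, centre)
        distance = float(np.linalg.norm(pos_a - pos_b))  # Euclidean distance
        if distance > dist_threshold:
            flagged.append({**obj, "distance": distance})
    return flagged
```

An object that lands at nearly the same place in both frames is left alone; only objects whose mapped positions disagree by more than the threshold are treated as synthetic artifacts.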
Each artifact object in the synthetic artifact set is traversed, and its position information in the two adjacent frames of images, including the mapped position and size, is acquired. For each artifact object, an area adjacent to it is determined as the replacement area, which comprises the pixels around the artifact object, and the pixels of the replacement area are interpolated into the area of the artifact object by an image interpolation technique.
Illustratively, for each pixel in the artifact object area, the relative distances to the four adjacent replacement-area pixels are calculated, and weights are calculated from these distances. Bilinear interpolation is performed on each pixel in the artifact object area with the calculated weights, and the interpolated pixel values are written back into the artifact object area. This interpolation operation is carried out pixel by pixel.
After the interpolation operation is finished, boundary processing is performed on the synthetic artifact area, since discontinuities may remain at its boundary, to obtain the spliced image.
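The interpolation step can be sketched as follows. Note the swapped-in technique: this example uses inverse-distance weighting of the four nearest border pixels (left, right, above, below) as a simple stand-in for the interpolation described, and it assumes a rectangular artifact region in a single-channel image with at least a one-pixel margin on every side. The name `fill_region` is hypothetical, and the subsequent boundary processing is not shown.

```python
import numpy as np

def fill_region(img, x0, y0, w, h):
    """Fill img[y0:y0+h, x0:x0+w] from the pixels bordering the region.

    Each filled pixel is a distance-weighted average of the nearest pixel
    outside the region on its left, right, top and bottom.
    """
    out = img.astype(np.float64).copy()
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            # Distances to the nearest replacement pixel on each side.
            dists = np.array([x - (x0 - 1), (x0 + w) - x,
                              y - (y0 - 1), (y0 + h) - y], dtype=np.float64)
            vals = np.array([out[y, x0 - 1], out[y, x0 + w],
                             out[y0 - 1, x], out[y0 + h, x]])
            weights = 1.0 / dists            # closer pixels weigh more
            out[y, x] = float(np.dot(weights, vals) / weights.sum())
    return out
```

Because only pixels outside the region are read, the result does not depend on the order in which the region is filled.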
In this embodiment, the overlapping area of two adjacent frames of images is divided by a preset grid division strategy, and the pixel difference of adjacent pixels is calculated. In combination with a preset determination threshold, the parts of the overlapping area where a target object, namely an artifact, may exist can be detected effectively. The detected artifact objects are spatially mapped into the space of the two adjacent frames of images, which keeps the positions and shapes of the artifacts consistent in the two frames and avoids mapping inaccuracy caused by changes of viewing angle or image distortion. The pixel points adjacent to the synthetic artifact are then used as replacements by an image interpolation technique, so that the synthetic artifact is eliminated, the artifact area is repaired smoothly, and the overall consistency of the image is maintained.
In summary, through artifact detection, spatial mapping, distance determination and image interpolation, the present embodiment enables artifacts to be accurately identified and eliminated in the video stitching process, thereby improving the quality and fidelity of video stitching.
Fig. 2 is a schematic diagram of an adaptive video stitching system based on deep learning according to an embodiment of the present invention. As shown in Fig. 2, the system includes:
a first unit, configured to acquire an initial video, select a pixel point to be matched of a current image in the initial video according to the initial video, match, in combination with a preset background filtering algorithm, the pixel point to be matched with a Gaussian distribution pattern obtained based on a Gaussian mixture model, and determine a foreground image;
a second unit, configured to judge, according to the foreground image, whether the current image can reuse the optimal suture line of the previous frame image, determine the optimal suture line corresponding to the current image through a dynamic programming algorithm if not, and take the optimal suture line of the previous frame image as the optimal suture line of the current image if so;
and a third unit, configured to determine the overlapping area of two adjacent frames of pictures according to the optimal suture line, judge whether synthetic artifacts exist according to the overlapping area, generate a synthetic artifact set if they exist, and eliminate the synthetic artifacts according to the synthetic artifact set by a multi-target synthetic artifact identification method to obtain a spliced image.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (9)

1. The adaptive video stitching method based on the deep learning is characterized by comprising the following steps of:
acquiring an initial video, selecting a pixel point to be matched of a current image in the initial video according to the initial video, combining a preset background filtering algorithm, matching the pixel point to be matched with a Gaussian distribution mode acquired based on a Gaussian mixture model, and determining a foreground image;
Judging whether the current image can use the optimal suture line of the previous frame of image according to the foreground image, if not, determining the optimal suture line corresponding to the current image through a dynamic programming algorithm, and if so, taking the optimal suture line of the previous frame of image as the optimal suture line of the current image;
determining an overlapping area of two adjacent frames of pictures according to the optimal suture line, judging whether synthetic artifact exists according to the overlapping area, if so, generating a synthetic artifact set, and eliminating the synthetic artifact according to the synthetic artifact set by a multi-target synthetic artifact identification method to obtain a spliced image;
determining an overlapping area of two adjacent frames of pictures according to the optimal suture line, judging whether synthetic artifacts exist according to the overlapping area, generating a synthetic artifact set if the synthetic artifacts exist, eliminating the synthetic artifacts through a multi-target synthetic artifact identification method according to the synthetic artifact set, and obtaining a spliced image comprises the following steps:
dividing an overlapping region of two adjacent frames of pictures according to a preset grid division strategy, calculating pixel differences of adjacent pixels in the overlapping region, determining whether a target object exists according to the pixel differences in combination with a preset judging threshold, if the pixel differences are larger than the judging threshold, considering that the target object exists in the overlapping region, marking the target object in the overlapping region as an artifact object, and generating a synthetic artifact set;
According to the synthetic artifact set, performing spatial mapping on each artifact object of the synthetic artifact set, and mapping each artifact object into the space of two adjacent frames of images respectively;
calculating the spatial distance of the same artifact object in the space of two adjacent frames of images after mapping, judging by combining with a preset distance threshold, and if the spatial distance is larger than the distance threshold, marking the object as a synthetic artifact;
and based on an image interpolation technology, taking a pixel point adjacent to the synthetic artifact as a replacement area, interpolating the pixel point adjacent to the synthetic artifact into the synthetic artifact area, and eliminating the synthetic artifact to obtain the spliced image.
2. The method of claim 1, wherein obtaining an initial video, selecting a pixel to be matched of a current image in the initial video according to the initial video, matching the pixel to be matched with a gaussian distribution pattern obtained based on a mixed gaussian model in combination with a preset background filtering algorithm, and determining a foreground image comprises:
for any pixel point of a current image in a video, determining a probability density function of the pixel point through a mixed Gaussian model, and matching all Gaussian distribution modes of the pixel point and the current moment according to the probability density function to obtain a matching result;
If the matching is successful, the pixel point is marked as a point in the background;
if the matching fails, the pixel point is marked as a point in the foreground;
if at least one Gaussian distribution pattern is successfully matched with the pixel points, updating the Gaussian distribution pattern which is not successfully matched;
and marking the area where the moving object is located in the current image according to the matching result, and performing expansion and corrosion operation on the area where the moving object is located in the current image to obtain a foreground image.
3. The method of claim 2, wherein updating the non-matching successful gaussian distribution pattern is as follows:
wherein μ_t denotes the mean at time t; x_t denotes the random variable of the pixel point; σ_t^2 denotes the variance of the Gaussian distribution at time t; σ_{t-1}^2 denotes the variance of the Gaussian distribution at time t-1; σ_t denotes the standard deviation of the Gaussian distribution at time t; ρ denotes the learning rate, which controls the degree of influence of new observed data on the Gaussian distribution pattern; α denotes the distribution weight; and G() denotes the Gaussian density function.
4. The method according to claim 1, wherein determining whether the current image can follow the best seam of the previous frame image according to the foreground image, if not, determining the best seam corresponding to the current image through a dynamic programming algorithm, and if so, taking the best seam of the previous frame image as the best seam of the current image comprises:
acquiring the foreground image, obtaining a binary image marking the foreground according to a preset background segmentation algorithm, and determining, according to the binary image, whether any pixel point on the optimal suture line of the previous frame image is located in the foreground image;
if so, determining the optimal suture line of the current image by a preset dynamic programming algorithm along the optimal suture line of the previous frame image;
and if the pixel point is not positioned in the foreground image, taking the optimal suture line of the previous frame image as the optimal suture line of the current image.
5. The method of claim 4, wherein determining the optimal stitch line for the current image by a preset dynamic programming algorithm comprises:
determining a random suture line from a plurality of suture lines which are obtained in advance, connecting a current pixel point, and taking the current pixel point as a starting point of the random suture line;
calculating the similarity value of the pixel points of each column of the first row in the current image, and expanding downwards to calculate the accumulated similarity measurement value of the random suture line;
comparing, for the current pixel point of each row in the current image, the accumulated similarity measurement values of the three pixel points at the upper left, directly above and upper right in the adjacent upper row, and connecting the pixel point with the minimum accumulated similarity measurement value with the current pixel point;
Determining the propagation direction reaching the current pixel point, calculating the accumulated similarity measurement value corresponding to all the pixel points in the current image, finding out the minimum value in the accumulated similarity measurement value, and obtaining the corresponding pixel point as the end point of the random suture line;
and backtracking is carried out according to the propagation direction of the current pixel point until the current image reaches the first row, and the starting point and the ending point of the random suture line are combined to determine the optimal suture line.
6. The method of claim 1, wherein the dynamic programming algorithm is represented by the formula:
wherein mask is the binarized foreground image obtained by filtering the background of the video image; l is the detection range in each row; E(x,y), the state function of the dynamic programming, denotes the optimal value at position (x,y); E(k,y-1) denotes the optimal value at position k in the previous row; and s(x,y) denotes the local cost at position (x,y).
7. A deep learning based adaptive video stitching system for implementing the deep learning based adaptive video stitching method according to any one of the preceding claims 1-6, comprising:
a first unit, configured to acquire an initial video, select a pixel point to be matched of a current image in the initial video according to the initial video, match, in combination with a preset background filtering algorithm, the pixel point to be matched with a Gaussian distribution pattern obtained based on a Gaussian mixture model, and determine a foreground image;
a second unit, configured to judge, according to the foreground image, whether the current image can reuse the optimal suture line of the previous frame image, determine the optimal suture line corresponding to the current image through a dynamic programming algorithm if not, and take the optimal suture line of the previous frame image as the optimal suture line of the current image if so;
and a third unit, configured to determine the overlapping area of two adjacent frames of pictures according to the optimal suture line, judge whether synthetic artifacts exist according to the overlapping area, generate a synthetic artifact set if they exist, and eliminate the synthetic artifacts according to the synthetic artifact set by a multi-target synthetic artifact identification method to obtain a spliced image.
8. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 6.
9. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 6.
CN202311576584.2A 2023-11-24 2023-11-24 Adaptive video stitching method and system based on deep learning Active CN117336620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311576584.2A CN117336620B (en) 2023-11-24 2023-11-24 Adaptive video stitching method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311576584.2A CN117336620B (en) 2023-11-24 2023-11-24 Adaptive video stitching method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN117336620A CN117336620A (en) 2024-01-02
CN117336620B true CN117336620B (en) 2024-02-09

Family

ID=89275972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311576584.2A Active CN117336620B (en) 2023-11-24 2023-11-24 Adaptive video stitching method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN117336620B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853524A (en) * 2010-05-13 2010-10-06 北京农业信息技术研究中心 Method for generating corn ear panoramic image by using image sequence
CN113221665A (en) * 2021-04-19 2021-08-06 东南大学 Video fusion algorithm based on dynamic optimal suture line and improved gradual-in and gradual-out method
CN114913064A (en) * 2022-03-15 2022-08-16 天津理工大学 Large parallax image splicing method and device based on structure keeping and many-to-many matching
CN116109484A (en) * 2023-02-03 2023-05-12 武汉大学 Image splicing method, device and equipment for retaining foreground information and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150371420A1 (en) * 2014-06-19 2015-12-24 Samsung Electronics Co., Ltd. Systems and methods for extending a field of view of medical images

Also Published As

Publication number Publication date
CN117336620A (en) 2024-01-02

Similar Documents

Publication Title
US7379583B2 (en) Color segmentation-based stereo 3D reconstruction system and process employing overlapping images of a scene captured from viewpoints forming either a line or a grid
US8089515B2 (en) Method and device for controlling auto focusing of a video camera by tracking a region-of-interest
US8385630B2 (en) System and method of processing stereo images
US8605787B2 (en) Image processing system, image processing method, and recording medium storing image processing program
EP0624981B1 (en) Motion vector detecting circuit
EP2221763A1 (en) Image generation method, device, its program and recording medium stored with program
US7840070B2 (en) Rendering images based on image segmentation
CN111127376A (en) Method and device for repairing digital video file
JP4296617B2 (en) Image processing apparatus, image processing method, and recording medium
JP2007053621A (en) Image generating apparatus
Cho et al. Extrapolation-based video retargeting with backward warping using an image-to-warping vector generation network
KR102223754B1 (en) Method and Apparatus for Enhancing Face Image
CN117336620B (en) Adaptive video stitching method and system based on deep learning
CN113269790A (en) Video clipping method and device, electronic equipment, server and storage medium
CN110717910B (en) CT image target detection method based on convolutional neural network and CT scanner
CN111932600A (en) Real-time loop detection method based on local subgraph
CN114419102B (en) Multi-target tracking detection method based on frame difference time sequence motion information
JP2004519048A (en) Method and apparatus for improving object boundaries extracted from stereoscopic images
US9380285B2 (en) Stereo image processing method, stereo image processing device and display device
CN111639642B (en) Image processing method, device and apparatus
JP2011113177A (en) Method and program for structuring three-dimensional object model
CN112052859A (en) License plate accurate positioning method and device in free scene
CN117474959B (en) Target object motion trail processing method and system based on video data
GB2358307A (en) Method of determining camera projections in 3D imaging having minimal error
CN117221466B (en) Video stitching method and system based on grid transformation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant