CN112950466A

CN112950466A - Image splicing method based on semantic object matching

Info

Publication number: CN112950466A
Application number: CN202110104851.0A
Authority: CN
Inventors: 周忠; 李萌; 吕伟; 杨硌; 梅澜
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-01-26
Filing date: 2021-01-26
Publication date: 2021-06-11

Abstract

The invention discloses an image stitching method based on semantic object matching, suitable for images or video frames that share overlapping regions. By introducing semantic information into the stitching process, the method improves the quality of large-scene and panoramic stitching results. The stitched result also carries semantic labels that make it easier to use in subsequent image or video analysis, supporting the association, organized management and use of related images and videos.

Description

Image splicing method based on semantic object matching

Technical Field

The invention belongs to the technical field of image processing and relates to feature extraction, semantic matching, image matching and image stitching. In particular, it relates to an image stitching method based on semantic object matching that introduces high-level semantic information of the image into the image stitching framework.

Background

With the rapid growth in the number of videos and the fragmentation of video content, it is difficult to extract useful information from videos quickly, and image stitching is one effective way to counter this fragmentation. Image stitching combines video frames that share overlapping regions into a frame with a wider field of view, or into a panorama, thereby linking the information contained in different videos. Image stitching methods generally fall into two categories: pixel-based methods (direct methods) and feature-based methods. Pixel-based methods estimate the transformation parameters directly from pixel information such as depth, gradient, color and geometry, and use them to warp and align the images before stitching. However, their preprocessing and computation are complex, every pixel in the overlapping region must be processed, and they rely on strict conditions (an approximately planar scene and nearly coincident optical centers), so they are mainly suited to simple scenes. Feature-based methods consist of image matching, image alignment and image fusion, and estimate the geometric transformation between an image pair from sparse feature points. In scenes with little texture, large parallax or a wide baseline, however, traditional low-level feature matching is not robust enough to produce uniformly distributed, high-quality matching point pairs, so the stitching result exhibits artifacts, ghosting or misalignment.

To address these shortcomings, the invention provides an image stitching method based on semantic object matching. Following the success of deep learning in computer vision, high-level semantic features can be extracted from deep networks. These features are more robust to differences in appearance and changes in shape, are largely invariant to low-level visual cues such as depth, gradient, corners and color, and carry high-level semantic information about the image. They therefore improve the robustness of image matching to some extent, but on their own they do not provide accurate pixel-level matching and localization. Taking the respective strengths of traditional and semantic features into account, the method combines the accuracy of low-level visual features with the robustness of high-level semantic features and introduces the high-level semantic information of the image into the stitching framework. This improves the accuracy and reliability of image matching in the stitching pipeline and, in turn, the stitching result. Stitching with semantic information improves large-scene and panoramic results, and because the result carries semantic labels it can be used more effectively in later image or video analysis, supporting the association, organized management and use of related images and videos.

Disclosure of Invention

The technical problem addressed by the invention is to improve the quality of image stitching within the image stitching framework. To this end, the invention provides an image stitching method based on semantic object matching that introduces high-level semantic information of the image into the stitching framework, thereby improving the accuracy and reliability of image matching in the stitching pipeline and, consequently, the stitching result.

To solve this technical problem, the image stitching method based on semantic object matching adopts the following technical solution: a semantic matching algorithm is used within the image stitching framework to obtain semantic matching pairs between instance objects. This introduces high-level semantic information of the image, improves the robustness and accuracy of feature matching during stitching, and thereby improves the stitching result.

The method obtains information about the instance objects in each image by image segmentation, builds a spatial relationship graph over those instance objects, uses the graphs to identify the same instance object in both images, and passes each matched instance pair through a semantic matching module to obtain pixel-level semantic matching pairs.

The low-level visual feature matching pairs are then screened and refined according to the segmentation result and the semantic matching pairs to obtain the final matching pairs. The final matching pairs are fed into a mesh optimization framework that warps and aligns the images, after which an image fusion algorithm produces the stitched image.
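The data flow just described can be sketched as a small Python driver. This is a hypothetical skeleton, not the patent's implementation: every stage (segmentation, instance matching, semantic matching, low-level matching, screening, mesh warping and fusion) is passed in as a callable, and the names `Stages` and `stitch`, as well as the fallback to whole-image semantic matching when no instance pair is found, are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# A match is ((x, y) in the left image, (x', y') in the right image).
Match = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass
class Stages:
    """Hypothetical plug-in points for each stage of the pipeline."""
    segment: Callable[[Any], list]                        # image -> instance objects
    match_instances: Callable[[list, list], list]         # graph matching -> instance pairs
    semantic_match: Callable[[Any, Any], List[Match]]     # VGG19-style pixel matches
    low_level_match: Callable[[Any, Any], List[Match]]    # e.g. keypoint matches
    screen: Callable[[List[Match], List[Match], list], List[Match]]
    warp: Callable[[Any, Any, List[Match]], Any]          # mesh optimization
    blend: Callable[[Any], Any]                           # image fusion


def stitch(left, right, s: Stages):
    inst_l, inst_r = s.segment(left), s.segment(right)
    # Steps 1-2: instance pairs can only exist if both images contain instances.
    pairs = s.match_instances(inst_l, inst_r) if inst_l and inst_r else []
    if pairs:
        # Step 3a: semantic matching pair by pair on the matched instances.
        semantic = [m for a, b in pairs for m in s.semantic_match(a, b)]
    else:
        # Step 3b: no instances (or, as an assumption, no matched pair): whole images.
        semantic = s.semantic_match(left, right)
    final = s.screen(s.low_level_match(left, right), semantic, pairs)
    aligned = s.warp(left, right, final)   # Step 4
    return s.blend(aligned)                # Step 5
```

Passing the stages as callables keeps the skeleton independent of any particular detector, network or solver, so each stage can be swapped or tested in isolation.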

Image segmentation applies a general-purpose instance detection and segmentation framework to the images or video frames to be stitched, yielding the mask, bounding box and class label of each instance object.

To construct the spatial relationship graph, the topological and directional relationships between all instance objects are determined from their positions. The topological relationships considered are disjoint and overlapping (intersecting), and the directional relationships follow an eight-direction cone model.

The semantic matching module uses a pre-trained VGG19 convolutional neural network to measure feature similarity under geometric consistency constraints, yielding pixel-level semantic feature matching pairs. It then screens and refines the low-level visual feature matching pairs using the image segmentation information to obtain the final feature matching pairs.
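How the segmentation information is used for screening is not spelled out in this text. A minimal sketch, assuming a rule that keeps a low-level match only when both endpoints fall on the same matched instance pair, or both fall on background, and then appends the semantic matches. The function `screen_matches` and this rule are illustrative assumptions, not the patent's procedure.

```python
import numpy as np


def screen_matches(low_level, semantic, masks_l, masks_r, pairs):
    """Drop low-level matches that link non-corresponding regions, then add semantic matches.

    low_level, semantic: lists of ((x, y) left, (x', y') right) pixel coordinates.
    masks_l, masks_r: boolean H x W instance masks; pairs: matched (i, j) instance indices.
    """
    def owner(masks, pt):
        x, y = int(round(pt[0])), int(round(pt[1]))
        for k, m in enumerate(masks):
            if 0 <= y < m.shape[0] and 0 <= x < m.shape[1] and m[y, x]:
                return k  # first instance mask containing the point
        return None  # background

    allowed = set(pairs) | {(None, None)}  # same object, or background on both sides
    kept = [(p, q) for p, q in low_level
            if (owner(masks_l, p), owner(masks_r, q)) in allowed]
    return kept + list(semantic)
```

Under this rule, a keypoint match from a chair in the left image to a monitor in the right image is discarded even if its descriptors are similar, because the two objects were not paired by the spatial relationship graphs.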

The mesh optimization framework aligns the images as follows: each image is first divided into a uniform initial mesh; an energy function is then minimized iteratively, with the mesh updated at every iteration until the maximum number of iterations is reached; and the resulting final mesh is used to warp and align the image.

The energy function comprises an alignment constraint, a scale constraint, a smoothness constraint and a straight-line constraint.
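For illustration only, one common way to write such an energy, in the style of content-preserving mesh warps and not taken from the patent, is shown below. Here v_i is an initial mesh vertex and \hat{v}_i its optimized position; \mathcal{M} is the set of final matching pairs (p, q); w_k(p) are the bilinear weights of p with respect to the four vertices \hat{v}_k(p) of its cell; \mathcal{E} is the set of mesh edges; and \hat{u}_{\ell,k} are warped sample points on a detected straight line \ell whose warped unit normal is \hat{n}_\ell. The weights \lambda are assumed tuning parameters. The scale and straight-line terms are nonlinear in \hat{v}, which is one reason the minimization is carried out iteratively.

```latex
E(\hat{V}) = \lambda_a E_a + \lambda_s E_s + \lambda_{sm} E_{sm} + \lambda_l E_l
E_a = \sum_{(p,q)\in\mathcal{M}} \Big\| \sum_{k=1}^{4} w_k(p)\,\hat{v}_k(p) - q \Big\|^2
E_s = \sum_{(i,j)\in\mathcal{E}} \big( \|\hat{v}_i - \hat{v}_j\| - \|v_i - v_j\| \big)^2
E_{sm} = \sum_{(i,j)\in\mathcal{E}} \big\| (\hat{v}_i - \hat{v}_j) - (v_i - v_j) \big\|^2
E_l = \sum_{\ell} \sum_{k} \big( \hat{n}_\ell^{\top} (\hat{u}_{\ell,k} - \hat{u}_{\ell,0}) \big)^2
```

In this form E_a pulls matched points onto their targets, E_s preserves edge lengths while allowing rotation, E_{sm} preserves edge vectors to keep the warp smooth, and E_l keeps points sampled on a straight line collinear after warping.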

In the final stitching step, once the aligned images are obtained, corresponding pixels in the overlapping region of the images to be stitched are merged and the pixels in the non-overlapping regions are kept unchanged. This removes large-scale brightness differences and visible seams and makes the stitching result look more natural.

Compared with the prior art, the invention has the following advantages: by introducing a semantic matching algorithm into the image stitching framework to obtain semantic matching pairs between instance objects, the method brings high-level semantic information of the image into stitching, improving the robustness and accuracy of image matching and thus the stitching result. Stitching with semantic information not only produces large-scene or panoramic results; the results also carry semantic labels that can be used more effectively in subsequent image or video analysis, supporting the association, organized management and use of related images and videos.

Drawings

FIG. 1 is a flowchart of the image stitching method based on semantic object matching according to the present invention;

FIGS. 2a and 2b show example image segmentation results in the image stitching method based on semantic object matching according to the present invention;

FIGS. 3a and 3b show example spatial relationship graphs in the image stitching method based on semantic object matching according to the present invention;

FIG. 4 is a flowchart of the semantic matching module in the image stitching method based on semantic object matching according to the present invention;

FIG. 5 is a flowchart of the mesh optimization framework for image alignment in the image stitching method based on semantic object matching according to the present invention;

FIG. 6 shows an example image stitching result of the image stitching method based on semantic object matching according to the present invention.

Detailed Description

Other aspects, features and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which form a part of this specification, and which illustrate, by way of example, the principles of the invention.

The invention provides an image stitching method based on semantic object matching whose technical solution is as follows: a semantic matching algorithm is used within the image stitching framework to obtain semantic matching pairs between instance objects. This introduces high-level semantic information of the image, improves the robustness and accuracy of feature matching during stitching, and thereby improves the stitching result.

Fig. 1 shows the flowchart of the image stitching method based on semantic object matching according to the invention. The method comprises the following steps:

Step 1: Image segmentation. A general-purpose object instance segmentation framework (Mask R-CNN) is applied to obtain the mask, bounding box and class label of each instance object in the image. Fig. 2a shows the segmentation result for the left image to be stitched, and Fig. 2b shows the result for the right image. If the image contains instance objects, the remaining steps are executed in order; if it contains none, the method proceeds directly to Step 3.

Fig. 2a, the segmentation result for the left image, is explained as follows, taking the chair on the right of the figure as an example. The enclosing rectangle is the bounding box; the English word "chair" at its upper-left corner is the recognized class of the instance object; the percentage "98%" next to it is the confidence that the object belongs to that class; and the gray region covering the object inside the box is its mask. The detection and segmentation framework can recognize 80 common object classes, such as chair, book, tv, display screen, mouse, keyboard and bottle. The annotations in Fig. 2b follow the same convention and are not repeated here. The bounding boxes, class labels and masks obtained by segmentation are then used to build a spatial relationship graph over the instance objects in each image.
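A minimal sketch of turning detector output into per-instance records, assuming the detector returns, per detection, a box (x1, y1, x2, y2), an integer class id, a confidence score and a soft mask of the image size, as typical Mask R-CNN implementations do. The `Instance` record, the `to_instances` helper and both 0.5 thresholds are illustrative assumptions, not values from the patent. An empty result corresponds to the "no instance object" branch of Step 1.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

import numpy as np


@dataclass
class Instance:
    label: str                               # class name, e.g. "chair"
    score: float                             # detector confidence, e.g. 0.98
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
    mask: np.ndarray                         # boolean H x W mask


def to_instances(boxes, labels, scores, masks, class_names: Sequence[str],
                 score_thr: float = 0.5, mask_thr: float = 0.5):
    """Keep confident detections and binarize their soft masks (thresholds are assumptions)."""
    out = []
    for box, lab, sc, m in zip(boxes, labels, scores, masks):
        if sc < score_thr:
            continue  # drop low-confidence detections
        out.append(Instance(class_names[int(lab)], float(sc),
                            tuple(float(v) for v in box),
                            np.asarray(m) >= mask_thr))
    return out
```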

Step 2: Spatial relationship graph construction. The topological and directional relationships between all instance objects are determined from their positions. The topological relationships are disjoint (dt) and overlapping (ov), and the directional relationships follow an eight-direction cone model.
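The two relationships can be sketched in Python as follows. This is an illustration under assumptions, not the patent's Algorithm 1: relationships are computed from bounding boxes (masks could be used instead), direction is measured between box centers using 45-degree cones centered on E, NE, N and so on, and boxes that share a positive area count as ov. The names `topology`, `direction` and `relation_graph` are hypothetical.

```python
import math

# 45-degree cones centered on each compass direction, counter-clockwise from east.
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def topology(a, b):
    """'ov' if boxes (x1, y1, x2, y2) share positive area, otherwise 'dt' (disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return "ov" if w > 0 and h > 0 else "dt"


def direction(a, b):
    """Cone containing the center of box b as seen from the center of box a."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    # Image y grows downwards, so negate dy to make "north" point up.
    angle = math.degrees(math.atan2(-(by - ay), bx - ax)) % 360
    return DIRECTIONS[int(((angle + 22.5) % 360) // 45)]


def relation_graph(boxes):
    """Directed edges (i, j) -> (direction of j from i, topology) for all ordered pairs."""
    return {(i, j): (direction(a, b), topology(a, b))
            for i, a in enumerate(boxes) for j, b in enumerate(boxes) if i != j}
```

For example, with a chair box (0, 50, 20, 80) and a computer box (40, 0, 60, 30), the edge from the chair to the computer is ("NE", "dt"), the same kind of annotation as the one shown in Fig. 3a.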

The spatial relationship graph of each image is constructed according to Algorithm 1. As examples, Fig. 3a shows the spatial relationship graph of the left image to be stitched, and Fig. 3b shows that of the right image.

Fig. 3a, the spatial relationship graph of the left image, is explained as follows, taking as an example the arrow from the leftmost chair to the leftmost computer. The arrow is annotated "NE, dt": "NE" means that the end point lies to the northeast of the start point (N: north, E: east), and "dt" means that the topological relationship between the end-point and start-point instance objects is disjoint. The spatial-relationship annotations of the other arrows are listed in Table 1. The annotations in Fig. 3b follow the same convention and are not repeated here. By traversing the spatial relationship graphs of the image pair and checking whether the class labels, topological relationships and directional relationships of two instance objects agree, the method finds one-to-one matching pairs of the same instance object across the two images.
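One way to carry out this traversal is a greedy one-to-one assignment, sketched below under assumptions: candidates must share a class label, each candidate pair is scored by how many neighboring same-label pairs have an identical (direction, topology) edge in both graphs, and pairs are accepted in descending score order. The function name, the scoring rule and the `min_support` parameter are hypothetical and are not the patent's procedure.

```python
from itertools import product


def match_instances(labels_l, graph_l, labels_r, graph_r, min_support=0):
    """Greedy one-to-one matching of same-label instances whose graph edges agree.

    labels_*: class name per instance; graph_*: {(i, j): (direction, topology)}.
    Returns sorted (left index, right index) pairs.
    """
    candidates = []
    for i, j in product(range(len(labels_l)), range(len(labels_r))):
        if labels_l[i] != labels_r[j]:
            continue  # class labels must agree
        # Support: neighbor pairs (k, l) with equal labels and identical edge annotations.
        support = sum(
            1
            for k, l in product(range(len(labels_l)), range(len(labels_r)))
            if k != i and l != j and labels_l[k] == labels_r[l]
            and graph_l.get((i, k)) == graph_r.get((j, l))
        )
        candidates.append((support, i, j))
    used_l, used_r, pairs = set(), set(), []
    # Accept the best-supported candidates first, never reusing an instance.
    for support, i, j in sorted(candidates, key=lambda t: -t[0]):
        if support >= min_support and i not in used_l and j not in used_r:
            pairs.append((i, j))
            used_l.add(i)
            used_r.add(j)
    return sorted(pairs)
```

Greedy assignment is a simple stand-in; an optimal assignment (for example, the Hungarian algorithm on the support scores) would also fit this step.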

Table 1: Meanings of the spatial relationship annotations

Step 3: Image feature matching. If the images contain instance objects, the spatial relationship graphs of the images to be stitched are traversed to find the same instance objects (that is, matched instance objects), and each matched instance pair is fed into the semantic matching module. If the images contain no instance objects, the whole images are fed into the semantic matching module instead. As shown in the flowchart of Fig. 4, the semantic matching module first uses a pre-trained VGG19 convolutional neural network to measure feature similarity under geometric consistency constraints, yielding pixel-level semantic feature matching pairs, and then screens and refines the low-level visual feature matching pairs using the image segmentation information to obtain the final feature matching pairs. The geometric consistency constraints consist of an appearance constraint, a relative direction constraint and a relative distance constraint. The appearance constraint measures the similarity between feature points using cosine similarity. The relative direction constraint requires that the relative directions between the salient features of two semantic objects of the same category be nearly the same: for example, the mouth lies vertically below the nose on different people's faces, and the direction between the center of the headlights and the center of the front lens is essentially fixed across different cars. The relative distance constraint likewise requires that the relative distances between the salient features of two semantic objects of the same category be nearly the same.
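The appearance constraint and the two geometric constraints can be sketched with NumPy as follows, assuming the VGG19 features have already been extracted into descriptor arrays with one row per feature point. Appearance matching uses mutual nearest neighbors under cosine similarity; the direction and distance constraints are approximated by requiring each match's offset vectors to the other matches to have similar angles and a length ratio close to the median ratio. All thresholds, the function names and the majority rule (`min_frac`) are assumptions, and the sketch presumes no large rotation between the two views.

```python
import numpy as np


def cosine_matches(fa, fb, min_sim=0.8):
    """Appearance constraint: mutual nearest neighbors under cosine similarity."""
    a = fa / np.linalg.norm(fa, axis=1, keepdims=True)
    b = fb / np.linalg.norm(fb, axis=1, keepdims=True)
    sim = a @ b.T                       # sim[i, j] = cos(fa[i], fb[j])
    best_b = sim.argmax(axis=1)         # best partner in B for each row of A
    best_a = sim.argmax(axis=0)         # best partner in A for each row of B
    return [(i, int(j)) for i, j in enumerate(best_b)
            if best_a[j] == i and sim[i, j] >= min_sim]


def geometric_filter(pa, pb, matches, angle_tol=np.deg2rad(20.0),
                     scale_tol=0.25, min_frac=0.5):
    """Relative direction/distance constraints: keep matches consistent with most others."""
    if len(matches) < 3:
        return list(matches)
    A = np.array([pa[i] for i, _ in matches], dtype=float)
    B = np.array([pb[j] for _, j in matches], dtype=float)
    dA = A[None, :, :] - A[:, None, :]  # dA[m, n] = A[n] - A[m]
    dB = B[None, :, :] - B[:, None, :]
    # Angle between corresponding offset vectors, wrapped to [0, pi].
    dang = np.abs((np.arctan2(dA[..., 1], dA[..., 0])
                   - np.arctan2(dB[..., 1], dB[..., 0]) + np.pi) % (2 * np.pi) - np.pi)
    # Length ratio of corresponding offsets, compared with the median ratio.
    ratio = np.linalg.norm(dB, axis=-1) / np.maximum(np.linalg.norm(dA, axis=-1), 1e-9)
    off = ~np.eye(len(matches), dtype=bool)
    scale = np.median(ratio[off])
    ok = off & (dang <= angle_tol) & (np.abs(ratio - scale) <= scale_tol * scale)
    keep = ok.sum(axis=1) >= min_frac * (len(matches) - 1)
    return [m for m, k in zip(matches, keep) if k]
```

Comparing against the median ratio rather than a fixed value lets the filter tolerate a uniform scale change between the two instances while still rejecting isolated outliers.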

Step 4: Image alignment. The flowchart of the mesh optimization framework is shown in Fig. 5. The image is divided into a uniform initial mesh, and an energy function consisting of an alignment constraint, a scale constraint, a smoothness constraint and a straight-line constraint is defined and minimized iteratively. The mesh is updated at each iteration until the maximum number of iterations is reached, and the resulting final mesh is used to warp and align the image.
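A minimal NumPy sketch of the alignment step, assuming only two of the four constraints: an alignment term (each matched source point, expressed as a bilinear combination of its cell's four vertices, should land on its target point) and a smoothness term that keeps every mesh edge vector unchanged. With only these quadratic terms the problem is linear least squares and a single solve suffices; the scale and straight-line constraints, which make the energy nonlinear and motivate the iterative update, are omitted. The function name, grid size and weight are assumptions.

```python
import numpy as np


def mesh_warp(matches, width, height, cols=4, rows=4, smooth=1.0):
    """Least-squares mesh warp with an alignment term and an edge-preserving smoothness term.

    matches: iterable of ((x, y) in the source image, (x', y') in the target image),
    with source points assumed to lie inside the image.
    Returns (initial vertices, warped vertices), each of shape ((rows+1)*(cols+1), 2).
    """
    xs = np.linspace(0.0, width, cols + 1)
    ys = np.linspace(0.0, height, rows + 1)
    V = np.array([(x, y) for y in ys for x in xs])  # uniform initial mesh, row-major
    n = len(V)

    def vid(r, c):
        return r * (cols + 1) + c

    A_rows, rhs = [], []
    # Alignment: the bilinear combination of the cell's corners must hit the target point.
    for (px, py), (qx, qy) in matches:
        c = min(int(px / width * cols), cols - 1)
        r = min(int(py / height * rows), rows - 1)
        u = (px - xs[c]) / (xs[c + 1] - xs[c])
        v = (py - ys[r]) / (ys[r + 1] - ys[r])
        row = np.zeros(n)
        row[vid(r, c)], row[vid(r, c + 1)] = (1 - u) * (1 - v), u * (1 - v)
        row[vid(r + 1, c)], row[vid(r + 1, c + 1)] = (1 - u) * v, u * v
        A_rows.append(row)
        rhs.append((qx, qy))
    # Smoothness: every horizontal and vertical mesh edge keeps its original vector.
    for r in range(rows + 1):
        for c in range(cols + 1):
            for r2, c2 in ((r, c + 1), (r + 1, c)):
                if r2 <= rows and c2 <= cols:
                    row = np.zeros(n)
                    row[vid(r2, c2)], row[vid(r, c)] = smooth, -smooth
                    A_rows.append(row)
                    rhs.append(smooth * (V[vid(r2, c2)] - V[vid(r, c)]))
    W, *_ = np.linalg.lstsq(np.array(A_rows), np.array(rhs, dtype=float), rcond=None)
    return V, W
```

As a sanity check, if every match is a pure translation of its source point, both terms can be satisfied exactly and every vertex moves by the same offset.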

Step 5: Image fusion. After the aligned images are obtained, corresponding pixels in the overlapping region of the images to be stitched are merged and the pixels in the non-overlapping regions are kept unchanged, removing brightness differences and seams visible to the naked eye so that the stitching result looks more natural. Common image fusion algorithms include linear blending, feathering and multi-band blending.
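Of the fusion algorithms listed, linear blending is the simplest to sketch. The NumPy example below assumes that both images have already been warped onto a common canvas with boolean validity masks and that the first image lies to the left of the second. Inside the overlap, the first image's weight falls linearly from 1 at the first overlapping column to 0 at the last. The function name and these layout assumptions are illustrative.

```python
import numpy as np


def linear_blend(left, right, mask_l, mask_r):
    """Linearly blend two aligned H x W x C canvases across their horizontal overlap."""
    w = mask_l.shape[1]
    overlap = mask_l & mask_r
    # Outside the overlap take the left image where valid, otherwise the right image.
    out = np.where(mask_l[..., None], left, right).astype(float)
    cols = np.flatnonzero(overlap.any(axis=0))
    if cols.size:
        c0, c1 = cols.min(), cols.max()
        # Weight of the left image: 1 at the first overlap column, 0 at the last.
        alpha = np.clip((c1 - np.arange(w)) / max(c1 - c0, 1), 0.0, 1.0)[None, :, None]
        blended = alpha * left + (1.0 - alpha) * right
        out = np.where(overlap[..., None], blended, out)
    return out
```

Feathering replaces the column ramp with per-pixel weights based on distance to each image's border, and multi-band blending applies such weights separately to each level of a Laplacian pyramid.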

Stitching with semantic information not only produces large-scene or panoramic results; the results also carry semantic labels that can be used more effectively in subsequent image or video analysis, supporting the association, organized management and use of related images and videos.

Parts of the invention not described in detail are well known to those skilled in the art.

Finally, it should be noted that the embodiments described above are only preferred embodiments of the present invention, and the invention is not limited to them. Anyone may derive products in various other forms in light of the invention, but any change in form or structure that is identical or similar to the technical solution of the invention falls within its protection scope. The protection scope of the invention is therefore defined by the claims.

Claims

1. An image stitching method based on semantic object matching, characterized in that: a semantic matching algorithm is used within an image stitching framework to obtain semantic matching pairs between instance objects, thereby introducing high-level semantic information of the image, improving the robustness and accuracy of feature matching in image stitching, and further improving the image stitching result.

2. The image stitching method based on semantic object matching according to claim 1, characterized in that: information about the instance objects in an image is obtained by image segmentation, a spatial relationship graph is constructed for the instance objects, the same instance objects are identified through the spatial relationship graphs, and pixel-level semantic matching pairs are obtained pair by pair through a semantic matching module.

3. The image stitching method based on semantic object matching according to claim 2, characterized in that: the low-level visual feature matching pairs are screened and refined according to the segmentation result and the semantic matching pairs to obtain final matching pairs; the final matching pairs are input into a mesh optimization framework to warp and align the images; and image stitching is then completed with an image fusion algorithm.

4. The image stitching method based on semantic object matching according to claim 2, characterized in that: the image segmentation obtains the mask, bounding box and class label of each instance object in the images or video frames to be stitched by means of a general-purpose instance detection and segmentation framework.

5. The image stitching method based on semantic object matching according to claim 2, characterized in that: constructing the spatial relationship graph comprises determining the topological and directional relationships between all instance objects according to their positions, wherein the topological relationships are disjoint and overlapping, and the directional relationships follow an eight-direction cone model.

6. The image stitching method based on semantic object matching according to claim 2, characterized in that: the semantic matching module uses a pre-trained VGG19 convolutional neural network to measure feature similarity under geometric consistency constraints to obtain pixel-level semantic feature matching pairs, and then screens and refines the low-level visual feature matching pairs using the image segmentation information to obtain final feature matching pairs.

7. The image stitching method based on semantic object matching according to claim 3, characterized in that: the mesh optimization framework for image alignment comprises first dividing the image into a uniform initial mesh, then iteratively minimizing an energy function while updating the mesh at each iteration until the maximum number of iterations is reached to obtain a final mesh, and warping and aligning the image according to the final mesh.

8. The image stitching method based on semantic object matching according to claim 7, characterized in that: the energy function comprises an alignment constraint, a scale constraint, a smoothness constraint and a straight-line constraint.

9. The image stitching method based on semantic object matching according to claim 1, characterized in that: after the aligned images are obtained, image stitching merges corresponding pixels in the overlapping region of the images to be stitched and retains the pixels in the non-overlapping regions, thereby eliminating brightness differences and seams visible to the naked eye and making the stitching result more natural.