WO2022247394A1 - Image stitching method and apparatus, storage medium, and electronic device - Google Patents
- Publication number
- WO2022247394A1 (PCT/CN2022/080233)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- optical flow
- image
- screenshot
- intermediate optical
- target
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/32—Indexing scheme for image data processing or generation, in general involving image mosaicing
Definitions
- The present application relates to the field of image processing, and in particular to an image stitching method and apparatus, a storage medium, and an electronic device.
- Image stitching refers to the process of stitching together multiple images with overlapping areas to obtain a seamless panoramic image.
- Image stitching technology is widely used in aerospace, minimally invasive surgery, medical microscopy, geological surveying, and other fields.
- Image stitching can be implemented with camera calibration methods, but such methods are computationally expensive and require iteratively calculating multiple homography matrices, which makes the stitching process inefficient.
- The present application provides an image stitching method and apparatus, a storage medium, and an electronic device to mitigate the above technical problem.
- The present application provides an image stitching method. The method may include: acquiring a first image and a second image; calculating a first intermediate optical flow and a second intermediate optical flow according to the first image and the second image, where the first intermediate optical flow is the optical flow between an intermediate image and the first image, the second intermediate optical flow is the optical flow between the intermediate image and the second image, the intermediate image is an image whose viewing angle lies between those of the first image and the second image, the size of the first intermediate optical flow is the same as the size of the first image, and the size of the second intermediate optical flow is the same as the size of the second image; and calculating a stitched image of the first image and the second image according to the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image.
- The above image stitching method has simple steps and completes stitching without complex iterative calculation of multiple homography matrices. It therefore improves stitching efficiency, brings the stitching process closer to real time, and is highly practical.
- Because the first intermediate optical flow calculated by this method has the same size as the first image, and the second intermediate optical flow has the same size as the second image, the two intermediate optical flows can be used to map the first image and the second image directly, so that image stitching is implemented quickly.
- Calculating the first intermediate optical flow and the second intermediate optical flow according to the first image and the second image may include: cropping, from the first image and the second image respectively, the image regions that contain the common picture, to obtain a first screenshot and a second screenshot; inputting the first screenshot and the second screenshot into an optical flow calculation network to obtain a first screenshot optical flow and a second screenshot optical flow, where the first screenshot optical flow is the optical flow between the intermediate image and the first screenshot, and the second screenshot optical flow is the optical flow between the intermediate image and the second screenshot; and upsampling the first screenshot optical flow to the size of the first image to obtain the first intermediate optical flow, and upsampling the second screenshot optical flow to the size of the second image to obtain the second intermediate optical flow.
- With this implementation, the intermediate optical flow can be estimated more accurately.
- In the image regions of the first image and the second image that do not contain the common picture, there is no correspondence between pixels, so the intermediate optical flow is difficult to estimate there. If the complete first image and second image were used directly for intermediate optical flow estimation, the result could be inaccurate.
- Estimating the optical flow from the first screenshot and the second screenshot, and then upsampling the estimated small-size optical flow to obtain the desired intermediate optical flow, improves the accuracy of the intermediate optical flow.
- The motion pattern of the target reflected by the small-scale local optical flow is the same as that reflected by the large-scale global optical flow (i.e., the intermediate optical flow), so the new optical flow values generated during upsampling remain valid.
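The upsampling step described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `upsample_flow`, the `(height, width, 2)` flow layout, and nearest-neighbour resampling are assumptions made for the example. Whether the flow values themselves should be rescaled along with the grid is not stated in the text, so it is exposed as a hypothetical `scale_values` option that is off by default.

```python
import numpy as np

def upsample_flow(flow, out_h, out_w, scale_values=False):
    """Nearest-neighbour resize of an (h, w, 2) flow field to (out_h, out_w, 2).

    flow[..., 0] is assumed to hold horizontal motion, flow[..., 1] vertical motion.
    """
    h, w, _ = flow.shape
    # For every output row/column, pick the source row/column it falls into.
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    up = flow[rows[:, None], cols[None, :]].astype(np.float64)
    if scale_values:
        # Optional (assumption): rescale displacements to the larger pixel grid.
        up[..., 0] *= out_w / w
        up[..., 1] *= out_h / h
    return up

# A 2x3 screenshot flow with constant horizontal motion, upsampled to 4x6.
crop_flow = np.zeros((2, 3, 2))
crop_flow[..., 0] = 1.5
full_flow = upsample_flow(crop_flow, 4, 6)
```

A bilinear or learned upsampler could be substituted without changing the interface; nearest-neighbour is used here only to keep the sketch dependency-free.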
- Calculating the stitched image of the first image and the second image according to the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image may include: mapping to obtain a first intermediate image according to the first intermediate optical flow and the first image, and mapping to obtain a second intermediate image according to the second intermediate optical flow and the second image; calculating a target optical flow according to the first intermediate optical flow and the second intermediate optical flow; mapping to obtain a first stitched image according to the target optical flow and the first intermediate image, and mapping to obtain a second stitched image according to the target optical flow and the second intermediate image; and stitching the first stitched image and the second stitched image to obtain the stitched image.
- The images to be stitched are usually images of the same target collected under different viewing angles (if the contents of the first image and the second image are completely unrelated, there is generally no need to stitch them). The optical flow between two images can be regarded as a quantitative representation of the movement of the target in the image, which includes both the movement of the target itself and the movement of the camera (including changes in shooting angle). Therefore, taking the intermediate image as a reference, the first intermediate optical flow (representing the movement of the first image relative to the intermediate image) corresponds to the viewing angle at which the first image was collected, and the second intermediate optical flow (representing the movement of the second image relative to the intermediate image) corresponds to the viewing angle at which the second image was collected.
- Because the target optical flow is based at least on the fusion of the first intermediate optical flow and the second intermediate optical flow, the target optical flow can also be regarded as corresponding to a special viewing angle. An image collected under this viewing angle fuses the information of the first image and the second image and reflects the state of the target under different viewing angles, so it is an ideal stitched image.
- The stitched image can be calculated from the intermediate images and the target optical flow. It has high stitching quality and mitigates the artifacts, distortion, and alignment difficulties found in traditional image stitching methods.
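The "mapping" operations above (image plus optical flow gives a new image) can be illustrated with a simple backward warp. This is a sketch under stated assumptions: the text does not fix the flow direction or the sampling scheme, so the example assumes that each output pixel at `(x, y)` reads the source pixel at `(x + fx, y + fy)`, uses nearest-neighbour sampling, and fills out-of-range pixels with 0. The name `backward_warp` is hypothetical.

```python
import numpy as np

def backward_warp(image, flow):
    """Warp `image` (h, w) or (h, w, c) with a same-size flow field (h, w, 2).

    Assumed convention: output[y, x] = image[y + flow[y, x, 1], x + flow[y, x, 0]].
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    # Pixels whose source location falls outside the image stay 0.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out

# Shift a small test image one pixel to the left.
image = np.arange(12).reshape(3, 4)
flow = np.zeros((3, 4, 2))
flow[..., 0] = 1.0
warped = backward_warp(image, flow)
```

Because the intermediate optical flows have the same size as the images, the same routine would apply unchanged to both the first and second image.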
- Calculating the target optical flow according to the first intermediate optical flow and the second intermediate optical flow may include: calculating at least one transitional optical flow by interpolation according to the first intermediate optical flow and the second intermediate optical flow; and fusing the first intermediate optical flow, the at least one transitional optical flow, and the second intermediate optical flow to obtain the target optical flow.
- The optical flow between two images can be regarded as a quantitative representation of the movement of the target in the image. Therefore, taking the intermediate image as a reference, the first intermediate optical flow corresponds to the viewing angle of the first image, and the second intermediate optical flow corresponds to the viewing angle of the second image.
- A transitional optical flow corresponds to a transitional viewing angle between the first image and the second image, and the image collected under the transitional viewing angle is a transitional image (no transitional image is actually collected; the concept is introduced only to explain the principle of the scheme).
- Consider a virtual image acquisition process: the first image is collected at a certain viewing angle, the camera is then moved to each transitional viewing angle in turn to collect the transitional images, and finally the second image is collected. Because the parallax between adjacent viewing angles in this process is small, the first intermediate optical flow gradually changes into each transitional optical flow and finally into the second intermediate optical flow (also called a smooth transition of optical flow).
- The target optical flow in the above implementation is generated by fusing the first intermediate optical flow, the at least one transitional optical flow, and the second intermediate optical flow, so it contains optical flow information from the various viewing angles and reflects the gradual change of the optical flow in the virtual image acquisition process. The target optical flow can therefore be regarded as corresponding to a special, gradually changing viewing angle. An image collected under this viewing angle fuses the information of the first image, the second image, and the at least one transitional image, so it fully reflects the state of the target at the various viewing angles and is an ideal stitched image.
- The stitched image can be calculated from the intermediate images and the target optical flow. Because the stitched image reflects the overall picture of the captured target, it has high stitching quality and mitigates the artifacts, distortion, and alignment difficulties found in traditional image stitching methods.
- Calculating at least one transitional optical flow by interpolation according to the first intermediate optical flow and the second intermediate optical flow may include: acquiring at least one weight value; and, for each weight value, performing a weighted summation of the first intermediate optical flow and the second intermediate optical flow to obtain the at least one transitional optical flow.
- The first intermediate optical flow and the second intermediate optical flow can be regarded as the two endpoints of the interpolation operation, which estimates the value at one or more positions between the two endpoints so as to achieve a smooth transition of the optical flow.
- The weighted summation in the above implementation is a linear interpolation; non-linear interpolation (for example, quadratic, cubic, or reciprocal interpolation) may also be used.
- Linear interpolation is simple to compute. In most cases, linear motion is sufficient to describe the motion of the target between the first image and the second image, so the effect of linear interpolation is good enough.
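The weighted summation can be written in a few lines. The sketch below assumes the convention, described in the text, that each weight value is the coefficient of the first intermediate optical flow and that the two coefficients sum to 1; the function name `transitional_flows` and the array layout are illustrative assumptions.

```python
import numpy as np

def transitional_flows(flow_0, flow_1, weights):
    """Linearly interpolate between two same-shape flow fields.

    For each weight w, the transitional flow is w * flow_0 + (1 - w) * flow_1,
    so a larger w gives a flow closer to the first intermediate optical flow.
    """
    return [w * flow_0 + (1.0 - w) * flow_1 for w in weights]

flow_0 = np.full((2, 2, 2), 4.0)   # stand-in for the first intermediate optical flow
flow_1 = np.zeros((2, 2, 2))       # stand-in for the second intermediate optical flow
flows = transitional_flows(flow_0, flow_1, [0.75, 0.5, 0.25])
```

A non-linear scheme would replace only the expression inside the list comprehension.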
- The magnitude of a weight value is related to the viewing angle position of the transitional optical flow corresponding to that weight value.
- Taking the viewing angle position into account when setting the weight value makes the transitional optical flow calculated with that weight value consistent with its viewing angle position.
- The weighting coefficient of the first intermediate optical flow and the weighting coefficient of the second intermediate optical flow sum to 1, and the weight value is the weighting coefficient of the first intermediate optical flow. The weight value is positively correlated with how close the viewing angle position of the corresponding transitional optical flow is to the viewing angle position of the first intermediate optical flow.
- Either the weighting coefficient of the first intermediate optical flow can be taken as the weight value (the weighting coefficient of the second intermediate optical flow is then 1 minus the weight value), or the weighting coefficient of the second intermediate optical flow can be taken as the weight value (the weighting coefficient of the first intermediate optical flow is then 1 minus the weight value). There is no substantial difference between the two schemes.
- When the viewing angle position of a transitional optical flow is closer to that of the first intermediate optical flow, the weight value is set larger; that is, the weighting coefficient of the first intermediate optical flow is increased and the weighting coefficient of the second intermediate optical flow is decreased, so that the transitional optical flow is influenced more by the first intermediate optical flow, consistent with its viewing angle position. If all weight values are set according to this rule, the calculated transitional optical flows are guaranteed to change gradually.
- The at least one weight value may be uniformly distributed in the interval (0, 1).
- The viewing angle positions of the transitional optical flows calculated with such weight values are then also distributed relatively uniformly between the first image and the second image. With this distribution, the transitional images fully describe the overall picture of the captured target between the viewing angles of the first image and the second image, so the stitched image that incorporates the information of the transitional images can be considered to have high quality.
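One simple way to obtain weight values that are uniformly distributed in the open interval (0, 1) is to take the interior points of an evenly spaced grid. The sketch below is an assumption about how such weights might be generated; the name `uniform_weights` and the descending order (largest weight first, i.e. nearest the first image's viewing angle) are choices made for the example.

```python
import numpy as np

def uniform_weights(n):
    """Return n evenly spaced weight values strictly inside (0, 1), largest first."""
    # n + 2 points from 1 down to 0; dropping both endpoints leaves the interior.
    return np.linspace(1.0, 0.0, n + 2)[1:-1]

weights = uniform_weights(3)  # 0.75, 0.5, 0.25
```

The endpoints 1 and 0 are excluded because they would simply reproduce the first and second intermediate optical flows.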
- Calculating the target optical flow according to the first intermediate optical flow and the second intermediate optical flow may include: calculating a preliminary optical flow according to the first intermediate optical flow and the second intermediate optical flow; and inputting the preliminary optical flow into an optical flow inversion network to obtain the target optical flow.
- Optical flow inversion differs from simple matrix inversion, and its calculation is more complicated.
- In this implementation, a neural network is used to perform the optical flow inversion, which simplifies the operation and improves the efficiency of optical flow inversion.
- Fusing the first intermediate optical flow, the at least one transitional optical flow, and the second intermediate optical flow to obtain the target optical flow may include: acquiring N+2 weight matrices, where N is the total number of transitional optical flows; performing, based on the N+2 weight matrices, a weighted summation of the first intermediate optical flow, the N transitional optical flows, and the second intermediate optical flow to obtain a preliminary optical flow; and either taking the preliminary optical flow as the target optical flow, or inputting the preliminary optical flow into an optical flow inversion network to obtain the target optical flow.
- In this implementation, the optical flows are weighted and summed using weight matrices.
- Each weight matrix is two-dimensional, so the information of the different optical flows can be combined more flexibly in the preliminary optical flow.
- The preliminary optical flow can thus reflect the gradual change from the first intermediate optical flow to the second intermediate optical flow, and the target optical flow obtained from the preliminary optical flow naturally reflects this gradual change as well.
- The position of the maximum element in a weight matrix is related to the viewing angle position of the optical flow corresponding to that weight matrix.
- As one example of such a relationship, the position of the maximum element in a weight matrix coincides with the viewing angle position of its corresponding optical flow (which may be the first intermediate optical flow, a transitional optical flow, or the second intermediate optical flow). In that case, the values of that optical flow (itself a matrix) at its viewing angle position contribute the most to the calculation of the preliminary optical flow, and its values at other positions contribute relatively little.
- As a result, the optical flow value at each viewing angle position is contributed mainly by the optical flow corresponding to that viewing angle position, so the preliminary optical flow can reflect the gradual change from the first intermediate optical flow to the second intermediate optical flow.
- The target optical flow obtained from the preliminary optical flow can also reflect this gradual change.
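The weight-matrix fusion can be sketched as a per-pixel weighted sum. The text does not say how the weight matrices are constructed, so the construction below is purely hypothetical: it assumes the viewing angle positions are spread from left to right across the image columns and gives each flow a Gaussian-shaped weight that peaks at its own column position, normalised so that the weights at every pixel sum to 1. The names `make_weight_matrices`, `fuse_flows`, and `sharpness` are illustrative.

```python
import numpy as np

def make_weight_matrices(h, w, n_flows, sharpness=4.0):
    """Build n_flows weight matrices of shape (h, w), each peaking at a different column.

    Hypothetical layout: flow k is associated with relative column position k / (n_flows - 1).
    """
    cols = np.linspace(0.0, 1.0, w)
    centers = np.linspace(0.0, 1.0, n_flows)
    dist = (cols[None, :] - centers[:, None]) * n_flows      # (n_flows, w)
    raw = np.exp(-sharpness * dist ** 2)
    raw /= raw.sum(axis=0, keepdims=True)                     # weights sum to 1 per column
    return np.repeat(raw[:, None, :], h, axis=1)              # (n_flows, h, w)

def fuse_flows(flows, weights):
    """Weighted sum of N+2 flow fields (each (h, w, 2)) with N+2 weight matrices."""
    stacked = np.stack(flows)                                 # (n_flows, h, w, 2)
    return (weights[..., None] * stacked).sum(axis=0)         # (h, w, 2)

# First intermediate flow, one transitional flow (N = 1), second intermediate flow.
h, w = 2, 5
flows = [np.full((h, w, 2), v) for v in (0.0, 1.0, 2.0)]
weights = make_weight_matrices(h, w, n_flows=3)
preliminary = fuse_flows(flows, weights)
```

With this layout, the preliminary optical flow changes gradually from the first intermediate optical flow on the left to the second on the right; in a trained system the weight matrices could equally be fixed by design or learned.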
- Stitching the first stitched image and the second stitched image to obtain the stitched image may include: inputting the first stitched image and the second stitched image into a mask calculation network to obtain a stitching mask; and stitching the first stitched image and the second stitched image based on the stitching mask to obtain the stitched image.
- The stitching mask is used to achieve a smooth transition between the first stitched image and the second stitched image at the seam. The stitching mask is not preset but learned by the mask calculation network, which further improves the quality of the stitched image.
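Given a stitching mask, a common way to combine two images is a per-pixel convex combination. The sketch below assumes the mask holds values in [0, 1] and that 1 means "take the first stitched image"; neither detail is stated in the text, and the mask here is a hand-made ramp standing in for the output of the mask calculation network. The name `blend_with_mask` is hypothetical.

```python
import numpy as np

def blend_with_mask(stitched_0, stitched_1, mask):
    """Blend two same-size images: mask * stitched_0 + (1 - mask) * stitched_1."""
    mask = np.clip(mask, 0.0, 1.0)
    if stitched_0.ndim == 3:
        # Broadcast a (h, w) mask over the colour channels.
        mask = mask[..., None]
    return mask * stitched_0 + (1.0 - mask) * stitched_1

s0 = np.full((2, 4), 10.0)
s1 = np.full((2, 4), 20.0)
ramp = np.tile(np.linspace(1.0, 0.0, 4), (2, 1))  # stand-in for a learned mask
blended = blend_with_mask(s0, s1, ramp)
```

A soft mask like this ramp produces a gradual transition across the seam, whereas a binary mask would produce a hard cut.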
- Where the first intermediate optical flow and the second intermediate optical flow are calculated using the first screenshot optical flow and the second screenshot optical flow output by the optical flow calculation network, the target optical flow is calculated using the optical flow inversion network, and the stitched image is calculated using the stitching mask output by the mask calculation network, the method may also include: obtaining a first real screenshot optical flow, a second real screenshot optical flow, a real target optical flow, and a real stitched image; calculating an optical flow prediction loss according to the first screenshot optical flow, the second screenshot optical flow, the first real screenshot optical flow, and the second real screenshot optical flow; calculating an optical flow inversion loss according to the target optical flow and the real target optical flow; calculating an image stitching loss according to the stitched image and the real stitched image; and calculating a total loss according to the optical flow prediction loss, the optical flow inversion loss, and the image stitching loss, and updating parameters of the optical flow calculation network, the optical flow in
- The above implementation provides an end-to-end model training method that can be used to train the image stitching model.
- The image stitching model includes the optical flow calculation network, the optical flow inversion network, and the mask calculation network.
- The total loss combines the optical flow prediction loss, the optical flow inversion loss, and the image stitching loss. Training therefore improves the model's intermediate optical flow prediction accuracy, target optical flow prediction accuracy, and stitching mask prediction accuracy at the same time, so that the image stitching model finally obtained can achieve high-quality image stitching.
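The three loss terms can be combined as in the following sketch. The text names the losses but not their form or how they are combined, so the mean absolute error (L1) for each term, the weighted sum, and the coefficients `w_flow`, `w_inv`, and `w_img` are all assumptions; the dictionary keys are hypothetical names for the predicted and real quantities.

```python
import numpy as np

def l1(pred, target):
    """Mean absolute error between two same-shape arrays (assumed loss form)."""
    return float(np.mean(np.abs(pred - target)))

def total_loss(pred, real, w_flow=1.0, w_inv=1.0, w_img=1.0):
    """Combine the three losses into one scalar (weighted sum is an assumption)."""
    # Optical flow prediction loss: both screenshot optical flows.
    flow_loss = (l1(pred["crop_flow_0"], real["crop_flow_0"])
                 + l1(pred["crop_flow_1"], real["crop_flow_1"]))
    # Optical flow inversion loss: target optical flow.
    inv_loss = l1(pred["target_flow"], real["target_flow"])
    # Image stitching loss: final stitched image.
    img_loss = l1(pred["stitched"], real["stitched"])
    return w_flow * flow_loss + w_inv * inv_loss + w_img * img_loss

keys = ("crop_flow_0", "crop_flow_1", "target_flow", "stitched")
pred = {k: np.zeros((2, 2)) for k in keys}
real = {k: np.ones((2, 2)) for k in keys}
loss = total_loss(pred, real)  # each L1 term is 1, so the sum is 4.0
```

In an actual training loop, this scalar would be backpropagated through all three networks with an automatic differentiation framework; NumPy is used here only to show the arithmetic.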
- In this training method, the target optical flow is obtained by fusing the first intermediate optical flow, at least one transitional optical flow, and the second intermediate optical flow.
- Acquiring the first image and the second image may include: calculating the first image according to the intermediate image and a homography matrix, and calculating the second image according to the intermediate image and the inverse matrix of the homography matrix, where the intermediate image is a real image.
- Obtaining the first real screenshot optical flow, the second real screenshot optical flow, the real target optical flow, and the real stitched image may include: calculating a first real intermediate optical flow according to the homography matrix, and calculating the first real screenshot optical flow according to the first real intermediate optical flow; calculating a second real intermediate optical flow according to the inverse matrix of the homography matrix, and calculating the second real screenshot optical flow according to the second real intermediate optical flow
- In this implementation, the intermediate image is a real image and the homography matrix can be specified.
- From the intermediate image and the homography matrix, the supervisory signals for training can be calculated: the real screenshot optical flows, the real target optical flow, and the real stitched image. If a group of images to be stitched (the first image and the second image) together with the corresponding supervisory signals is regarded as one training sample, then, because the homography matrix can be specified arbitrarily, this implementation can quickly and efficiently generate a large number of training samples from a small number of real images. These samples can cover different scenes, so the trained image stitching model generalizes well.
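A dense optical flow induced by a homography can be computed by transforming every pixel coordinate and subtracting the original coordinate. The sketch below illustrates this for generating real intermediate optical flows from a specified homography matrix and its inverse. The direction convention (flow = mapped position minus original position), the `(x, y)` channel order, and the name `homography_flow` are assumptions, not details given in the text.

```python
import numpy as np

def homography_flow(H, h, w):
    """Return the (h, w, 2) displacement field induced by a 3x3 homography H.

    Each pixel (x, y) is mapped to H @ [x, y, 1] (then dehomogenised); the flow is
    the mapped position minus the original position.
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # (h, w, 3) homogeneous coords
    mapped = pts @ H.T
    mapped_xy = mapped[..., :2] / mapped[..., 2:3]
    return mapped_xy - pts[..., :2]

# A pure translation by (+2, -1) as the specified homography.
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0, 1.0]])
real_flow_0 = homography_flow(H, 3, 4)                  # from the homography
real_flow_1 = homography_flow(np.linalg.inv(H), 3, 4)   # from its inverse
```

Sampling many random homographies near the identity for each real intermediate image would yield many training samples from few images, which is the benefit the text describes.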
- The present application provides an image stitching apparatus, which may include: an image acquisition module, configured to acquire a first image and a second image; an intermediate optical flow calculation module, configured to calculate a first intermediate optical flow and a second intermediate optical flow according to the first image and the second image, where the first intermediate optical flow is the optical flow between an intermediate image and the first image, the second intermediate optical flow is the optical flow between the intermediate image and the second image, the intermediate image is an image whose viewing angle lies between those of the first image and the second image, the size of the first intermediate optical flow is the same as the size of the first image, and the size of the second intermediate optical flow is the same as the size of the second image; and an image stitching module, configured to calculate a stitched image of the first image and the second image according to the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image.
- The image stitching module calculating the stitched image of the first image and the second image according to the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image may include: mapping to obtain a first intermediate image according to the first intermediate optical flow and the first image, and mapping to obtain a second intermediate image according to the second intermediate optical flow and the second image; calculating a target optical flow according to the first intermediate optical flow and the second intermediate optical flow; mapping to obtain a first stitched image according to the target optical flow and the first intermediate image, and mapping to obtain a second stitched image according to the target optical flow and the second intermediate image; and stitching the first stitched image and the second stitched image to obtain the stitched image.
- The image stitching module calculating the target optical flow according to the first intermediate optical flow and the second intermediate optical flow may include: calculating at least one transitional optical flow by interpolation according to the first intermediate optical flow and the second intermediate optical flow; and fusing the first intermediate optical flow, the at least one transitional optical flow, and the second intermediate optical flow to obtain the target optical flow.
- The present application provides a computer-readable storage medium on which computer program instructions may be stored; when the computer program instructions are read and run by a processor, the method provided by any one of the foregoing possible implementations is executed.
- The present application provides an electronic device, which may include a memory and a processor, where computer program instructions may be stored in the memory; when the computer program instructions are read and run by the processor, the method provided by any one of the foregoing possible implementations is executed.
- FIG. 1 shows a possible flow of the image stitching method provided by the embodiment of the present application.
- FIG. 2 shows a possible data flow of the image stitching method provided by the embodiment of the present application.
- FIG. 3 shows the working principle of the image stitching method provided by the embodiment of the present application.
- FIG. 4 shows the process of using a fisheye camera to collect images and obtain the images to be stitched.
- FIG. 5 shows a possible flow of the model training method provided by the embodiment of the present application.
- FIG. 6 shows a possible way of generating training samples in the model training method provided by the embodiment of the present application.
- FIG. 7 shows a possible structure of the image stitching apparatus provided by the embodiment of the present application.
- FIG. 8 shows a possible structure of the electronic device provided by the embodiment of the present application.
- FIG. 1 shows a possible flow of the image stitching method provided by the embodiment of the present application, and FIG. 2 shows a possible data flow during execution of the method, for reference when the steps of the method are explained.
- The image stitching method can be executed by, but is not limited to, the electronic device shown in FIG. 8.
- The method includes:
- Step S110: Acquire the first image and the second image.
- the first image and the second image are images to be spliced, and they are respectively marked as I 0 and I 1 in FIG. 2 .
- the source of the first image and the second image is not limited, for example, it may be an image collected by a camera, or an image generated by a computer vision algorithm, etc., and the following mainly uses the case of camera collection as an example.
- the first image and the second image have the same image size, or the first image and the second image have the same size when they are collected, or the first image and the second image have different sizes when they are collected, but they are processed as same size.
- the image mosaic method proposed in this application does not limit the image content of the first image and the second image in principle, but considering the practical use of image mosaic, it is advisable to assume that the first image and the second image are aimed at the same target, from different
- the images collected under the angle of view for example, I 0 and I 1 in Figure 3 are collected under the angle of view 0 and angle of view 1, respectively.
- the target here generally refers to the objects that can be photographed, such as people, animals and plants, scenery and so on.
- the original first image and the original second image can be roughly aligned first (image alignment is an operation, ideally, the pixels corresponding to the same position of the target in the aligned two images can overlap together ), and then fill in 0 around the roughly aligned image as needed (that is, fill the pixels with a value of 0), fill it to the same size as the spliced image, and then use the 0-filled image as the first image and
- the second image is subjected to subsequent image stitching.
- the black parts of I 0 and I 1 in Figure 2 are the parts where 0 is added.
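- The zero-padding step can be sketched as follows. This is a minimal illustration, assuming the roughly aligned images are NumPy arrays and that the canvas size and placement offsets are already known; the helper name `pad_to_canvas` and the toy sizes are hypothetical and do not come from the application.

```python
import numpy as np

def pad_to_canvas(image, canvas_hw, top_left):
    """Place `image` on a zero-filled canvas of size `canvas_hw` at offset `top_left`."""
    h, w = image.shape[:2]
    canvas_h, canvas_w = canvas_hw
    top, left = top_left
    if top < 0 or left < 0 or top + h > canvas_h or left + w > canvas_w:
        raise ValueError("image does not fit on the canvas at this offset")
    # Pixels not covered by the image keep the value 0 (the black parts).
    canvas = np.zeros((canvas_h, canvas_w) + image.shape[2:], dtype=image.dtype)
    canvas[top:top + h, left:left + w] = image
    return canvas

# Two toy 2x3 "roughly aligned" images placed on a shared 2x5 canvas.
aligned_0 = np.full((2, 3), 7, dtype=np.uint8)
aligned_1 = np.full((2, 3), 9, dtype=np.uint8)
I0 = pad_to_canvas(aligned_0, (2, 5), (0, 0))  # zeros on the right
I1 = pad_to_canvas(aligned_1, (2, 5), (0, 2))  # zeros on the left
```

- After padding, both arrays have the canvas size, so later steps can treat them as same-size inputs.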
- The first wide-angle image and the second wide-angle image are first captured by wide-angle cameras; a global homography matrix is then calculated by a camera calibration method; the two wide-angle images are roughly aligned according to the homography matrix; and zeros are padded around the roughly aligned images to obtain the final first image and second image.
- the first fish-eye image and the second fish-eye image are captured by a fish-eye camera.
- the shooting positions of the two images are the same, and the shooting directions are just opposite, as shown in the left column of Fig. 4 .
- the first fisheye image and the second fisheye image can be expanded respectively to obtain the expanded first expanded image and the second expanded image, as shown in the right column of Figure 4.
- In the expanded images, zero padding has in effect already been performed, so no separate zero padding is needed, and the first expanded image and the second expanded image are already roughly aligned.
- The first expanded image is divided into two parts, A1 on the left and B1 on the right, and the second expanded image is likewise divided into A2 on the left and B2 on the right, where A1 and A2 form one group of first and second images, and B1 and B2 form another group of first and second images.
- Step S120 Calculate and obtain a first intermediate optical flow and a second intermediate optical flow according to the first image and the second image.
- the optical flow between two images can be considered as a quantitative representation of the movement of the target in the image.
- This kind of movement includes both the movement of the target itself and the movement of the camera that takes the two images (including the change of the shooting angle).
- When the position of the camera moves, the position of the target in the image also changes, which is equivalent to movement of the target itself.
- the movement of the target makes a point on the target correspond to pixels at different positions in the two images, and the coordinate offset (a vector) between the two pixels is the optical flow value at one of the pixel positions.
- the optical flow between the two can also be regarded as an optical flow map, which can be the same size as the two images, and each pixel value in it is the above optical flow value.
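- As an illustration of the optical flow map described above, the following minimal sketch stores a flow as an array of shape (H, W, 2). The (dx, dy) channel order and the toy values are assumptions made for this example only.

```python
import numpy as np

H, W = 4, 5
# flow[y, x] = (dx, dy): offset from pixel (x, y) in one image
# to the pixel showing the same target point in the other image.
flow = np.zeros((H, W, 2), dtype=np.float32)
flow[..., 0] = 2.0   # every point moves 2 pixels to the right
flow[..., 1] = -1.0  # and 1 pixel up

ys, xs = np.mgrid[0:H, 0:W]
matched_x = xs + flow[..., 0]
matched_y = ys + flow[..., 1]
# With this flow, pixel (x=1, y=3) corresponds to (x=3, y=2) in the other image.
```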
- The intermediate image is a virtual image collected at the intermediate viewing angle (for example, the image I m collected at viewing angle m in FIG. 3), and the intermediate viewing angle is a viewing angle between the collection viewing angles of the first image and the second image. The collection of the intermediate image mentioned here should be understood in a virtual sense: if a camera were placed at the intermediate viewing angle to shoot the target, the intermediate image could be collected, but no such collection is actually performed.
- the size of the intermediate image is the same as that of the first image and the second image.
- the intermediate image is not necessarily a virtual image, but may also be a real collected image.
- The first intermediate optical flow refers to the optical flow between the intermediate image and the first image, and the second intermediate optical flow refers to the optical flow between the intermediate image and the second image.
- The first intermediate optical flow has two directions: the optical flow from the intermediate image I m to the first image I 0, and the optical flow from the first image I 0 to the intermediate image I m, denoted F m→0 and F 0→m respectively. Only one of the two directions needs to be used in image stitching, and F m→0 is used in Figure 2.
- The second intermediate optical flow also has two directions: the optical flow from the intermediate image I m to the second image I 1, and the optical flow from the second image I 1 to the intermediate image I m, denoted F m→1 and F 1→m respectively. Only one of the two directions needs to be used in image stitching, and F m→1 is used (for consistency with F m→0, both optical flows start from I m).
- The size of the first intermediate optical flow is the same as that of the first image, and the size of the second intermediate optical flow is the same as that of the second image, so that in subsequent steps the first intermediate optical flow and the second intermediate optical flow can be used to map the first image and the second image directly (the meaning of mapping is explained later), which allows image stitching to be realized quickly.
- a pre-trained neural network may be used to estimate the first intermediate optical flow and the second intermediate optical flow by using the first image and the second image as input.
- Optical flow can be estimated effectively only where pixels in the two images correspond to each other. For image areas of the first image and the second image that contain a common picture, such a correspondence between pixels exists, so the intermediate optical flow can be estimated fairly accurately. For image areas that do not contain a common picture, no such correspondence exists, so the intermediate optical flow is difficult to estimate (or, although optical flow values can be calculated, they are inaccurate).
- If the neural network is used directly to estimate the intermediate optical flow from the complete first image and second image, poor estimation results may be obtained. Therefore, in some implementations, the image areas containing the common picture are first cropped from the first image and the second image, and the neural network performs optical flow estimation only between these areas, which yields a small-size optical flow with high precision. The small-size optical flow is then up-sampled to obtain the large-size optical flow, that is, the intermediate optical flow, which helps improve the estimation accuracy of the intermediate optical flow.
- the overlapping areas are often larger, while the non-overlapping areas are relatively small. Using this method is more conducive to obtaining high-quality optical flow estimation results. The following is a detailed description:
- Step a Respectively intercept the image areas containing the common frame in the first image and the second image to obtain the first screenshot and the second screenshot.
- the image area containing the common picture in the first image and the second image can be intercepted by a rectangular frame, and the image areas intercepted are called the first screenshot and the second screenshot respectively.
- The first screenshot and the second screenshot are recorded as overlap-0 and overlap-1 respectively.
- Step b Input the first screenshot and the second screenshot into the optical flow calculation network to obtain the first screenshot optical flow and the second screenshot optical flow.
- the optical flow calculation network is a neural network for estimating optical flow. Its training method will be described in the introduction of Figure 5.
- The network takes the first screenshot and the second screenshot as input and outputs the first screenshot optical flow and the second screenshot optical flow; the specific structure of the network is not limited.
- The first screenshot optical flow refers to the optical flow between the intermediate image (more precisely, the part of the intermediate image corresponding to the screenshot area) and the first screenshot, and the second screenshot optical flow refers to the optical flow between the intermediate image (more precisely, the part of the intermediate image corresponding to the screenshot area) and the second screenshot. For simplicity, the two are sometimes collectively referred to as the screenshot optical flow.
- Like the first intermediate optical flow, the first screenshot optical flow and the second screenshot optical flow each have two directions. Here the optical flow from the intermediate image to the first screenshot and the optical flow from the intermediate image to the second screenshot are used, denoted F m→overlap-0 and F m→overlap-1 respectively.
- Step c Up-sampling the optical flow of the first screenshot to the size of the first image to obtain a first intermediate optical flow, and up-sampling the optical flow of the second screenshot to the size of the second image to obtain a second intermediate optical flow.
- Since the size of the first screenshot optical flow is the same as that of the first screenshot, which is smaller than the first image, the first screenshot optical flow must be upsampled to obtain the first intermediate optical flow; the second screenshot optical flow is upsampled likewise.
- Since optical flow can also be regarded as a special image in which each pixel value is a vector, optical flow can be upsampled with image interpolation algorithms such as nearest-neighbor interpolation, bilinear interpolation, or bicubic interpolation; upsampling methods based on deep learning, such as DUpsampling and Meta-Upscale, can also be used.
- The motion pattern of the target reflected by the small-size local optical flow (i.e., the screenshot optical flow) is consistent with that reflected by the large-size global optical flow (i.e., the intermediate optical flow), which supports the validity of the new optical flow values generated by interpolation during upsampling.
- the optical flow value calculated according to the upsampling is relatively reliable.
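- Upsampling an optical flow with an ordinary image interpolation algorithm can be sketched as below. This is a minimal bilinear example in NumPy, not the implementation of the application; the function name `upsample_flow_bilinear` and the corner-aligned sampling grid are assumptions. The sketch only resamples the vector field; whether the vector magnitudes should also be rescaled depends on how the screenshot relates to the full image, which is left open here.

```python
import numpy as np

def upsample_flow_bilinear(flow, out_hw):
    """Resize an (h, w, 2) flow map to (out_h, out_w, 2) by bilinear interpolation."""
    h, w = flow.shape[:2]
    out_h, out_w = out_hw
    # Sample positions in the source grid, aligning the corner pixels.
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]  # vertical blend weights
    wx = (xs - x0)[None, :, None]  # horizontal blend weights
    top = flow[y0][:, x0] * (1 - wx) + flow[y0][:, x1] * wx
    bottom = flow[y1][:, x0] * (1 - wx) + flow[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# Toy 2x2 screenshot flow: dx grows left to right, dy grows top to bottom.
small = np.zeros((2, 2, 2))
small[..., 0] = [[0.0, 2.0], [0.0, 2.0]]
small[..., 1] = [[0.0, 0.0], [4.0, 4.0]]
large = upsample_flow_bilinear(small, (3, 3))
```

- Because both vector components are interpolated like ordinary image channels, the new values vary smoothly between the original ones.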
- Even if the optical flow values calculated by upsampling are not accurate enough, in some implementations that use a stitching mask (see later), changing the pixel values in the mask can weaken, to a certain extent, the negative impact of the inaccurate optical flow values.
- the intermediate viewing angle refers to an expected viewing angle position between the corresponding viewing angle positions of the first image and the second image.
- The expected viewing angle position is determined when the optical flow calculation network is trained: the training data determines the position at which the network estimates the intermediate optical flow (to be precise, the network first estimates the screenshot optical flow, from which the intermediate optical flow is then calculated), and the trained network estimates the intermediate optical flow at that position, which is the viewing angle position at which the intermediate image is collected, that is, the intermediate viewing angle position.
- The "middle" of the intermediate viewing angle can be determined by using the homography matrix; it is the "middle" in the sense of projective transformation and does not refer to the center between the viewing angle positions of the first image and the second image.
- Step S130 According to the first intermediate optical flow, the second intermediate optical flow, the first image and the second image, calculate and obtain the spliced image of the first image and the second image.
- step S130 may further include the following sub-steps:
- Step A According to the first intermediate optical flow and the first image, map to obtain a first intermediate image, and, according to the second intermediate optical flow and the second image, map to obtain a second intermediate image.
- The first intermediate image and the second intermediate image can each be understood as a part of the intermediate image (the intermediate image can be obtained after the two are stitched together); in FIG. 2 they are recorded as I m→0 and I m→1 respectively.
- Optical flow reflects the coordinate offset between the pixels that correspond, in two images, to the same point on the captured target. Therefore, if one of the two images and the optical flow between the two images are known, the other image can be estimated. This estimation is called mapping (warping).
- the first intermediate image may be obtained by mapping the first image according to the first intermediate optical flow.
- When the first intermediate optical flow is F m→0, the mapping is a backward mapping (backward warping). Backward mapping, which is widely used at present, is mainly used as the example in the following description.
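- Backward mapping can be sketched as follows: each pixel of the output looks up the source image at the position indicated by the flow. This is a simplified illustration with nearest-neighbor sampling and zero fill outside the source; the function name `backward_warp`, the (dx, dy) channel order, and the toy data are assumptions, and practical implementations often use bilinear sampling instead.

```python
import numpy as np

def backward_warp(source, flow):
    """Backward mapping: out[y, x] = source[y + dy, x + dx], with flow[y, x] = (dx, dy).

    Nearest-neighbor sampling; samples that fall outside `source` are set to 0.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    inside = (
        (src_x >= 0) & (src_x < source.shape[1])
        & (src_y >= 0) & (src_y < source.shape[0])
    )
    out = np.zeros((h, w) + source.shape[2:], dtype=source.dtype)
    out[inside] = source[src_y[inside], src_x[inside]]
    return out

# Toy first image and a flow F m→0 that points one pixel to the right everywhere.
I0 = np.arange(12, dtype=np.float32).reshape(3, 4)
F_m_to_0 = np.zeros((3, 4, 2), dtype=np.float32)
F_m_to_0[..., 0] = 1.0
I_m0 = backward_warp(I0, F_m_to_0)  # toy first intermediate image
```

- Every output pixel receives exactly one value, so the result has no holes, which is one common reason for preferring backward mapping to forward mapping.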
- Step B Calculate and obtain the target optical flow according to the first intermediate optical flow and the second intermediate optical flow.
- step B may further include the following sub-steps:
- Step B1 Calculate at least one transitional optical flow by interpolation according to the first intermediate optical flow and the second intermediate optical flow.
- The transitional image is a virtual image captured at a transitional viewing angle (for example, the image I v captured at viewing angle v in Figure 3), where a transitional viewing angle is a viewing angle between the collection viewing angles of the first image and the second image; the collection of the transitional image mentioned here should be understood in a virtual sense.
- There are countless possible transitional viewing angles.
- four transitional viewing angles are shown at the bottom of FIG. 3 , named respectively as viewing angle 0.2, viewing angle 0.4, viewing angle 0.6, and viewing angle 0.8, and each transitional viewing angle corresponds to a transitional image.
- The weight values 0.2, 0.4, 0.6, and 0.8 here (see the explanation below for the definition of the weight value) roughly represent the positional relationship between the viewing angles: starting from viewing angle 0, the order "viewing angle 0 → viewing angle 0.2 → viewing angle 0.4 → viewing angle 0.6 → viewing angle 0.8 → viewing angle 1" transitions to viewing angle 1.
- A virtual image acquisition process can be considered: the first image is collected at a starting viewing angle (for example, viewing angle 0 in Figure 3), the camera is then moved to each transitional viewing angle in turn to collect the transitional images (for example, at viewing angles 0.2, 0.4, 0.6, and 0.8 in FIG. 3), and finally the second image is collected at an ending viewing angle (for example, viewing angle 1 in FIG. 3).
- This process can be understood visually: the photographer holds a mobile phone, moves around a certain shooting target, and uses the mobile phone to continuously shoot the target from different angles during the movement.
- The image collected at each viewing angle corresponds to the optical flow between the intermediate image and that image: the first image corresponds to the first intermediate optical flow, the second image corresponds to the second intermediate optical flow, and each transitional image corresponds to a transitional optical flow.
- The transitional optical flow is the optical flow between the intermediate image and the transitional image, and can be recorded as F m→v (or, alternatively, F v→m).
- Since the viewing angle and the image correspond to each other, there is also a correspondence between the viewing angle and the optical flow.
- this process can also be regarded as the process of sequentially transforming the first image into various transitional images, and finally transforming into the second image.
- the first intermediate optical flow is also transformed into each transitional optical flow in turn, and finally transformed into the second intermediate optical flow.
- The transition between the different optical flows is relatively smooth, all the more so when more transitional viewing angles are selected.
- If the first intermediate optical flow and the second intermediate optical flow are regarded as two endpoints, the transitional optical flow at any position between them can be calculated by interpolation. The specific interpolation algorithm is not limited; for example, linear interpolation, quadratic interpolation, cubic interpolation, or inverse-distance interpolation can be used. Linear interpolation is used as an example later and is not elaborated here.
- Which transitional optical flows to calculate, and how many, can be determined according to actual needs, but at least one transitional optical flow should be calculated. For example, in FIG. 3, a total of 4 transitional optical flows are calculated, located at viewing angle 0.2, viewing angle 0.4, viewing angle 0.6, and viewing angle 0.8 respectively.
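- Step B1 with linear interpolation can be sketched as below. This is a minimal illustration that assumes each weight w multiplies the first intermediate optical flow and 1 - w multiplies the second; the function name `transitional_flows`, the toy flows, and the ordering of the four weights are choices made for this example only.

```python
import numpy as np

def transitional_flows(flow_0, flow_1, weights):
    """Return one transitional flow per weight w: w * flow_0 + (1 - w) * flow_1."""
    flows = []
    for w in weights:
        if not 0.0 < w < 1.0:
            raise ValueError("each weight must lie in the open interval (0, 1)")
        flows.append(w * flow_0 + (1.0 - w) * flow_1)
    return flows

F_m0 = np.full((2, 3, 2), -4.0)  # toy first intermediate optical flow
F_m1 = np.full((2, 3, 2), 6.0)   # toy second intermediate optical flow
# Four weights give four transitional flows between the two endpoints.
trans = transitional_flows(F_m0, F_m1, [0.8, 0.6, 0.4, 0.2])
```

- With these toy endpoints, the four transitional flows step evenly from values near the first intermediate optical flow to values near the second.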
- Step B2 According to the first intermediate optical flow, at least one transitional optical flow and the second intermediate optical flow, the target optical flow is obtained through fusion.
- As explained for step B1, the first intermediate optical flow, the at least one transitional optical flow, and the second intermediate optical flow change gradually from one to the next.
- "Fusion" in step B2 generally refers to an optical flow merging operation that combines the first intermediate optical flow, the at least one transitional optical flow, and the second intermediate optical flow into one optical flow, called the target optical flow, such that the target optical flow reflects the gradual change between these optical flows.
- Possible fusion operations include weighted summation, splicing, etc., which will be described later by using the weight matrix to achieve optical flow fusion as an example.
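- One way to picture fusion by weighted summation is a per-pixel weight matrix. The sketch below is only an assumed illustration, not the fusion scheme defined by the application: the linear left-to-right weight layout, the function name `fuse_with_weight_matrix`, and the toy flows are all hypothetical.

```python
import numpy as np

def fuse_with_weight_matrix(flow_0, flow_1, weight_matrix):
    """Per-pixel weighted sum: W * flow_0 + (1 - W) * flow_1."""
    w = weight_matrix[..., None]  # broadcast over the (dx, dy) channel
    return w * flow_0 + (1.0 - w) * flow_1

H, W = 2, 5
# Hypothetical weight matrix: weights fall linearly from 1 (left column) to 0 (right column).
weight_matrix = np.tile(np.linspace(1.0, 0.0, W), (H, 1))
F_m0 = np.full((H, W, 2), -4.0)  # toy first intermediate optical flow
F_m1 = np.full((H, W, 2), 4.0)   # toy second intermediate optical flow
preliminary = fuse_with_weight_matrix(F_m0, F_m1, weight_matrix)
```

- In this layout each column of the result equals the interpolated optical flow for that column's weight, so the fused flow changes gradually from the first intermediate optical flow on the left to the second on the right.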
- Since the target optical flow contains the optical flow information at each viewing angle and reflects the gradual change of the optical flow across viewing angles, the target optical flow can be considered to correspond to a special gradient viewing angle.
- An image collected at the gradient viewing angle fuses the information of the first image, the second image, and the at least one transitional image, and thus fully reflects the state of the captured target at the various viewing angles. That is, the image collected at the gradient viewing angle is the ideal stitching result, namely the stitched image of the first image and the second image to be calculated (referred to as the stitched image), so the target optical flow can also be regarded as the optical flow between the intermediate image and the stitched image and is used to calculate the stitched image.
- The gradient viewing angle can be understood intuitively: the photographer holds a mobile phone, starts at a starting viewing angle, ends at an ending viewing angle, and moves around a target while shooting. The collected images are stitched together without losing the information at any viewing angle, and the final image reflects the whole picture of the target between the starting viewing angle and the ending viewing angle.
- Step B2 can be divided into two sub-steps: first, the preliminary optical flow is obtained by fusing the first intermediate optical flow, the at least one transitional optical flow, and the second intermediate optical flow; then, the preliminary optical flow is input into an optical flow inversion network, and the target optical flow output by the optical flow inversion network is obtained. Figure 2 illustrates these two sub-steps.
- Optical flow inversion is performed for the following reasons:
- Since the first intermediate optical flow, the transitional optical flow, and the second intermediate optical flow all start from the intermediate image, the preliminary optical flow obtained by fusing them directly also starts from the intermediate image; in Figure 2 it is F m→0~1, that is, the optical flow from the intermediate image to the stitched image.
- If F m→0~1 were used directly as the target optical flow, the mapping in step C (see the description of step C for details) could only be a forward mapping, and forward mapping is currently not widely used because of certain defects. Therefore, in the above implementation, F m→0~1 is converted by optical flow inversion into the reverse optical flow F 0~1→m, that is, the optical flow from the stitched image to the intermediate image, and F 0~1→m is taken as the target optical flow so that backward mapping can be used in step C. It should be understood that, if a better forward mapping method is available, F m→0~1 can also be used directly as the target optical flow.
- Optical flow inversion is different from simple matrix inversion, and its calculation is more complicated. Therefore, in the above implementation, a neural network performs the optical flow inversion. On the one hand, this simplifies the operation and improves the efficiency of optical flow inversion; on the other hand, the learning ability of the neural network can improve the accuracy of optical flow inversion, and more accurate optical flow helps improve the quality of the subsequent stitched image. The training method of the optical flow inversion network is explained in the introduction of Figure 5.
- the specific structure of the optical flow inversion network is not limited. For example, in some simpler implementations, L (L>1) consecutive convolutional layers can be used to form an optical flow inversion network.
- the first convolutional layer takes the preliminary optical flow as input, and the last convolutional layer outputs the target optical flow.
- Such a simple network design is well suited to stitching images captured by wide-angle cameras or fisheye cameras. Since the shooting ranges of wide-angle cameras and fisheye cameras are relatively large, the range of motion of the target in the frame is relatively small, so the optical flow values in each optical flow (the first intermediate optical flow, the second intermediate optical flow, and the transitional optical flow) change relatively smoothly and are unlikely to change sharply; the same is true of the preliminary optical flow obtained by fusion. Such a preliminary optical flow is easy to invert and does not require an overly complex network structure, and a simple network also improves the efficiency of optical flow inversion.
- In some implementations of step B, the transitional optical flow is not calculated, and the target optical flow is obtained directly by fusing the first intermediate optical flow and the second intermediate optical flow (as in the scheme above, the preliminary optical flow can first be obtained by fusion and then either used directly as the target optical flow or passed to the optical flow inversion network to calculate the target optical flow). These implementations are simpler to calculate than steps B1 and B2.
- The target optical flow in this case corresponds to a degenerate gradient viewing angle (a direct transition from the viewing angle of the first image to the viewing angle of the second image). An image collected at this gradient viewing angle fuses the information of the first image and the second image. Although it does not fuse the information of any transitional image, it already contains all the original information used for image stitching (the first image and the second image) and can still reflect the state of the captured target at different viewing angles, so it is also an ideal stitched image, which can likewise be calculated from the target optical flow and the intermediate image.
- Step C According to the target optical flow and the first intermediate image, map to obtain a first stitched image, and, according to the target optical flow and the second intermediate image, map to obtain a second stitched image.
- The first stitched image and the second stitched image can each be understood as a part of the stitched image to be calculated in step D (the stitched image is obtained after the two are stitched together); in Fig. 2 they are recorded as I 0~1→m→0 and I 0~1→m→1 respectively.
- The subscript 0~1→m→0 in I 0~1→m→0 is an abbreviation of (0~1)→(m→0), indicating the result of backward mapping with the first intermediate image I m→0 and the target optical flow F 0~1→m; the subscript 0~1→m→1 in I 0~1→m→1 is an abbreviation of (0~1)→(m→1), indicating the result of backward mapping with the second intermediate image I m→1 and the target optical flow F 0~1→m. If forward mapping is adopted, a similar notation can be used, which is not described in detail.
- the target optical flow can be regarded as the optical flow between the intermediate image and the stitched image, so such a mapping is feasible.
- Step D According to the first stitched image and the second stitched image, stitching to obtain a stitched image of the first image and the second image.
- the first stitched image and the second stitched image have been obtained in step C, and the final stitched image can be obtained by stitching them together.
- A stitched image can be obtained by stitching the left image area of the first stitched image and the right image area of the second stitched image; the result is denoted I 0~1.
- a stitching mask can be set to achieve a smooth transition between the first stitched image and the second stitched image at the stitching position, so as to improve the quality of the stitched image.
- Fig. 2 shows the steps of realizing image stitching by using a mask, and the stitching mask is recorded as mask.
- the mask calculation network is a pre-trained neural network, and the specific structure is not limited, and its training method will be described in the introduction of Figure 5.
- The input of the mask calculation network includes at least the first stitched image and the second stitched image, but may also include other information, for example, the target optical flow. With such additional input, the mask can be predicted more accurately, which helps improve the quality of the stitched image.
- the stitching mask can also be regarded as an image whose size is the same as that of the first stitched image (or the second stitched image).
- the pixel values in the spliced mask may be values in the interval [0,1], and the specific values are calculated by the mask calculation network.
- For example, a possible stitching mask is a 6×6 matrix in which every row is [1, 1, 0.5, 0.5, 0, 0]. The two left columns have the value 1, the two right columns have the value 0, and the two middle columns have the value 0.5, which means that the two left columns of the stitched image take the pixel values of the first stitched image, the two right columns take the pixel values of the second stitched image, and the two middle columns take the average of the pixel values of the two stitched images.
- the middle two columns are likely to be located in the image area containing the common picture in the two stitched images, and the first stitched image and the second stitched image can be smoothly transitioned in this area by taking the average value.
- This example stitching mask is only 6×6 in size and is not suitable for stitching the images in Figure 2, but the actual stitching mask shown in Figure 2 is very similar to it: the white on the left represents a pixel value of 1, the black on the right represents a pixel value of 0, and the gray in the middle represents pixel values in the interval (0, 1).
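- Stitching with a mask can be sketched as a per-pixel blend. The example below assumes the general form mask * first + (1 - mask) * second, which matches the described behavior for mask values 1, 0, and 0.5; the 6×6 mask follows the example in the text, while the function name `blend_with_mask` and the constant toy images are hypothetical.

```python
import numpy as np

def blend_with_mask(stitched_a, stitched_b, mask):
    """Per-pixel blend: mask * stitched_a + (1 - mask) * stitched_b."""
    if mask.min() < 0.0 or mask.max() > 1.0:
        raise ValueError("mask values must lie in [0, 1]")
    return mask * stitched_a + (1.0 - mask) * stitched_b

row = np.array([1.0, 1.0, 0.5, 0.5, 0.0, 0.0])
mask = np.tile(row, (6, 1))      # 6x6 mask with identical rows
first = np.full((6, 6), 100.0)   # toy first stitched image
second = np.full((6, 6), 200.0)  # toy second stitched image
stitched = blend_with_mask(first, second, mask)
```

- In the result, the two left columns come from the first toy image, the two right columns from the second, and the two middle columns are their average.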
- The first stitched image and the second stitched image do not have to be stitched using a mask. For example, the two stitched images can also be superimposed first, and the abrupt transitions in the picture can then be improved by smoothing and filtering.
- Alternatively, the first intermediate image and the second intermediate image can be calculated first (similar to step A); the first intermediate image and the second intermediate image are then stitched (directly or using a mask) to obtain the intermediate image; the target optical flow is then calculated according to the first intermediate optical flow and the second intermediate optical flow (similar to step B); finally, the stitched image is obtained by mapping according to the intermediate image and the target optical flow. For details of each step, refer to steps A to D, which are not repeated here.
- The calculation process of the image stitching method provided by the embodiments of the present application is relatively simple. Unlike some traditional image stitching methods, it does not need to calculate multiple homography matrices through complex iterations, so image stitching can run closer to real time, which enhances practicability.
- Image stitching is achieved by computing the target optical flow. Since the target optical flow is generated by fusing the first intermediate optical flow, at least one transitional optical flow, and the second intermediate optical flow (the case without a transitional optical flow can be analyzed similarly), it contains the optical flow information at each viewing angle. Given the correspondence between images and optical flows, the stitched image calculated from the target optical flow also fuses the information of the first image, the second image, and at least one transitional image, so it fully reflects the state of the captured target at the viewing angles between the first image and the second image, that is, it is an ideal stitching result.
- The stitched image obtained by this method is of high quality, which alleviates the artifacts, distortion, and alignment difficulties found in traditional image stitching methods. The method also remains effective for image stitching when the first image and the second image have large parallax.
- the above image stitching method may be continuously applied. For example, to stitch the first image, the second image, and the third image, you can use this method to stitch the first image and the second image to obtain an intermediate stitching result, and then use this method to combine the intermediate stitching result with the third image Further splicing to obtain the final spliced image.
- First, at least one weight value is obtained; then, based on each weight value, weighted summation is performed on the first intermediate optical flow and the second intermediate optical flow to obtain at least one transitional optical flow.
- the number of weight values is the same as the number of transition optical flows. For example, if there are 4 weight values, 4 transition optical flows are calculated by weighted summation. The specific number can be determined according to requirements.
- the weight value can be pre-set (for example, written in a configuration file or program), and can be directly read and used when calculating the transition optical flow. Of course, the weight value can also be generated by a certain algorithm when calculating the transition optical flow.
- Weighted summation requires two weighting coefficients, namely the weighting coefficient of the first intermediate optical flow and the weighting coefficient of the second intermediate optical flow. The two coefficients constrain each other: once one of them is known, the other can be calculated.
- For example, each weighting coefficient is limited to a value in the interval (0,1), and the two weighting coefficients sum to 1.
- A weight value w in the interval (0,1) can then be used as the weighting coefficient of one of the intermediate optical flows, for example the first intermediate optical flow, in which case the weighting coefficient of the second intermediate optical flow is 1-w.
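- The calculation formula of the transition optical flow can then be expressed as $F_{m\to v} = w \cdot F_{m\to 0} + (1-w) \cdot F_{m\to 1}$, where $F_{m\to 0}$ and $F_{m\to 1}$ denote the first and second intermediate optical flows and $F_{m\to v}$ denotes the transition optical flow at viewing angle position v.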
- Linear interpolation has the advantage of being simple to compute. In most cases (especially when the acquisition time interval between the first image and the second image is not too long), linear motion is sufficient to describe the motion between the first image and the second image, so the accuracy of the transition optical flow calculated by linear interpolation is also high enough.
- other non-linear interpolation methods can also be adopted.
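- The weighted summation described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `interpolate_transition_flows`, the (H, W, 2) array layout, and the requirement that both intermediate optical flows have the same shape are assumptions made for the example.

```python
import numpy as np


def interpolate_transition_flows(flow_first, flow_second, weights):
    """Blend two intermediate optical flows into transition optical flows.

    flow_first, flow_second: arrays of shape (H, W, 2) (assumed layout).
    weights: iterable of values w in (0, 1); w is the weighting coefficient
             of flow_first and (1 - w) that of flow_second.
    """
    flow_first = np.asarray(flow_first, dtype=np.float64)
    flow_second = np.asarray(flow_second, dtype=np.float64)
    if flow_first.shape != flow_second.shape:
        raise ValueError("both flows must have the same shape")

    transitions = []
    for w in weights:
        if not 0.0 < w < 1.0:
            raise ValueError("each weight must lie in the open interval (0, 1)")
        # Linear interpolation: the two coefficients sum to 1.
        transitions.append(w * flow_first + (1.0 - w) * flow_second)
    return transitions


# Toy example: one transition flow per weight value.
f_first = np.zeros((2, 3, 2))
f_second = np.ones((2, 3, 2))
flows = interpolate_transition_flows(f_first, f_second, [0.8, 0.6, 0.4, 0.2])
```

- A larger weight value keeps the result closer to the first intermediate optical flow, which matches the rule that the weight grows with proximity to the viewing angle of the first intermediate optical flow.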
- The size of the weight value can be set to be related to the viewing angle position of the transition optical flow corresponding to that weight value. This ensures that the transition optical flow calculated with the weight value is consistent with its viewing angle position, which in turn ensures that the target optical flow calculated from the transition optical flows reflects the gradual change of the optical flow.
- For example, the size of the weight value can be set to be positively correlated with the proximity between the viewing angle position of the corresponding transition optical flow and the viewing angle position of the first intermediate optical flow (or, equivalently, negatively correlated with the proximity between the viewing angle position of the corresponding transition optical flow and the viewing angle position of the second intermediate optical flow).
- That is, the closer the viewing angle position of the transition optical flow is to that of the first intermediate optical flow, the larger the weight value is set; the weighting coefficient of the first intermediate optical flow is increased and that of the second intermediate optical flow is decreased at the same time, so that the value of the transition optical flow is affected more by the first intermediate optical flow and less by the second intermediate optical flow, which is consistent with its viewing angle position. Similarly, the farther the viewing angle position of the transition optical flow is from that of the first intermediate optical flow, the smaller the weight value is set; the weighting coefficient of the first intermediate optical flow is decreased and that of the second intermediate optical flow is increased at the same time, so that the value of the transition optical flow is affected more by the second intermediate optical flow and less by the first intermediate optical flow, which is again consistent with its viewing angle position.
- For example, the weight value w = 0.6 of the transition optical flow F m->0.4 at viewing angle 0.4 is larger than the weight value w = 0.4 of the transition optical flow F m->0.6 at viewing angle 0.6, which in turn is larger than the weight value of the transition optical flow at viewing angle 0.8.
- In this way, the transition optical flows are affected by the first intermediate optical flow from strong to weak and by the second intermediate optical flow from weak to strong, which guarantees that the calculated transition optical flows change gradually.
- setting the size of the weight value to be related to the viewing angle position of the transitional optical flow corresponding to the weight value does not mean that the viewing angle position of the transitional optical flow must be accurately calculated before the corresponding weight value can be further determined.
- At least one weight value can be set to be uniformly distributed in the interval (0,1).
- For example, if there is only one weight value, it can be 0.5; its interval to both 0 and 1 is 0.5, which is a uniform distribution.
- If there are M (M>1) weight values, they can be i/(M+1) (i is an integer from 1 to M); the interval between any two adjacent weight values is 1/(M+1), and the interval between the first weight value and 0, as well as the interval between the Mth weight value and 1, is also 1/(M+1), which is again a uniform distribution.
- When the weight values are uniformly distributed, the viewing angle positions of the transition optical flows calculated with them are also evenly distributed between the viewing angles of the first image and the second image. Such a distribution of viewing angle positions enables the transition images to fully describe the overall picture of the captured target between the viewing angles of the first image and the second image (without emphasizing particular viewing angles). The stitched image can be regarded as fusing the information of all transition images, so the quality of the stitched image obtained in this way is high.
- weight values are evenly distributed in the interval (0,1).
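- The uniform distribution of weight values described above, i/(M+1) for i from 1 to M, can be sketched as follows. The function name `uniform_weights` is hypothetical and used only for illustration.

```python
def uniform_weights(m):
    """Return M weight values i / (M + 1), i = 1..M, evenly spaced in (0, 1)."""
    if m < 1:
        raise ValueError("at least one weight value is required")
    return [i / (m + 1) for i in range(1, m + 1)]


single = uniform_weights(1)  # one weight value: the midpoint of (0, 1)
four = uniform_weights(4)    # four weight values spaced 1/5 apart
```

- With this choice, the gap between neighbouring weight values, and between the end values and 0 or 1, is always 1/(M+1).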
- The following continues with step B2, introducing how the target optical flow is obtained through fusion according to the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow.
- the weight matrix can be used to achieve optical flow fusion.
- the specific method is as follows:
- First, N+2 weight matrices are obtained, where N is the total number of transition optical flows (N ≥ 1); then, based on the N+2 weight matrices, the first intermediate optical flow, the N transition optical flows, and the second intermediate optical flow are weighted and summed to obtain the preliminary optical flow; finally, depending on the implementation, either the preliminary optical flow is used directly as the target optical flow, or its inverse is used as the target optical flow.
- The calculation of the preliminary optical flow can be expressed by the formula $F_{m\to 0\sim 1} = \sum_{t=1}^{N+2} W_t \odot F_t$, where:
- $F_t$ represents the t-th optical flow to be fused, which can be the first intermediate optical flow, a transition optical flow, or the second intermediate optical flow;
- $W_t$ represents the weight matrix corresponding to $F_t$, that is, the t-th weight matrix;
- $\odot$ represents element-wise matrix multiplication.
- Each element in the weight matrix can be regarded as a weight value of the weighted summation.
- The elements of the weight matrices take values in the interval [0,1], and the elements at the same position in all weight matrices satisfy the relationship $\sum_{t=1}^{N+2} w^{t}_{i,j} = 1$, where $w^{t}_{i,j}$ represents the element in the i-th row and j-th column of the t-th weight matrix.
- Because the weight matrix is two-dimensional, the information of the different optical flows to be fused can be combined more flexibly in the preliminary optical flow. This enables the preliminary optical flow to reflect the gradual change from the first intermediate optical flow to the second intermediate optical flow, and further enables the target optical flow calculated by optical flow inversion to reflect this gradual change.
- the elements in the weight matrix may be set according to the following rules: the position of the maximum value of the element in the weight matrix is related to the view position of the optical flow corresponding to the weight matrix.
- The above “correlation” may mean that the position of the maximum elements in a weight matrix is consistent with the position, within the entire viewing angle range (the range between the viewing angle of the first image and the viewing angle of the second image), of the viewing angle of the corresponding optical flow to be fused.
- For example, F 1 (F m->0 in Figure 3) is the first intermediate optical flow. Its viewing angle is viewing angle 0, which lies at the far left of the entire viewing angle range (from viewing angle 0 to viewing angle 1). According to the above rule, the leftmost column of W 1 therefore takes the maximum value, and the remaining columns can decrease gradually or be set in other ways.
- Both of the following two W 1 satisfy the above rule:
- F 2 (F m->0.2 in Figure 3) is the transition optical flow following F 1. Its viewing angle is viewing angle 0.2, which lies about 20% from the left of the entire viewing angle range. According to the above rule, the second column from the left of W 2 therefore takes the maximum value, and the other columns can decrease gradually or be set in other ways.
- Both of the following two W 2 satisfy the above rule:
- the maximum values of the elements in each weight matrix can be kept uniform, for example, in W 1 and W 2 above, the maximum values of the elements are either 1 or 0.8.
- The elements that take the maximum value in a weight matrix may occupy one or more columns. For example, for W 2 in the scene of Figure 3 (with the optical flow no longer limited to 6 columns), the maximum value may be taken by the column at the position (total number of matrix columns × 20%), or by that column together with several nearby columns.
- If the position of the maximum elements in a weight matrix is consistent with the viewing angle position of the corresponding optical flow to be fused (which is also a matrix), then, after weighting with that weight matrix, the optical flow values at that viewing angle position in the optical flow to be fused contribute the most to the optical flow values at the same position in the preliminary optical flow, while the optical flow values at other positions contribute relatively little to the optical flow values at the same positions in the preliminary optical flow.
- In other words, in the preliminary optical flow, the optical flow value at each viewing angle position is mainly contributed by the optical flow to be fused that corresponds to that viewing angle position. This enables the preliminary optical flow to reflect the gradual change from the first intermediate optical flow to the second intermediate optical flow, and further enables the target optical flow calculated by optical flow inversion to also reflect this gradual change.
- For example, suppose W t (t is an integer from 1 to 6) is set to 1 for the elements in the t-th column and 0 for all other columns. After weighted summation, the t-th column of the obtained preliminary optical flow F m->0~1 comes from F t; in other words, each F t contributes one column of element values, so F m->0~1 reflects the gradual change of the optical flow from F 1 to F 6.
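- The six-column example above can be sketched as follows. This is an illustrative sketch under stated assumptions: the optical flows are stored as (H, W, 2) arrays, the weight matrices are multiplied element by element and must sum to 1 at every position, and the names `one_hot_column_weights` and `fuse_flows` are hypothetical.

```python
import numpy as np


def one_hot_column_weights(num_flows, height):
    """Build W_t with ones in column t and zeros elsewhere (width == num_flows)."""
    mats = []
    for t in range(num_flows):
        w = np.zeros((height, num_flows))
        w[:, t] = 1.0
        mats.append(w)
    return mats


def fuse_flows(flows, weight_mats):
    """Element-wise weighted sum of the optical flows to be fused."""
    flows = [np.asarray(f, dtype=np.float64) for f in flows]
    stacked = np.stack(weight_mats)  # shape (T, H, W)
    # Assumed constraint: the weights at each position sum to 1 over all matrices.
    if not np.allclose(stacked.sum(axis=0), 1.0):
        raise ValueError("weights at each position must sum to 1")
    prelim = np.zeros_like(flows[0])
    for f, w in zip(flows, weight_mats):
        # Broadcast each (H, W) weight matrix over the two flow channels.
        prelim += w[..., None] * f
    return prelim


# Six flows F_1..F_6 of size 2 x 6; flow t is filled with the constant t.
height, count = 2, 6
flows = [np.full((height, count, 2), float(t + 1)) for t in range(count)]
prelim = fuse_flows(flows, one_hot_column_weights(count, height))
```

- In the result, column t holds the values of F t, so the preliminary optical flow changes gradually from the first intermediate optical flow on the left to the second intermediate optical flow on the right.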
- the image stitching method described above is implemented based on a neural network model called the image stitching model, which includes three sub-networks, namely the optical flow calculation network, the optical flow inversion network, and the mask calculation network.
- the optical flow calculation network is used to estimate the first screenshot optical flow and the second screenshot optical flow, and then the first intermediate optical flow and the second intermediate optical flow can be calculated (see step b);
- the optical flow inversion network is used to calculate the inverse optical flow of the preliminary optical flow, thereby obtaining the target optical flow (see step B2);
- the mask calculation network is used to calculate the splicing mask, and then the spliced image can be calculated (see step D).
- the functions of the above three sub-networks have been introduced in detail above and will not be repeated.
- The training method of the image stitching model is introduced below; a possible flow is shown in Figure 5.
- the model training method can be executed by, but not limited to, the electronic device shown in FIG. 8 .
- the method includes:
- Step S210 Using the image stitching model to calculate a stitched image of the first image and the second image.
- Step S220 Obtain the first real screenshot optical flow, the second real screenshot optical flow, the real target optical flow and the real stitched image.
- Step S230 Calculate the optical flow prediction loss according to the first screenshot optical flow, the second screenshot optical flow, the first real screenshot optical flow, and the second real screenshot optical flow.
- Step S240 Calculate the optical flow inversion loss according to the target optical flow and the real target optical flow.
- Step S250 Calculate the image stitching loss according to the stitched image and the real stitched image.
- Step S260 Calculate the total loss according to the optical flow prediction loss, the optical flow inversion loss and the image stitching loss, and update the parameters of the optical flow calculation network, the optical flow inversion network and the mask calculation network according to the total loss.
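- The loss combination in steps S230 to S260 can be sketched as follows. The text does not specify the form of each loss or how the three losses are combined, so the mean absolute error, the weighted sum with `loss_weights`, and the dictionary keys used here are all assumptions made for illustration; the parameter update by backpropagation is not shown.

```python
import numpy as np


def mean_abs_error(pred, target):
    """Mean absolute error between two arrays (assumed loss form)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean(np.abs(pred - target)))


def total_loss(pred, truth, loss_weights=(1.0, 1.0, 1.0)):
    """Combine the three losses of steps S230 to S250 into a total loss (S260)."""
    # S230: optical flow prediction loss over both screenshot optical flows.
    flow_loss = (mean_abs_error(pred["shot_flow_1"], truth["shot_flow_1"])
                 + mean_abs_error(pred["shot_flow_2"], truth["shot_flow_2"]))
    # S240: optical flow inversion loss on the target optical flow.
    inversion_loss = mean_abs_error(pred["target_flow"], truth["target_flow"])
    # S250: image stitching loss on the stitched image.
    stitch_loss = mean_abs_error(pred["stitched"], truth["stitched"])
    a, b, c = loss_weights
    return a * flow_loss + b * inversion_loss + c * stitch_loss


# Toy data: only the screenshot optical flows differ from the supervision.
shape = (2, 2, 2)
pred = {"shot_flow_1": np.zeros(shape), "shot_flow_2": np.zeros(shape),
        "target_flow": np.zeros(shape), "stitched": np.zeros((2, 2, 3))}
truth = {"shot_flow_1": np.ones(shape), "shot_flow_2": np.ones(shape),
         "target_flow": np.zeros(shape), "stitched": np.zeros((2, 2, 3))}
loss = total_loss(pred, truth)
```

- Setting one of the `loss_weights` to 0 corresponds to the case in which that loss is not calculated.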
- Step S210 can be implemented using the image stitching method provided in the embodiments of the present application (the corresponding steps use the three sub-networks of the image stitching model) and is not described again.
- Note that Figure 5 shows a method applied to the model training stage, so the first image and the second image in step S210 are training images, whereas step S110 does not limit whether the first image and the second image are images used for training or images used for inference.
- The first real screenshot optical flow, the second real screenshot optical flow, the real target optical flow, and the real stitched image in step S220 are supervisory signals.
- The first real screenshot optical flow and the second real screenshot optical flow are used to calculate the optical flow prediction loss in step S230, which supervises the training of the optical flow calculation network; the real target optical flow is used to calculate the optical flow inversion loss in step S240, which supervises the training of the optical flow inversion network; the real stitched image is used to calculate the image stitching loss in step S250, which supervises the training of the mask calculation network.
- In addition, a loss calculated for a later sub-network in the image stitching model may also supervise the earlier sub-networks in the model. For example, the optical flow inversion loss may also supervise the optical flow calculation network, and the image stitching loss may also supervise the optical flow calculation network and the optical flow inversion network.
- Steps S220 to S250 are relatively flexible in execution sequence, which will be described below:
- the three supervisory signals in step S220 can be acquired together or separately, and the "acquisition" mentioned here includes calculation or direct reading.
- The calculation of the corresponding loss can be performed immediately after a supervisory signal is obtained. For example, after the first real screenshot optical flow and the second real screenshot optical flow are obtained, step S230 can be performed to calculate the optical flow prediction loss (provided that the first screenshot optical flow and the second screenshot optical flow have already been calculated); it is not necessary to wait until all the supervisory signals have been obtained.
- Step S220 has no fixed ordering relative to step S210: it may be executed before step S210, after step S210 (as shown in FIG. 5), or in parallel with step S210.
- The calculation timing of the three losses in steps S230 to S250 is not limited. For example, the three losses can be calculated after step S210 has finished. At that point the first screenshot optical flow, the second screenshot optical flow, the target optical flow, and the stitched image have all been calculated during step S210, so the losses can be calculated, and the order in which the three losses are calculated does not matter.
- For another example, a loss can be calculated during step S210: once the first screenshot optical flow and the second screenshot optical flow have been calculated, step S230 can be performed (provided that the first real screenshot optical flow and the second real screenshot optical flow have been obtained), without waiting for the calculation of the stitched image to complete.
- the parameter update of the image stitching model can be achieved by backpropagating according to the total loss.
- The above model training method considers the losses corresponding to the three sub-networks at the same time: the optical flow prediction loss, the optical flow inversion loss, and the image stitching loss. In other words, it jointly accounts for the model's intermediate optical flow prediction accuracy, target optical flow prediction accuracy, and stitching mask prediction accuracy, so that the resulting image stitching model is capable of high-quality image stitching.
- Even when the image stitching model includes the above three sub-networks, only one or two of the losses may be calculated; it is not necessary to calculate all three.
- For example, since the image stitching loss can also supervise the optical flow calculation network and the optical flow inversion network, it is possible to calculate only the image stitching loss. If a loss is not calculated, the corresponding supervisory signal does not need to be obtained.
- In addition, some steps of the image stitching method may not be implemented using a neural network, so the image stitching model does not necessarily include all three of the above sub-networks.
- For example, optical flow inversion can also be performed by other methods, in which case the image stitching model does not include the optical flow inversion network, and the corresponding optical flow inversion loss does not need to be calculated.
- In some implementations, a real image may be used as the intermediate image to generate the first image and the second image in step S210.
- The real image mentioned here can be an actually captured image, or an image generated by a computer vision algorithm.
- The intermediate image in this case is a known, actually existing image, which differs from the earlier explanation of the intermediate image (when step S120 was introduced above, the intermediate image was regarded as a virtual image).
- the method for generating a set of first and second images is:
- the first image is calculated according to the intermediate image and a certain homography matrix
- the second image is calculated according to the intermediate image and the inverse matrix of the homography matrix.
- Figure 6 shows this image generation process, where h represents a certain homography matrix and h -1 represents the inverse matrix of h. Starting from the intermediate image, the first image and the second image are obtained by performing projection transformation with h and h -1 respectively.
- A different pair of first and second images can be obtained simply by changing h, so this method can "create" a large number of training images from a small number of real images.
- The homography matrix h can be set in advance, or generated on the fly by an algorithm.
- The supervisory signals in step S220 can be generated in the following manner:
- the projective transformation represented by the homography matrix gives the correspondence between the pixels of the two images, so it is easy to calculate the optical flow between the two images given the homography matrix.
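- The calculation of an optical flow from a homography matrix can be sketched as follows. This is a minimal illustration under assumptions: the optical flow is taken to be the displacement from each pixel (x, y) to its projected position, pixel coordinates start at 0, and the function name `flow_from_homography` is hypothetical.

```python
import numpy as np


def flow_from_homography(h, height, width):
    """Return an (H, W, 2) optical flow induced by a 3x3 homography matrix."""
    h = np.asarray(h, dtype=np.float64)
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    # Homogeneous pixel coordinates (x, y, 1) for every pixel.
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    mapped = pts @ h.T                           # apply h to every pixel
    mapped_xy = mapped[..., :2] / mapped[..., 2:3]  # back to Cartesian
    return mapped_xy - pts[..., :2]              # displacement = flow


# A pure translation by (3, -2) and its inverse give opposite flows.
h = np.array([[1.0, 0.0, 3.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0]])
flow_h = flow_from_homography(h, 4, 5)
flow_h_inv = flow_from_homography(np.linalg.inv(h), 4, 5)
```

- Applying the same function to h and to its inverse yields the two flows that correspond to the first image and the second image in the generation scheme of Figure 6.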
- According to h, the optical flow between the intermediate image and the first image can be calculated, which is called the first real intermediate optical flow.
- After the first real intermediate optical flow is obtained, a screenshot is taken of it (cropping the image area of the common picture in the first image and the second image) to obtain the first real screenshot optical flow.
- Similarly, according to h -1 , the optical flow between the intermediate image and the second image can be calculated, which is called the second real intermediate optical flow, and the second real screenshot optical flow is obtained from it by taking a screenshot.
- According to h and h -1 , at least one transition matrix is calculated by interpolation; a target matrix is obtained by fusing h, the at least one transition matrix, and h -1 ; and the real target optical flow is calculated according to the target matrix.
- the real target optical flow should be understood as the inverse result of the preliminary optical flow under ideal conditions.
- The interpolation method used here is similar to the interpolation used earlier when calculating the transition optical flows (for example, weighted summation), and the fusion method used here is similar to the fusion used earlier when calculating the target optical flow (for example, weighted summation with weight matrices). Although the earlier calculations were described for optical flows, a homography matrix and an optical flow are both matrices in the mathematical sense, so the earlier calculations for optical flows can be applied here.
- Here the target matrix can be obtained directly without calculating an inverse (whereas the target optical flow has to be obtained by inverting the preliminary optical flow). For the fusion of homography matrices, the inversion would be performed on the matrices themselves before fusion; but the inverse of h is h -1 and the inverse of h -1 is h, so inversion merely exchanges the two, and the matrix inversion step can be omitted.
- The target matrix can also be regarded as the homography matrix between the intermediate image and the real stitched image (as shown in Figure 6), so the optical flow between the two, that is, the real target optical flow, can be calculated according to the target matrix.
- If an optional scheme does not calculate transition optical flows, the above transition matrices do not need to be calculated either, and the target matrix can be calculated directly from h and h -1 .
- the real spliced image can be calculated according to the intermediate image and the target matrix, which is similar to the calculation of the first image and the second image, and will not be described again.
- the real stitched image should be understood as the stitching result of the first image and the second image under ideal conditions.
- This implementation can use a small number of real images (which can be located in the training set) to quickly generate a large number of training samples, and different homography matrices can be selected so that the samples cover different scenes, which gives the trained image stitching model good performance.
- For the intermediate image in this implementation, "intermediate" can refer to the middle in the sense of the projection transformation determined by the homography matrix: the image collected at the "middle" position can be transformed into the first image through a certain homography matrix, and into the second image through the inverse matrix of that homography matrix. Understandably, if other methods are used to generate training data, the definition of "intermediate" changes accordingly.
- FIG. 7 shows the structure of the image stitching device 300 provided by the embodiment of the present application.
- the image stitching device 300 includes:
- An image acquisition module 310 configured to acquire a first image and a second image
- An intermediate optical flow calculation module 320 configured to calculate a first intermediate optical flow and a second intermediate optical flow according to the first image and the second image, where the first intermediate optical flow is the optical flow between an intermediate image and the first image, the second intermediate optical flow is the optical flow between the intermediate image and the second image, the intermediate image is an image whose viewing angle is between those of the first image and the second image, the size of the first intermediate optical flow is the same as the size of the first image, and the size of the second intermediate optical flow is the same as the size of the second image;
- An image stitching module 330 configured to obtain a stitched image of the first image and the second image by stitching according to the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image.
- The intermediate optical flow calculation module 320 calculating the first intermediate optical flow and the second intermediate optical flow according to the first image and the second image includes: cropping, from the first image and the second image respectively, the image areas that contain the common picture to obtain a first screenshot and a second screenshot; inputting the first screenshot and the second screenshot into the optical flow calculation network to obtain a first screenshot optical flow and a second screenshot optical flow, where the first screenshot optical flow is the optical flow between the intermediate image and the first screenshot, and the second screenshot optical flow is the optical flow between the intermediate image and the second screenshot; up-sampling the first screenshot optical flow to the size of the first image to obtain the first intermediate optical flow, and up-sampling the second screenshot optical flow to the size of the second image to obtain the second intermediate optical flow.
- The image stitching module 330 obtaining the stitched image of the first image and the second image according to the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image includes: obtaining a first intermediate image by mapping according to the first intermediate optical flow and the first image, and obtaining a second intermediate image by mapping according to the second intermediate optical flow and the second image; calculating a target optical flow according to the first intermediate optical flow and the second intermediate optical flow; obtaining a first stitched image by mapping according to the target optical flow and the first intermediate image, and obtaining a second stitched image by mapping according to the target optical flow and the second intermediate image; stitching the first stitched image and the second stitched image to obtain the stitched image.
- The image stitching module 330 calculating the target optical flow according to the first intermediate optical flow and the second intermediate optical flow includes: calculating at least one transition optical flow by interpolation according to the first intermediate optical flow and the second intermediate optical flow; obtaining the target optical flow through fusion according to the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow.
- The image stitching module 330 calculating at least one transition optical flow by interpolation according to the first intermediate optical flow and the second intermediate optical flow includes: acquiring at least one weight value; based on each weight value, performing weighted summation on the first intermediate optical flow and the second intermediate optical flow to obtain the at least one transition optical flow.
- the magnitude of the weight value is related to the viewing angle position of the transition optical flow corresponding to the weight value.
- The sum of the weighting coefficient of the first intermediate optical flow and the weighting coefficient of the second intermediate optical flow is 1, the weight value is the weighting coefficient of the first intermediate optical flow, and the weight value is positively correlated with the proximity between the viewing angle position of the transition optical flow corresponding to the weight value and the viewing angle position of the first intermediate optical flow.
- the at least one weight value is evenly distributed in the interval (0,1).
- The image stitching module 330 calculating the target optical flow according to the first intermediate optical flow and the second intermediate optical flow includes: calculating a preliminary optical flow according to the first intermediate optical flow and the second intermediate optical flow; inputting the preliminary optical flow into an optical flow inversion network to obtain the target optical flow.
- The image stitching module 330 obtaining the target optical flow through fusion according to the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow includes: obtaining N+2 weight matrices, where N is the total number of transition optical flows; based on the N+2 weight matrices, performing weighted summation on the first intermediate optical flow, the N transition optical flows, and the second intermediate optical flow to obtain the preliminary optical flow; using the preliminary optical flow as the target optical flow, or inputting the preliminary optical flow into an optical flow inversion network to obtain the target optical flow.
- the position of the maximum value of the elements in the weight matrix is related to the position of the view angle of the optical flow corresponding to the weight matrix.
- The image stitching module 330 stitching the first stitched image and the second stitched image to obtain the stitched image includes: inputting the first stitched image and the second stitched image into the mask calculation network to obtain a stitching mask; stitching the first stitched image and the second stitched image based on the stitching mask to obtain the stitched image.
- The first intermediate optical flow and the second intermediate optical flow are calculated using the first screenshot optical flow and the second screenshot optical flow output by the optical flow calculation network, the target optical flow is calculated using an optical flow inversion network, and the stitched image is calculated using a stitching mask output by a mask calculation network; the device also includes:
- a supervisory signal acquisition module configured to acquire the first real screenshot optical flow, the second real screenshot optical flow, the real target optical flow and the real spliced image
- a loss calculation module configured to calculate an optical flow prediction loss according to the first screenshot optical flow, the second screenshot optical flow, the first real screenshot optical flow, and the second real screenshot optical flow; to calculate an optical flow inversion loss according to the target optical flow and the real target optical flow; and to calculate an image stitching loss according to the stitched image and the real stitched image;
- a parameter updating module configured to calculate a total loss according to the optical flow prediction loss, the optical flow inversion loss, and the image stitching loss, and to update the parameters of the optical flow calculation network, the optical flow inversion network, and the mask calculation network according to the total loss.
- the target optical flow is fused according to the first intermediate optical flow, at least one transitional optical flow, and the second intermediate optical flow
- the image acquisition module 310 acquires the first image and the second image, including: calculating the first image according to the intermediate image and the specified homography matrix; and calculating the first image according to the intermediate image and the inverse matrix of the homography matrix a second image, the intermediate image is a real image;
- the supervisory signal acquisition module acquiring the first real screenshot optical flow, the second real screenshot optical flow, the real target optical flow, and the real stitched image includes: calculating a first real intermediate optical flow according to the homography matrix, and calculating the first real screenshot optical flow according to the first real intermediate optical flow; calculating a second real intermediate optical flow according to the inverse matrix of the homography matrix, and calculating the second real screenshot optical flow according to the second real intermediate optical flow; calculating at least one transition matrix by interpolation according to the homography matrix and the inverse matrix of the homography matrix, obtaining a target matrix by fusing the homography matrix, the at least one transition matrix, and the inverse matrix of the homography matrix, and calculating the real target optical flow according to the target matrix; calculating the real stitched image according to the intermediate image and the target matrix.
- The implementation principle and technical effects of the image stitching device 300 provided in the embodiments of the present application have been introduced in the foregoing method embodiments. For brevity, for the parts not mentioned in the device embodiments, refer to the corresponding content in the method embodiments.
- FIG. 8 shows a possible structure of an electronic device 400 provided by the embodiment of the present application.
- the electronic device 400 includes: a processor 410 , a memory 420 and a communication interface 430 , and these components are interconnected and communicate with each other through a communication bus 440 and/or other forms of connection mechanisms (not shown).
- there may be one or more processors 410 (only one is shown in the figure); each may be an integrated circuit chip with signal processing capability.
- the above-mentioned processor 410 can be a general-purpose processor, including a central processing unit (CPU), a micro controller unit (MCU), a network processor (NP), or other conventional processors.
- when there are multiple processors 410, some of them may be general-purpose processors and the others may be special-purpose processors.
- there may be one or more memories 420 (only one is shown in the figure); each can be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and so on.
- the processor 410 and possibly other components can access the memory 420 to read and/or write data therein.
- one or more computer program instructions may be stored in the memory 420, and the processor 410 may read and execute these computer program instructions to implement the image stitching method provided in the embodiment of the present application.
- the communication interface 430 includes one or more (only one is shown in the figure), which can be used to directly or indirectly communicate with other devices for data interaction.
- the communication interface 430 may include an interface for wired and/or wireless communication.
- the structure shown in FIG. 8 is only for illustration; the electronic device 400 may include more or fewer components than those shown in FIG. 8, or have a configuration different from that shown in FIG. 8.
- Each component shown in FIG. 8 may be implemented by hardware, software or a combination thereof.
- the electronic device 400 may be a physical device, such as a PC, a notebook computer, a tablet computer, a mobile phone, a server, a smart wearable device, etc., or a virtual device, such as a virtual machine, a virtualized container, and the like.
- the electronic device 400 is not limited to a single device, and may also be a combination of multiple devices or a cluster formed by a large number of devices.
- the embodiment of the present application also provides a computer-readable storage medium that stores computer program instructions; when the computer program instructions are read and run by a processor of a computer, the image stitching method provided by the embodiments of the present application is executed.
- the computer-readable storage medium can be implemented as the memory 420 in the electronic device 400 in FIG. 8 .
- the present application provides an image stitching method and device, a storage medium, and an electronic device.
- the image stitching method may include: acquiring a first image and a second image; calculating a first intermediate optical flow and a second intermediate optical flow according to the first image and the second image, where the first intermediate optical flow is the optical flow between an intermediate image and the first image, the second intermediate optical flow is the optical flow between the intermediate image and the second image, the intermediate image is an image whose viewing angle lies between those of the first image and the second image, the size of the first intermediate optical flow is the same as the size of the first image, and the size of the second intermediate optical flow is the same as the size of the second image; and calculating a stitched image of the first image and the second image according to the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image.
- the above image stitching method has simple steps and completes image stitching without complex iterative calculation of multiple homography matrices, so the efficiency of image stitching can be improved.
- the image stitching method and the image stitching device of the present application are reproducible and can be used in various industrial applications.
- the image stitching method and device, storage medium, and electronic equipment of the present application can be used in the field of image processing.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Image Processing (AREA)
- Studio Devices (AREA)
Abstract
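The present application relates to the technical field of image processing and provides an image stitching method and device, a storage medium, and an electronic device. The method includes: acquiring a first image and a second image; calculating a first intermediate optical flow and a second intermediate optical flow according to the first image and the second image, where the first intermediate optical flow is the optical flow between an intermediate image and the first image, the second intermediate optical flow is the optical flow between the intermediate image and the second image, the intermediate image is an image whose viewing angle lies between those of the first image and the second image, the size of the first intermediate optical flow is the same as the size of the first image, and the size of the second intermediate optical flow is the same as the size of the second image; and calculating a stitched image of the first image and the second image according to the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image. This method completes image stitching without complex iterative calculation of multiple homography matrices, so the efficiency of image stitching can be improved.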
Description
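Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202110597189.7, entitled "Image stitching method and device, storage medium, and electronic device", filed with the China National Intellectual Property Administration on May 28, 2021, the entire contents of which are incorporated herein by reference.
The present application relates to the field of image processing, and in particular to an image stitching method and device, a storage medium, and an electronic device.
Image stitching refers to the process of combining multiple images that have overlapping regions into a seamless panoramic image. In recent years, image stitching has been widely used in aerospace, minimally invasive surgery, medical microscopy, geological survey, and other fields.
In the related art, image stitching can be implemented by camera calibration, but such methods are computationally expensive: multiple homography matrices must be computed iteratively, which makes the stitching process inefficient.
Summary of the Invention
The present application provides an image stitching method and device, a storage medium, and an electronic device to alleviate the above technical problem.
In some exemplary embodiments, the present application provides an image stitching method, which may include: acquiring a first image and a second image; calculating a first intermediate optical flow and a second intermediate optical flow according to the first image and the second image, where the first intermediate optical flow is the optical flow between an intermediate image and the first image, the second intermediate optical flow is the optical flow between the intermediate image and the second image, the intermediate image is an image whose viewing angle lies between those of the first image and the second image, the size of the first intermediate optical flow is the same as the size of the first image, and the size of the second intermediate optical flow is the same as the size of the second image; and calculating a stitched image of the first image and the second image according to the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image.
The above method has simple steps and completes image stitching without complex iterative calculation of multiple homography matrices, so it improves stitching efficiency, brings stitching closer to real time, and is therefore highly practical. In addition, because the first intermediate optical flow has the same size as the first image and the second intermediate optical flow has the same size as the second image, the two flows can be used directly to map the first image and the second image, which allows stitching to be performed quickly.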
在示例性实施例的一种实现方式中,所述根据所述第一图像和所述第二图像,计算得到第一中间光流和第二中间光流,包括:分别截取所述第一图像和所述第二图像中包含共同画面的图像区域,得到第一截图和第二截图;将所述第一截图和所述第二截图输入光流计算网络,得到第一截图光流和第二截图光流,所述第一截图光流为所述中间图像和所述第一截图之间的光流,所述第二截图光流为所述中间图像和所述第二截图之间的光流;将所述第一截图光流上采样至所述第一图像的尺寸,得到所述第一中间光流,以及,将所述第二截图的光流上采样至所述第二图像的尺寸,得到所述第二中间光流。
对于第一图像和第二图像中包含共同画面的图像区域,由于像素之间存在对应关系,因此能够比较 准确地进行中间光流的估计,对于第一图像和第二图像中不包含共同画面的图像区域,由于像素之间没有对应关系,因此难以进行中间光流的估计。从而,如果直接利用完整的第一图像和第二图像进行中间光流估计,可能会得到不准确的结果。
在上述实现方式中,先利用第一截图和第二截图(对应第一图像和第二图像中包含共同画面的图像区域)进行光流估计,然后再对估计出的小尺寸光流进行上采样得到所需的中间光流,可以改善中间光流的精度。并且,由于多数情况下图像中目标的运动在全局和局部上是一致的,因此小尺寸的局部光流所反映的目标运动规律和大尺寸的全局光流(即中间光流)是相同的。从而,在上采样过程中产生的新光流值,其有效性可以保证。
在示例性实施例的一种实现方式中,所述根据所述第一中间光流、所述第二中间光流、所述第一图像和所述第二图像,计算得到所述第一图像和所述第二图像的拼接图像,包括:根据所述第一中间光流和所述第一图像,映射得到第一中间图像,以及,根据所述第二中间光流和所述第二图像,映射得到第二中间图像;根据所述第一中间光流和所述第二中间光流,计算得到目标光流;根据所述目标光流和所述第一中间图像,映射得到第一拼接图像,以及,根据所述目标光流和所述第二中间图像,映射得到第二拼接图像;根据所述第一拼接图像和所述第二拼接图像,拼接得到所述拼接图像。
待拼接的图像通常是针对同一目标(若第一图像和第二图像内容完全不相关,一般没有必要对其进行拼接)、在不同视角下采集的图像,而两张图像之间的光流可以认为是对目标在图像中所作运动的量化表示,这种运动既包括目标本身的运动,也包括摄像头位置(包括拍摄视角)的移动。从而,以中间图像为参照,第一中间光流(表示第一图像相对于中间图像的运动)对应采集第一图像的视角,第二中间光流(表示第二图像相对于中间图像的运动)对应采集第二图像的视角。
可选地,由于目标光流至少是基于第一中间光流以及第二中间光流融合产生的,因此可以认为目标光流也对应一个特殊视角,在该视角下采集的图像中融合了第一图像以及第二图像的信息,从而可以反映被采集目标在不同视角下的状态,是比较理想的拼接图像。该拼接图像可以利用中间图像和目标光流进行计算,其具有较高的拼接质量,改善了传统图像拼接方法存在的伪影、失真、待拼接图像难以对齐等问题。
在示例性实施例的一种实现方式中,所述根据所述第一中间光流和所述第二中间光流,计算得到目标光流,可以包括:根据所述第一中间光流和所述第二中间光流,插值计算至少一个过渡光流;根据所述第一中间光流、所述至少一个过渡光流以及所述第二中间光流,融合得到所述目标光流。
根据上面的阐述,两张图像之间的光流可以认为是对目标在图像中所作运动的量化表示。从而,以中间图像为参照,第一中间光流对应采集第一图像的视角,第二中间光流对应采集第二图像的视角,利用第一中间光流和第二中间光流插值得到的每个过渡光流则对应一个位于第一图像和第二图像之间的过渡视角,过渡视角下采集的图像为过渡图像(并没有真的采集过渡图像,这里引出过渡图像的概念只是为了便于阐述方案原理)。
可以考虑一个虚拟的图像采集过程:在某个视角下采集第一图像,然后将摄像头依次移动至各个过渡视角下采集过渡图像,最后采集到第二图像,在这一过程中,由于相邻视角之间的视差较小,因此第一中间光流依次平缓地变化为各个过渡光流,最终变化为第二中间光流(也称为光流的平滑过渡)。
可选地,由于上述实现方式中的目标光流是第一中间光流、至少一个过渡光流以及第二中间光流融合产生的,即其中包含了各个视角下光流信息,从而目标光流体现了上述虚拟图像采集过程中光流的渐变,因此可以认为目标光流也对应一个特殊的渐变视角,在该渐变视角下采集的图像中融合了第一图像、第二图像以及至少一个过渡图像的信息,从而可以全面地反映被采集目标在各个视角下的状态,正是比较理想的拼接图像。该拼接图像可以利用中间图像和目标光流进行计算,如上所述,由于该拼接图像反应了被采集目标的全貌,因此具有较高的拼接质量,改善了传统图像拼接方法存在的伪影、失真、待拼接图像难以对齐等问题。
在示例性实施例的一种实现方式中,所述根据所述第一中间光流和所述第二中间光流,插值计算至少一个过渡光流,可以包括:获取至少一个权重值;分别基于每个权重值,对所述第一中间光流和所述第二中间光流进行加权求和,得到所述至少一个过渡光流。
第一中间光流和第二中间光流可以视为插值运算的两个端点,插值运算即要估计这两个端点之间的至少一个位置的值,以实现光流的平滑过渡。上述实现方式中的加权求和运算属于线性插值,也可以采用非线性插值(例如,二次、三次、倒数插值等方式)。线性插值具有运算简单的优点,并且,在多数情况下,线性运动足以描述目标在第一图像和第二图像之间的运动,线性插值的效果也足够好。
在示例性实施例的一种实现方式中,所述权重值的大小与该权重值对应的过渡光流的视角位置相关。
在上述实现方式中,设置权重值时会考虑与该权重值对应的过渡光流的视角位置,这样利用权重值计算出的过渡光流与其所在的视角位置将具有一致性。
在示例性实施例的一种实现方式中,所述第一中间光流的加权系数和所述第二中间光流的加权系数之和为1,所述权重值为所述第一中间光流的加权系数,所述权重值的大小与该权重值对应的过渡光流的视角位置和所述第一中间光流的视角位置之间的接近程度正相关。
在第一中间光流的加权系数和第二中间光流的加权系数之和为1时,既可以将第一中间光流的加权系数视为权重值(此时第二中间光流的加权系数为1减去权重值),也可以将第二中间光流的加权系数视为权重值(此时第一中间光流的加权系数为1减去权重值),两种方案没有实质区别。
以前者为例,若过渡光流的视角位置和第一中间光流的视角位置之间越接近,就将权重值设置得越大,即增大第一中间光流的加权系数同时减小第二中间光流的加权系数,这样过渡光流的取值将更多地受到第一中间光流的影响,与其所在的视角位置具有一致性。并且,如果所有权重值均按照此规律设置,可以保证计算出的过渡光流之间是渐变的。
在示例性实施例的一种实现方式中,所述至少一个权重值在区间(0,1)内可以均匀分布。
在上述实现方式中,由于权重值在区间(0,1)内均匀分布,从而利用这些权重值计算出的过渡光流的 视角位置在第一图像和第二图像之间的分布也是比较均匀的,这样的视角位置分布使得过渡图像能够充分地描述被采集目标在第一图像和第二图像所对应视角之间的全貌,因此,可视为融合了过渡图像信息的拼接图像具有较高的质量。
在示例性实施例的一种实现方式中,所述根据所述第一中间光流和所述第二中间光流,计算得到目标光流,可以包括:根据所述第一中间光流和所述第二中间光流,计算得到初步光流;将所述初步光流输入光流求逆网络,得到所述目标光流。
光流求逆不同于简单的矩阵求逆,计算过程比较复杂,在上述实现方式中采用神经网络进行光流求逆运算,一方面有利于简化运算、提高光流求逆的效率,一方面可以利用神经网络的学习能力提高光流求逆精度。
在示例性实施例的一种实现方式中,所述根据所述第一中间光流、所述至少一个过渡光流以及所述第二中间光流,融合得到所述目标光流,可以包括:获取N+2个权重矩阵,N为所述过渡光流的总数量;基于所述N+2个权重矩阵,对所述第一中间光流、N个过渡光流以及所述第二中间光流进行加权求和,得到所述初步光流;所述初步光流为所述目标光流,或者,将所述初步光流输入光流求逆网络,得到所述目标光流。
在上述实现方式中,利用权重矩阵对过渡光流进行加权求和,不同于一维的权重值,权重矩阵是二维的,从而可以更加灵活地在初步光流中组合不同光流的信息,使得初步光流能够反映第一中间光流到第二中间光流的渐变,自然基于初步光流得到的目标光流也能反映这种渐变。
在示例性实施例的一种实现方式中,所述权重矩阵中元素的最大值位置与该权重矩阵对应的光流的视角位置相关。
在上述实现方式中,以某个权重矩阵中元素的最大值位置与其对应的光流(可以是第一中间光流、过渡光流或第二中间光流)的视角位置一致(相关的一种方式)的情况为例,即该光流(也是一个矩阵)在其视角位置对应的光流值对初步光流的计算贡献最大,在其余位置对应的光流值对初步光流的计算贡献则相对较小。并且,如果所有权重矩阵均按照此规律设置,可以使得在初步光流中,对应于每个视角位置的光流值主要由该视角位置所对应的光流贡献,从而使得初步光流能够反映第一中间光流到第二中间光流的渐变,自然基于初步光流得到的目标光流也能反映这种渐变。
在示例性实施例的一种实现方式中,所述根据所述第一拼接图像和所述第二拼接图像,拼接得到所述拼接图像,可以包括:将所述第一拼接图像和所述第二拼接图像输入掩膜计算网络,得到拼接掩膜;基于所述拼接掩膜,将所述第一拼接图像和所述第二拼接图像进行拼接,得到所述拼接图像。
在上述实现方式中,利用拼接掩膜实现第一拼接图像和第二拼接图像在拼接处的平滑过渡,并且,该拼接掩膜并非是预先设定好的,而是掩膜计算网络学习到的,从而可以进一步提高拼接图像的质量。
在示例性实施例的一种实现方式中,所述第一中间光流和第二中间光流利用光流计算网络输出的第一截图光流和第二截图光流计算得到,所述目标光流利用光流求逆网络计算得到,所述拼接图像利用掩 膜计算网络输出的拼接掩膜计算得到;所述方法还可以包括:获取第一真实截图光流、第二真实截图光流、真实目标光流以及真实拼接图像;根据所述第一截图光流、所述第二截图光流、所述第一真实截图光流和所述第二真实截图光流计算光流预测损失;根据所述目标光流和所述真实目标光流计算光流求逆损失;根据所述拼接图像和所述真实拼接图像计算图像拼接损失;根据所述光流预测损失、所述光流求逆损失和所述图像拼接损失计算总损失,并根据所述总损失更新所述光流计算网络、所述光流求逆网络和所述掩膜计算网络的参数。
上述实现方式给出了一种端到端的模型训练方法,可以用于训练图像拼接模型,图像拼接模型包括光流计算网络、光流求逆网络以及掩膜计算网络,在计算损失时同时考虑与三个网络对应的损失:光流预测损失、光流求逆损失以及图像拼接损失,即通过训练同时提高模型的中间光流预测精度、目标光流预测精度以及拼接掩膜预测精度,从而最终得到的图像拼接模型能够实现高质量的图像拼接。
在示例性实施例的一种实现方式中,所述目标光流根据所述第一中间光流、至少一个过渡光流以及所述第二中间光流融合得到,所述获取第一图像和第二图像,可以包括:根据所述中间图像和单应性矩阵,计算得到所述第一图像,以及,根据所述中间图像和所述单应性矩阵的逆矩阵,计算得到所述第二图像,所述中间图像为真实图像;所述获取第一真实截图光流、第二真实截图光流、真实目标光流以及真实拼接图像,可以包括:根据所述单应性矩阵计算第一真实中间光流,并根据所述第一真实中间光流计算所述第一真实截图光流;根据所述单应性矩阵的逆矩阵计算第二真实中间光流,并根据所述第二真实中间光流计算所述第二真实截图光流;根据所述单应性矩阵和所述单应性矩阵的逆矩阵,插值计算至少一个过渡矩阵,根据所述单应性矩阵、所述至少一个过渡矩阵以及所述单应性矩阵的逆矩阵,融合得到目标矩阵,并根据所述目标矩阵计算所述真实目标光流;根据所述中间图像和所述目标矩阵计算所述真实拼接图像。
在上述实现方式中,中间图像为真实图像,单应性矩阵可以是指定好的,利用中间图像和单应性矩阵就可以计算训练用的监督信号:真实截图光流、真实目标光流以及真实拼接图像。如果将一组待拼接图像(包括第一图像和第二图像)及其对应的监督信号视为一个训练样本,由于单应性矩阵可以任意指定,所以此种实现方式可以利用少量的真实图像快速地生成大量的训练样本,并且这些样本可以覆盖不同的场景,从而使训练得到的图像拼接模型具有良好的泛化能力。
在另一些示例性实施例中,本申请提供一种图像拼接装置,该图像拼接装置可以包括:图像获取模块,用于获取第一图像和第二图像;中间光流计算模块,用于根据所述第一图像和所述第二图像,计算得到第一中间光流和第二中间光流,所述第一中间光流为中间图像和所述第一图像之间的光流,所述第二中间光流为所述中间图像和所述第二图像之间的光流,所述中间图像为视角介于第一图像和第二图像之间的图像,所述第一中间光流的尺寸与所述第一图像的尺寸相同,所述第二中间光流的尺寸与所述第二图像的尺寸相同;图像拼接模块,用于根据所述第一中间光流、所述第二中间光流、所述第一图像和所述第二图像,计算得到所述第一图像和所述第二图像的拼接图像。
在示例性实施例的一种实现方式中,图像拼接模块根据所述第一中间光流、所述第二中间光流、所述第一图像和所述第二图像,计算得到所述第一图像和所述第二图像的拼接图像,包括:根据所述第一中间光流和所述第一图像,映射得到第一中间图像,以及,根据所述第二中间光流和所述第二图像,映射得到第二中间图像;根据所述第一中间光流和所述第二中间光流,计算得到目标光流;根据所述目标光流和所述第一中间图像,映射得到第一拼接图像,以及,根据所述目标光流和所述第二中间图像,映射得到第二拼接图像;根据所述第一拼接图像和所述第二拼接图像,拼接得到所述拼接图像。
在示例性实施例的一种实现方式中,图像拼接模块根据所述第一中间光流和所述第二中间光流,计算得到目标光流,包括:根据所述第一中间光流和所述第二中间光流,插值计算至少一个过渡光流;根据所述第一中间光流、所述至少一个过渡光流以及所述第二中间光流,融合得到所述目标光流。
在另外一些示例性实施例中,本申请提供一种计算机可读存储介质,所述计算机可读存储介质上可以存储有计算机程序指令,所述计算机程序指令被处理器读取并运行时,可以执行前述任意一种可能的实现方式提供的方法。
在另一些示例性实施例中,本申请提供一种电子设备,该电子设备可以包括:存储器以及处理器,所述存储器中可以存储有计算机程序指令,所述计算机程序指令被所述处理器读取并运行时,可以执行前述任意一种可能的实现方式提供的方法。
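To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from them without creative effort.
FIG. 1 shows a possible flow of the image stitching method provided by an embodiment of the present application;
FIG. 2 shows a possible data flow of the image stitching method provided by an embodiment of the present application;
FIG. 3 shows the working principle of the image stitching method provided by an embodiment of the present application;
FIG. 4 shows the process of capturing images with a fisheye camera and obtaining the images to be stitched;
FIG. 5 shows a possible flow of the model training method provided by an embodiment of the present application;
FIG. 6 shows a possible way of generating training samples in the model training method provided by an embodiment of the present application;
FIG. 7 shows a possible structure of the image stitching device provided by an embodiment of the present application;
FIG. 8 shows a possible structure of the electronic device provided by an embodiment of the present application.
The technical solutions in the embodiments of the present application are described below with reference to the drawings. Note that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing it need not be defined or explained again in subsequent drawings.
The terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The terms "first", "second", and the like are used only to distinguish one entity or operation from another; they neither indicate or imply relative importance nor require or imply any actual relationship or order between these entities or operations.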
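FIG. 1 shows a possible flow of the image stitching method provided by an embodiment of the present application, and FIG. 2 shows a possible data flow during execution of the method, for reference when the steps are explained. The method may be, but is not limited to being, executed by the electronic device shown in FIG. 8; for the structure of that device, see the later description of FIG. 8. Referring to FIG. 1, the method includes:
Step S110: acquire a first image and a second image.
The first image and the second image are the images to be stitched, denoted $I_0$ and $I_1$ in FIG. 2. Their source is not limited: for example, they may be images captured by a camera or images generated by a computer vision algorithm. The following mainly takes camera capture as the example. The first image and the second image have the same image size: either they are captured at the same size, or they are captured at different sizes and processed to the same size afterwards.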
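In principle, the method does not restrict the content of the first image and the second image. Considering the practical use of image stitching, however, it is convenient to take as the example two images of the same target captured from different views (for example, $I_0$ and $I_1$ in FIG. 3 are captured at view 0 and view 1 respectively). "Target" here refers broadly to anything that can be photographed, such as people, animals, plants, or scenery.
Because the stitching result of the first image and the second image (the stitched image, for short) is usually larger than either input, some implementations simplify the later computation as follows. The originally captured first and second images are first roughly aligned (image alignment is an operation after which, ideally, pixels in the two images that correspond to the same position on the target overlap). The roughly aligned images are then zero-padded (filled with pixels of value 0) as needed so that they have the same size as the stitched image, and the padded images are used as the first image and the second image in the subsequent stitching. For example, the black parts of $I_0$ and $I_1$ in FIG. 2 (the left, right, and bottom of $I_0$ and the left of $I_1$) are the zero-padded parts.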
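Image capture with a wide-angle camera and with a fisheye camera is described below as two examples.
For the former, a first wide-angle image and a second wide-angle image are captured with a wide-angle camera; a global homography matrix is calculated by a camera calibration method; the two wide-angle images are roughly aligned according to this homography matrix; and the roughly aligned images are zero-padded to obtain the final first image and second image.
For the latter, a first fisheye image and a second fisheye image are captured with a fisheye camera at the same position and in exactly opposite directions, as shown in the left column of FIG. 4. Using the unwrapping parameters, the two fisheye images are unwrapped into a first unwrapped image and a second unwrapped image, as shown in the right column of FIG. 4. Zero padding is already performed during unwrapping, so no separate padding is needed, and the two unwrapped images are already roughly aligned. The first unwrapped image is divided into a left part A1 and a right part B1, and the second unwrapped image into a left part A2 and a right part B2; A1 and A2 form one pair of first and second images, and B1 and B2 form another pair.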
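Step S120: calculate a first intermediate optical flow and a second intermediate optical flow according to the first image and the second image.
The optical flow between two images can be regarded as a quantitative representation of the motion of the target in the images. This motion includes both the motion of the target itself and the movement of the camera that takes the two images (including changes of view). For the latter, because motion is relative, when the camera moves, the position of the target in the image also changes, which is equivalent to the target itself moving. Specifically, the motion causes one point on the target to correspond to pixels at different positions in the two images; the coordinate offset between these two pixels (a vector) is the optical flow value at one of the two pixel positions. For two images of the same size, the optical flow between them can also be regarded as an optical flow map that can have the same size as the two images and in which every pixel value is such an optical flow value.
In step S120, the intermediate image is a virtual image captured at an intermediate view (for example, the image $I_m$ captured at view m in FIG. 3). The intermediate view lies between the views at which the first image and the second image are captured. "Capturing" the intermediate image is meant in a virtual sense: if the camera were placed at the intermediate view, it would capture the intermediate image, but no such capture is actually performed. The intermediate image has the same size as the first image and the second image.
Note that in some application scenarios (for example, the training stage of the image stitching model, described later) the intermediate image is not necessarily virtual and may be a really captured image. For the explanation of the steps in FIG. 1, however, the intermediate image can be understood as a virtual image.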
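The first intermediate optical flow is the optical flow between the intermediate image and the first image, and the second intermediate optical flow is the optical flow between the intermediate image and the second image; for simplicity the two are sometimes collectively called the intermediate optical flows. The first intermediate optical flow has two possible directions: the flow from the intermediate image $I_m$ to the first image $I_0$, denoted $F_{m\to 0}$, and the flow from $I_0$ to $I_m$, denoted $F_{0\to m}$. Only one direction is needed for stitching; FIG. 2 uses $F_{m\to 0}$. Similarly, the second intermediate optical flow has two directions: the flow from $I_m$ to the second image $I_1$, denoted $F_{m\to 1}$, and the flow from $I_1$ to $I_m$, denoted $F_{1\to m}$. Again only one direction is needed; FIG. 2 uses $F_{m\to 1}$ (consistent with $F_{m\to 0}$, so that both flows start from $I_m$).
In the solution of the present application, the first intermediate optical flow has the same size as the first image and the second intermediate optical flow has the same size as the second image. This makes it convenient to use the two flows directly to map the first image and the second image in the subsequent steps (the meaning of mapping is explained later), so that stitching can be performed quickly.
In some implementations, a pre-trained neural network can take the first image and the second image as input and estimate the first intermediate optical flow and the second intermediate optical flow.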
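However, because the first image and the second image are captured at different views, only part of each image shows common content. For example, $I_0$ and $I_1$ in FIG. 2 share only the front of the vehicle; the body and rear appear only in $I_0$. By the definition of optical flow given above, flow can be estimated effectively only where both images contain pixels that correspond to the same point on the target. In the image regions that show common content, pixels correspond to each other, so the intermediate optical flow can be estimated fairly accurately; in the regions without common content, there is no pixel correspondence, so the intermediate optical flow is hard to estimate (flow values can still be computed, but they are inaccurate).
Consequently, estimating the intermediate optical flow with a neural network directly from the complete first and second images may give poor results. In some implementations, therefore, the image regions that show common content are first cropped from the two images, optical flow is estimated by the neural network only between these regions to obtain an accurate small-size flow, and the small-size flow is then upsampled to a large-size flow, that is, the intermediate optical flow. This helps improve the accuracy of the intermediate optical flow. In particular, for images captured by wide-angle and fisheye cameras, the overlapping region is usually large and the non-overlapping region small, so this approach is well suited to obtaining high-quality flow estimates. The details are as follows:
Step a: crop the image regions that show common content from the first image and the second image to obtain a first screenshot and a second screenshot.
For example, the regions can be cropped with rectangular boxes. The cropped regions are called the first screenshot and the second screenshot, denoted overlap-0 and overlap-1 in FIG. 2.
Step b: input the first screenshot and the second screenshot into an optical flow calculation network to obtain a first screenshot optical flow and a second screenshot optical flow.
The optical flow calculation network is a neural network for estimating optical flow; its training method is described with FIG. 5. It takes the first screenshot and the second screenshot as input and outputs the first screenshot optical flow and the second screenshot optical flow. Its specific structure is not limited.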
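The first screenshot optical flow is the optical flow between the intermediate image (more precisely, the part of the intermediate image that corresponds to the screenshot region) and the first screenshot, and the second screenshot optical flow is the optical flow between the intermediate image (again, the part corresponding to the screenshot region) and the second screenshot; for simplicity the two are sometimes collectively called the screenshot optical flows. Like the first intermediate optical flow, each screenshot optical flow has two directions. FIG. 2 uses the flow from the intermediate image to the first screenshot and the flow from the intermediate image to the second screenshot, denoted $F_{m\to\text{overlap-0}}$ and $F_{m\to\text{overlap-1}}$ respectively.
Step c: upsample the first screenshot optical flow to the size of the first image to obtain the first intermediate optical flow, and upsample the second screenshot optical flow to the size of the second image to obtain the second intermediate optical flow.
The first screenshot optical flow has the same size as the first screenshot, which is smaller than the first image, so it must be upsampled to obtain the first intermediate optical flow; the same applies to the second screenshot optical flow. Because an optical flow can be regarded as a special image whose pixel values are vectors, it can be upsampled with image interpolation algorithms such as nearest-neighbor, bilinear, or bicubic interpolation, or with deep-learning-based upsampling methods such as DUpsampling or Meta-Upscale.
In most cases the motion of the target is consistent globally and locally, so the small-size local flow (the screenshot optical flow) reflects the same motion pattern as the large-size global flow (the intermediate optical flow), and the new flow values produced by interpolation during upsampling can be expected to be valid. In other words, the flow values obtained by upsampling are fairly reliable even in the regions of the first and second images that do not show common content. Furthermore, even if some upsampled flow values are inaccurate, implementations that use a stitching mask (described later) can reduce the resulting negative effect to some extent by adjusting the pixel values in the mask.
One more question about step S120 needs clarification: where exactly is the intermediate view at which the intermediate image is captured? Strictly speaking, the intermediate view is some expected view position between the view positions of the first image and the second image. Taking the case where the intermediate optical flow is estimated by the optical flow calculation network as an example, this expected position is fixed when the network is trained: the position for which the training data teaches the network to estimate the intermediate optical flow (more precisely, to estimate the screenshot optical flow from which the intermediate optical flow is calculated) is the position for which the trained network estimates it, and that position is the view position of the intermediate image. For example, according to the later description of FIG. 5, the "middle" of the intermediate view can be the "middle" in the sense of the projective transformation determined by a homography matrix, rather than the exact midpoint between the view positions of the first image and the second image.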
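Step S130: calculate the stitched image of the first image and the second image according to the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image.
Optionally, step S130 may further include the following sub-steps:
Step A: map the first image according to the first intermediate optical flow to obtain a first intermediate image, and map the second image according to the second intermediate optical flow to obtain a second intermediate image.
The first intermediate image and the second intermediate image can be understood as parts of the intermediate image (stitching the two gives the intermediate image); they are denoted $I_{m\leftarrow 0}$ and $I_{m\leftarrow 1}$ in FIG. 2. As explained above, optical flow reflects the coordinate offset between the pixels that correspond to the same point on the target in two images. Therefore, given one of the two images and the optical flow between them, the other image can be estimated; this estimation is called mapping (warping).
Specifically, mapping the first image according to the first intermediate optical flow gives the first intermediate image. There are two mapping methods, depending on the direction of the first intermediate optical flow. If the flow is $F_{m\to 0}$, backward warping is used, which can be written as $I_{m\leftarrow 0} = \mathrm{backward\_warping}(I_0, F_{m\to 0})$. If the flow is $F_{0\to m}$, forward warping is used, which can be written as $I_{0\to m} = \mathrm{forward\_warping}(I_0, F_{0\to m})$. The following mainly uses backward warping, which is currently the more widely used of the two. Similarly, mapping the second image according to the second intermediate optical flow gives the second intermediate image, which can be written as $I_{m\leftarrow 1} = \mathrm{backward\_warping}(I_1, F_{m\to 1})$.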
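The backward mapping described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the application: the function name `backward_warp`, the `(dx, dy)` channel order of the flow, nearest-neighbor sampling, and filling out-of-range pixels with 0 are all assumptions made here.

```python
import numpy as np

def backward_warp(src, flow):
    """Backward-warp `src` with `flow` (hypothetical helper).

    src  : (H, W) or (H, W, C) array, e.g. the first image I_0.
    flow : (H, W, 2) array holding (dx, dy) for every pixel of the
           target image, e.g. F_{m->0} defined on the grid of I_m.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Each target pixel (x, y) reads the source pixel at (x + dx, y + dy).
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    # Pixels whose source location falls outside the image stay 0.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(src)
    out[valid] = src[src_y[valid], src_x[valid]]
    return out
```

A practical implementation would more likely use bilinear sampling so that the mapping is differentiable during training; nearest-neighbor sampling is used here only to keep the sketch short.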
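Step B: calculate a target optical flow according to the first intermediate optical flow and the second intermediate optical flow.
Optionally, step B may further include the following sub-steps:
Step B1: interpolate at least one transition optical flow according to the first intermediate optical flow and the second intermediate optical flow.
The concepts of transition view and transition image are introduced first. Like the intermediate image, a transition image is a virtual image captured at a transition view (for example, the image $I_v$ captured at view v in FIG. 3). A transition view lies between the views at which the first image and the second image are captured, and "capturing" a transition image is again meant in a virtual sense.
In theory there are infinitely many transition views. For example, the lower part of FIG. 3 shows four transition views, named view 0.2, view 0.4, view 0.6, and view 0.8, each corresponding to one transition image. The values 0.2, 0.4, 0.6, and 0.8 (their relation to the weight values is explained later) roughly indicate the positional relationship of the views: starting from view 0, one can transition to view 1 in the order view 0 → view 0.2 → view 0.4 → view 0.6 → view 0.8 → view 1.
With transition views and transition images defined, consider a virtual capture process: the first image is captured at a start view (for example, view 0 in FIG. 3); the camera is then moved to each transition view in turn (for example, views 0.2, 0.4, 0.6, and 0.8 in FIG. 3) to capture the transition images; finally the second image is captured at an end view (for example, view 1 in FIG. 3). One can picture a photographer holding a phone, moving around a target, and shooting it continuously from different angles while moving.
Taking the intermediate image as the reference and regarding every captured image as the result of mapping based on the intermediate image and an optical flow, the image captured at each view can be associated with the optical flow between the intermediate image and that image. For example, the first image corresponds to the first intermediate optical flow, the second image corresponds to the second intermediate optical flow, and a transition image corresponds to a transition optical flow, that is, the optical flow between the intermediate image and the transition image, which following FIG. 3 can be denoted $F_{m\to v}$ (or possibly $F_{v\to m}$). Because views correspond to images, views also correspond to optical flows.
The virtual capture process can also be regarded as the first image being transformed into each transition image in turn and finally into the second image. During this gradual change of the image, because of the correspondence between captured images and optical flows, the first intermediate optical flow likewise changes into each transition optical flow in turn and finally into the second intermediate optical flow. Because the parallax between images captured at adjacent views is small, the transition between the flows is fairly smooth, especially when many transition views are chosen.
From this analysis, the change from the first intermediate optical flow to the second intermediate optical flow is smooth, so the two flows can be regarded as two end points, and the transition optical flow at any position between them can be calculated by interpolation. The interpolation algorithm is not limited: for example, linear, quadratic, cubic, or inverse-distance interpolation may be used. Linear interpolation is described later as the example.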
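Which positions to calculate transition optical flows for, and how many, can be decided according to actual needs, but at least one transition optical flow should be calculated. For example, FIG. 3 uses four transition optical flows, located at views 0.2, 0.4, 0.6, and 0.8. Note that it is not necessary to determine the exact position of view 0.2 before calculating the transition optical flow $F_{m\to 0.2}$; $F_{m\to 0.2}$ is interpolated directly with the weight value for view 0.2 (weight values are defined later), and the position it corresponds to is then naturally the position of view 0.2. The same holds for views 0.4, 0.6, and 0.8, whose transition optical flows $F_{m\to 0.4}$, $F_{m\to 0.6}$, and $F_{m\to 0.8}$ are interpolated with their respective weight values.
Step B2: fuse the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow to obtain the target optical flow.
As explained in step B1, the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow change gradually from one to the next. "Fusion" in step B2 refers broadly to an operation that merges these flows into a single flow, called the target optical flow, in such a way that the target optical flow reflects the gradual change between them. Possible fusion operations include weighted summation, concatenation, and so on; fusion with weight matrices is described later as the example.
Furthermore, because the target optical flow contains the flow information of every view and reflects the gradual change of the flow across views, it can be regarded as corresponding to a special gradient view. An image captured at this gradient view fuses the information of the first image, the second image, and the at least one transition image, and can therefore fully reflect the state of the target at every view. That is, the image captured at the gradient view is a fairly ideal stitching result: it is the stitched image of the first image and the second image that is ultimately to be calculated. The target optical flow can therefore also be regarded as the optical flow between the intermediate image and the stitched image, and used to calculate the stitched image. The gradient view can be pictured as follows: a photographer holding a phone moves around a target from a start view to an end view while shooting; during shooting, the phone keeps stitching together the images captured from different angles without omitting the information of any view, and the final image reflects the whole appearance of the target between the start view and the end view.
In some implementations, step B2 can be divided into two sub-steps:
First, the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow are fused to obtain a preliminary optical flow; then the preliminary optical flow is input into an optical flow inversion network, which outputs the target optical flow. FIG. 2 shows these two sub-steps. The reason for inverting the flow is as follows: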
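If the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow all start from the intermediate image, the preliminary optical flow obtained by fusing them directly also starts from the intermediate image. It is denoted $F_{m\to 0\sim 1}$ in FIG. 2, that is, the optical flow from the intermediate image to the stitched image. If $F_{m\to 0\sim 1}$ were used directly as the target optical flow, the mapping in step C (see the description of step C) could only be forward warping, which is currently not widely used because of certain drawbacks. In the above implementation, therefore, flow inversion converts $F_{m\to 0\sim 1}$ into the reverse flow $F_{0\sim 1\to m}$, that is, the optical flow from the stitched image to the intermediate image, and $F_{0\sim 1\to m}$ is used as the target optical flow so that backward warping can be used in step C. It should be understood that if a good forward warping method is available, $F_{m\to 0\sim 1}$ can also be used directly as the target optical flow.
Note that optical flow inversion differs from simple matrix inversion and is computationally complex. The above implementation therefore performs it with a neural network, which on the one hand simplifies the computation and improves efficiency, and on the other hand uses the learning ability of the network to improve the accuracy of the inverted flow; higher flow accuracy in turn improves the quality of the resulting stitched image. The training method of the optical flow inversion network is described with FIG. 5. The specific structure of the network is not limited. For example, in some simpler implementations the network can consist of L (L>1) consecutive convolutional layers, where the first layer takes the preliminary optical flow as input and the last layer outputs the target optical flow. For instance, with L=2 and 3×3 convolution kernels, the network contains only two 3×3 convolutions and its computation can be written as $F_{0\sim 1\to m} = \mathrm{conv}(\mathrm{conv}(F_{m\to 0\sim 1}, 3, 3), 3, 3)$, where conv denotes the convolution operation.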
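To make the structure of the two-layer example concrete, the following NumPy sketch stacks two 3×3 convolutions over a two-channel flow. It only illustrates the shape of the computation $\mathrm{conv}(\mathrm{conv}(F, 3, 3), 3, 3)$; the names `conv3x3` and `invert_flow_sketch`, the channel-first layout, the zero padding, and the absence of an activation function are assumptions made here, and the weights would have to be learned as described with FIG. 5.

```python
import numpy as np

def conv3x3(x, weight, bias):
    """3x3 convolution with zero padding (hypothetical helper).

    x      : (C_in, H, W) feature map, e.g. a flow with C_in = 2.
    weight : (C_out, C_in, 3, 3) kernel.
    bias   : (C_out,) bias.
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    # Accumulate the contribution of each of the 9 kernel taps.
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum('oc,chw->ohw', weight[:, :, dy, dx], patch)
    return out + bias[:, None, None]

def invert_flow_sketch(prelim_flow, params):
    """Apply two stacked 3x3 convolutions to the preliminary flow."""
    (w1, b1), (w2, b2) = params
    return conv3x3(conv3x3(prelim_flow, w1, b1), w2, b2)
```

With untrained weights this function does not invert anything; it becomes a flow inversion network only after the parameters are fitted against the real target optical flow.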
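Such a simple network design suits the stitching of images taken by wide-angle or fisheye cameras. Because these cameras cover a large field of view, the motion of the target in the picture is relatively small, so the flow values in each flow (the first intermediate, second intermediate, and transition optical flows) change gently and large jumps are unlikely; the same holds for the fused preliminary optical flow. Such a preliminary optical flow is easy to invert, so a complex network structure is unnecessary, and a simple network improves the efficiency of flow inversion.
In other implementations of step B (different from steps B1 and B2), the transition optical flows need not be calculated; the target optical flow is obtained by fusing the first intermediate optical flow and the second intermediate optical flow directly (as in the scheme above, a preliminary optical flow can be obtained by fusion and then either used directly as the target optical flow or passed through the optical flow inversion network). The target optical flow then reflects the gradual change less well than in the scheme of steps B1 and B2, but it is simpler to calculate.
Following the analysis above, the target optical flow in this case can be regarded as corresponding to a degenerate gradient view (a direct transition from the view of the first image to the view of the second image). An image captured at this view fuses the information of the first image and the second image. Although it does not fuse the information of transition images, it contains all the original information used for stitching (the first image and the second image) and reflects the state of the target at different views, so it is also a fairly ideal stitched image, and it can likewise be calculated from the target optical flow and the intermediate image.
Step C: map the first intermediate image according to the target optical flow to obtain a first stitching image, and map the second intermediate image according to the target optical flow to obtain a second stitching image.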
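The first stitching image and the second stitching image can be understood as parts of the stitched image to be calculated in step D (stitching the two gives the stitched image); they are denoted $I_{0\sim 1\leftarrow m\leftarrow 0}$ and $I_{0\sim 1\leftarrow m\leftarrow 1}$ in FIG. 2. The subscript $0\sim 1\leftarrow m\leftarrow 0$ is shorthand for $(0\sim 1)\leftarrow(m\leftarrow 0)$ and denotes the result of backward warping the first intermediate image $I_{m\leftarrow 0}$ with the target optical flow $F_{0\sim 1\to m}$; the subscript $0\sim 1\leftarrow m\leftarrow 1$ is shorthand for $(0\sim 1)\leftarrow(m\leftarrow 1)$ and denotes the result of backward warping the second intermediate image $I_{m\leftarrow 1}$ with $F_{0\sim 1\to m}$. Forward warping can be denoted similarly and is not described in detail. As mentioned in step B, the target optical flow can be regarded as the optical flow between the intermediate image and the stitched image, which is why this mapping is feasible.
Step D: stitch the first stitching image and the second stitching image to obtain the stitched image of the first image and the second image.
Step C has produced the first stitching image and the second stitching image; stitching them together gives the final stitched image. For example, in FIG. 2 the left image region of the first stitching image and the right image region of the second stitching image are stitched to obtain the stitched image, denoted $I_{0\sim 1}$.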
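If the target optical flow is accurate enough, the calculated first and second stitching images are already aligned, and simply superimposing them gives the stitched image. However, the accuracy of the target optical flow is affected by many factors (for example, the prediction accuracy of the optical flow calculation network and the accuracy of flow upsampling) and may not be high enough, in which case the first and second stitching images calculated from it are not well aligned. In some implementations, therefore, a stitching mask is used to obtain a smooth transition between the two stitching images at the seam and thus improve the quality of the stitched image. The procedure is as follows:
First, the first stitching image and the second stitching image are input into a mask calculation network, which outputs a stitching mask; then the two stitching images are stitched based on this mask to obtain the stitched image. FIG. 2 shows the steps of stitching with a mask; the stitching mask is denoted mask.
The mask calculation network is a pre-trained neural network whose specific structure is not limited; its training method is described with FIG. 5. Its input includes at least the first stitching image and the second stitching image, and may also include other information such as the target optical flow. Using the learning ability of the neural network, the mask can be predicted fairly accurately, which helps improve the quality of the stitched image.
The stitching mask can also be regarded as an image, with the same size as the first stitching image (or the second stitching image). Optionally, the pixel values of the mask lie in the interval [0,1], with the specific values calculated by the mask calculation network. For a pixel position (x,y), let $p_1$ be the pixel value of the first stitching image at that position, $p_2$ the pixel value of the second stitching image, and $m$ the pixel value of the stitching mask. The pixel value $p$ of the stitched image at (x,y) can then be calculated as the weighted sum $p = m\times p_1 + (1-m)\times p_2$, where $m$ is the weighting coefficient of $p_1$. If $m$ is instead the weighting coefficient of $p_2$, the formula becomes $p = (1-m)\times p_1 + m\times p_2$, which is not essentially different.
For example, if the pixel values of the stitching mask are the weighting coefficients of the pixel values of the first stitching image, a possible stitching mask is:
$$\text{mask} = \begin{bmatrix} 1 & 1 & 0.5 & 0.5 & 0 & 0 \\ 1 & 1 & 0.5 & 0.5 & 0 & 0 \\ 1 & 1 & 0.5 & 0.5 & 0 & 0 \\ 1 & 1 & 0.5 & 0.5 & 0 & 0 \\ 1 & 1 & 0.5 & 0.5 & 0 & 0 \\ 1 & 1 & 0.5 & 0.5 & 0 & 0 \end{bmatrix}$$
In this mask the two left columns are 1, the two right columns are 0, and the two middle columns are 0.5. The two left columns of the stitched image therefore take the pixel values of the first stitching image, the two right columns take the pixel values of the second stitching image, and the two middle columns take the mean of the pixel values of the two stitching images. This choice is reasonable: the two middle columns are likely to lie in the region where the two stitching images show common content, and taking the mean gives a smooth transition between the two images in that region.
Note that, being an example, this mask is only 6×6 and is not suitable for stitching the images in FIG. 2. The actual stitching mask shown in FIG. 2 is nevertheless very similar: the white left side represents pixel values of 1, the black right side represents pixel values of 0, and the gray middle represents pixel values within the interval (0,1).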
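The per-pixel weighting $p = m\times p_1 + (1-m)\times p_2$ can be illustrated with the 6×6 example mask described above. This is a minimal sketch: the function name `blend_with_mask` and the constant test images are assumptions chosen here for illustration, and in the described method the mask would come from the mask calculation network rather than being written by hand.

```python
import numpy as np

def blend_with_mask(first, second, mask):
    """Blend two stitching images; `mask` weights `first` per pixel."""
    return mask * first + (1.0 - mask) * second

# The 6x6 example mask: left two columns 1, middle two 0.5, right two 0.
row = np.array([1.0, 1.0, 0.5, 0.5, 0.0, 0.0])
mask = np.tile(row, (6, 1))

# Two constant single-channel images stand in for the stitching images.
first = np.full((6, 6), 100.0)
second = np.full((6, 6), 200.0)

stitched = blend_with_mask(first, second, mask)
print(stitched[0])  # [100. 100. 150. 150. 200. 200.]
```

For color images of shape (H, W, 3), the mask would need an extra trailing axis (for example `mask[..., None]`) so that it broadcasts over the channels.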
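It should be understood that a mask is not the only way to stitch the first and second stitching images. For example, the two stitching images can be superimposed first, and smoothing filtering can then be applied to the parts of the picture where the transition is abrupt.
In other implementations of step S130 (different from steps A to D), the first intermediate image and the second intermediate image can be calculated first (as in step A); the two are then stitched (directly or with a mask) to obtain the intermediate image; the target optical flow is calculated from the first intermediate optical flow and the second intermediate optical flow (as in step B); and finally the stitched image is obtained by mapping the intermediate image according to the target optical flow. For the details of each step, refer to steps A to D.
In summary, the image stitching method provided by the embodiments of the present application is computationally simple. Unlike some traditional stitching methods, it does not require complex iterative calculation of multiple homography matrices, so stitching can run closer to real time and is more practical.
In some implementations of the method, stitching is achieved by calculating a target optical flow. Because the target optical flow is produced by fusing the first intermediate optical flow, at least one transition optical flow, and the second intermediate optical flow (it may also contain no transition optical flow, which can be analyzed similarly), it contains the flow information of every view. Given the correspondence between images and optical flows, the stitched image calculated from the target optical flow also fuses the information of the first image, the second image, and at least one transition image, and can therefore fully reflect the state of the target at every view between the first image and the second image, which is a fairly ideal stitching result. The stitched image obtained in this way is of high quality, alleviates the artifacts, distortion, and alignment difficulties of traditional stitching methods, and allows effective stitching even when the first image and the second image have large parallax.
It should be understood that more than two images can be stitched by applying the method repeatedly. For example, to stitch a first, a second, and a third image, the method is first used to stitch the first image and the second image into an intermediate stitching result, which is then stitched with the third image by the same method to obtain the final stitched image.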
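Building on the above embodiments, the following describes how the transition optical flows in step B1 are calculated by linear interpolation.
Linear interpolation between two known end points (the first intermediate optical flow and the second intermediate optical flow) is in fact a weighted summation. It can be performed as follows:
First, at least one weight value is obtained; then, for each weight value, the first intermediate optical flow and the second intermediate optical flow are weighted and summed to obtain the at least one transition optical flow.
The number of weight values equals the number of transition optical flows: for example, with four weight values, four transition optical flows are calculated by weighted summation. The number can be chosen according to need. The weight values can be set in advance (for example, written in a configuration file or in the program) and read when the transition optical flows are calculated, or they can be generated by some algorithm at that time.
The weighted summation needs two weighting coefficients, one for the first intermediate optical flow and one for the second. The two coefficients constrain each other: once one is known, the other can be calculated.
For example, let the weighting coefficients take values in the interval (0,1) and sum to 1. A weight value $w$ in (0,1) can then be used as the weighting coefficient of one of the intermediate optical flows, say the first, so that the weighting coefficient of the second intermediate optical flow is $1-w$. The transition optical flow is then calculated as:
$$F_{m\to v} = w\times F_{m\to 0} + (1-w)\times F_{m\to 1}$$
If instead $w$ is used as the weighting coefficient of the second intermediate optical flow, the weighting coefficient of the first intermediate optical flow is $1-w$, and the transition optical flow is calculated as:
$$F_{m\to v} = (1-w)\times F_{m\to 0} + w\times F_{m\to 1}$$
The two formulas are not substantively different. Linear interpolation has the advantage of being simple to compute. Moreover, in most cases (especially when the first image and the second image are captured not too far apart in time) linear motion is sufficient to describe the motion of the target between the two images, and the transition optical flows calculated by linear interpolation are accurate enough. As mentioned earlier, non-linear interpolation can also be used.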
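Some principles for setting the weight values are described below.
As one principle, the magnitude of a weight value can be related to the view position of the transition optical flow that the weight value corresponds to. This ensures that the transition optical flow calculated with the weight value is consistent with its view position, and hence that the target optical flow later calculated from the transition optical flows reflects the gradual change of the flow.
For example, suppose the weighting coefficients of the first and second intermediate optical flows sum to 1 and the weight value is the weighting coefficient of the first intermediate optical flow. The weight value can then be set to be positively correlated with the closeness between the view position of the corresponding transition optical flow and the view position of the first intermediate optical flow (equivalently, negatively correlated with the closeness to the view position of the second intermediate optical flow).
Specifically, the closer the view position of the transition optical flow is to that of the first intermediate optical flow, the larger the weight value: the weighting coefficient of the first intermediate optical flow is increased and that of the second is decreased, so the transition optical flow is influenced more by the first intermediate optical flow and less by the second, consistent with its view position. Conversely, the farther the view position of the transition optical flow is from that of the first intermediate optical flow, the smaller the weight value: the weighting coefficient of the first intermediate optical flow is decreased and that of the second is increased, so the transition optical flow is influenced more by the second intermediate optical flow and less by the first, again consistent with its view position.
For example, in FIG. 3 the closeness of views 0.2, 0.4, 0.6, and 0.8 to view 0 decreases in that order. The weight values therefore satisfy: $w=0.8$ for $F_{m\to 0.2}$ at view 0.2 > $w=0.6$ for $F_{m\to 0.4}$ at view 0.4 > $w=0.4$ for $F_{m\to 0.6}$ at view 0.6 > $w=0.2$ for $F_{m\to 0.8}$ at view 0.8. The corresponding transition optical flows can be calculated as:
$$F_{m\to 0.2} = 0.8\times F_{m\to 0} + 0.2\times F_{m\to 1}$$
$$F_{m\to 0.4} = 0.6\times F_{m\to 0} + 0.4\times F_{m\to 1}$$
$$F_{m\to 0.6} = 0.4\times F_{m\to 0} + 0.6\times F_{m\to 1}$$
$$F_{m\to 0.8} = 0.2\times F_{m\to 0} + 0.8\times F_{m\to 1}$$
Imagine several transition optical flows distributed between the first and second intermediate optical flows. If all their weight values follow the above rule (positive correlation), then going from the first intermediate optical flow to the second, the influence of the first intermediate optical flow on the transition optical flows goes from strong to weak and the influence of the second from weak to strong, which guarantees that the calculated transition optical flows change gradually.
Note that relating a weight value to the view position of its transition optical flow does not mean that the view position must be calculated precisely before the weight value can be determined; it only means that the view position is taken into account when the weight value is set. For example, to set $w=0.8$ it is not necessary to calculate the position of view 0.2 quantitatively. It is enough to know that view 0.2 is closer to view 0 than views 0.4, 0.6, and 0.8 are, and that the angle between view 0.2 and view 0 is roughly 20% of the angle between view 0 and view 1, which gives $w = 1-20\% = 0.8$.
As another principle, the at least one weight value can be distributed uniformly in the interval (0,1). For example, if there is only one weight value, it can be 0.5, which is 0.5 away from both 0 and 1 and is thus uniformly distributed. If there are M (M>1) weight values, they can be $i/(M+1)$ for integer $i$ from 1 to M: the spacing between adjacent weight values is $1/(M+1)$, and so is the spacing between the first weight value and 0 and between the M-th weight value and 1, which is again a uniform distribution. FIG. 3 is the case M=4.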
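The two principles above (weights tied to view position, and weights spread uniformly over (0,1)) can be sketched together in NumPy. The function names `uniform_weights` and `transition_flows` are hypothetical names chosen here; following the first formula, each weight $w$ is taken as the coefficient of the first intermediate optical flow.

```python
import numpy as np

def uniform_weights(m):
    """Return M weight values i / (M + 1), i = 1..M, uniform in (0, 1)."""
    return [i / (m + 1) for i in range(1, m + 1)]

def transition_flows(flow_m0, flow_m1, weights):
    """Linearly interpolate one transition flow per weight value.

    flow_m0 : first intermediate flow F_{m->0}, shape (H, W, 2).
    flow_m1 : second intermediate flow F_{m->1}, same shape.
    weights : coefficients w of flow_m0; flow_m1 gets 1 - w.
    """
    return [w * flow_m0 + (1.0 - w) * flow_m1 for w in weights]

# M = 4 reproduces the weight values used in FIG. 3.
weights = uniform_weights(4)   # 0.2, 0.4, 0.6, 0.8
```

Under this convention the transition flow computed with weight $w$ corresponds to the view labelled $1-w$ in FIG. 3, so `w = 0.8` yields $F_{m\to 0.2}$.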
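In these implementations, because the weight values are uniformly distributed in (0,1), the view positions of the transition optical flows calculated with them are, loosely speaking, also uniformly distributed between the first image and the second image. With such a distribution, the transition images fully describe the target across the views between the first image and the second image (without favoring particular views), and because the stitched image can be regarded as fusing the information of all transition images, the resulting stitched image is of high quality.
A uniform distribution in (0,1) is not mandatory. For example, the weight values can be denser in some sub-intervals of (0,1) and sparser in others.
It should be understood that the two principles for setting weight values can be used together.
Building on the above embodiments, the following describes how the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow are fused in step B2 to obtain the target optical flow.
In some implementations, the fusion can be performed with weight matrices, as follows:
First, N+2 weight matrices are obtained, where N is the total number of transition optical flows (N≥1). Then, based on the N+2 weight matrices, the first intermediate optical flow, the N transition optical flows, and the second intermediate optical flow are weighted and summed to obtain the preliminary optical flow. Finally, depending on the implementation, the preliminary optical flow is used directly as the target optical flow or is inverted to obtain the target optical flow.
Taking the case with flow inversion as the example, the calculation of the preliminary optical flow can be written as
$$F_{m\to 0\sim 1} = \sum_{t=1}^{N+2} W_t\times F_t$$
where $F_t$ is a flow to be fused (the first intermediate optical flow, a transition optical flow, or the second intermediate optical flow), $W_t$ is the weight matrix corresponding to $F_t$, that is, the t-th weight matrix, and $\times$ denotes the multiplication of the two matrices (in the column-wise examples below it acts element by element). Each element of a weight matrix can be regarded as one weight value of the weighted summation. Optionally, the elements of the weight matrices take values in the interval [0,1] and satisfy a normalization relation across the matrices (the formula is not legible in this text; presumably $\sum_{t=1}^{N+2} w_t^{ij} = 1$), where $w_t^{ij}$ denotes the element in row i and column j of the t-th weight matrix.
Unlike weighted summation with one-dimensional weight values (as used to calculate the transition optical flows), weight matrices are two-dimensional, so they can combine the information of the different flows more flexibly in the preliminary optical flow. The preliminary optical flow can thus reflect the gradual change from the first intermediate optical flow to the second, and so can the target optical flow obtained from it by flow inversion.
For example, the elements of a weight matrix can be set according to the following rule: the position of the maximum elements of the weight matrix is related to the view position of the optical flow that the weight matrix corresponds to.
For example, "related" can mean that the position of the maximum elements of the weight matrix coincides with the position of the view of the corresponding flow within the whole view range (the region between the view of the first image and the view of the second image).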
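The meaning of "coincides" is explained with FIG. 3, assuming for simplicity that the $F_t$ matrices have only 6 columns:
$F_1$ ($F_{m\to 0}$ in FIG. 3) is the first intermediate optical flow. Its view is view 0, which lies at the far left of the whole view range (from view 0 to view 1). In a $W_1$ set according to the above rule, the leftmost column therefore takes the maximum value, and the other columns can decrease gradually or be set in some other way. The original gives two example $W_1$ matrices that satisfy this rule (the matrices are not reproduced in this text).
$F_2$ ($F_{m\to 0.2}$ in FIG. 3) is the transition optical flow after $F_1$. Its view is view 0.2, which lies at about 20% from the left of the whole view range. In a $W_2$ set according to the above rule, the second column from the left therefore takes the maximum value, and the other columns can decrease gradually or be set in some other way. The original gives two example $W_2$ matrices that satisfy this rule (the matrices are not reproduced in this text).
The weight matrices $W_3$, $W_4$, $W_5$, $W_6$ for $F_3$, $F_4$, $F_5$, $F_6$ ($F_{m\to 0.4}$, $F_{m\to 0.6}$, $F_{m\to 0.8}$, $F_{m\to 1}$ in FIG. 3) are set similarly and are not described in detail.
Optionally, the maximum element value can be the same in all weight matrices; in the $W_1$ and $W_2$ examples above, for instance, the maximum is either 1 in both or 0.8 in both. The example extends easily to flows with more columns, in which case the maximum may be taken in one or several columns. For $W_2$ in the scenario of FIG. 3 (with the flow no longer limited to 6 columns), the maximum is taken in the column at the position of 20% of the total number of columns, or in a few columns near it.
If the position of the maximum elements of a weight matrix coincides with the view position of the corresponding flow to be fused (which is also a matrix), then after weighting, the flow values of that flow at its view position contribute most to the flow values at the same position in the preliminary optical flow, while its flow values at other positions contribute relatively little. If all weight matrices are set according to this rule and the maximum element value is the same in all of them, then in the preliminary optical flow the flow values at each view position are contributed mainly by the flow that corresponds to that view position. The preliminary optical flow therefore reflects the gradual change from the first intermediate optical flow to the second, and so does the target optical flow obtained from it by flow inversion.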
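For example, again consider flows to be fused with only 6 columns, and set $W_t$ (t an integer from 1 to 6) so that all elements in column t are 1 and all elements in the other columns are 0. After the weighted summation, column t of the resulting preliminary optical flow $F_{m\to 0\sim 1}$ comes from $F_t$; in other words, each $F_t$ contributes the one column of $F_{m\to 0\sim 1}$ that corresponds to its view position, so $F_{m\to 0\sim 1}$ can reflect the gradual change of the flow from $F_1$ to $F_6$. By contrast, if the flows were weighted and summed with one-dimensional weight values (denoted $w_t$), each $w_t$ would act on every flow value in $F_t$ without distinction and could not express how flow values at different positions in $F_t$ contribute differently to the flow values in $F_{m\to 0\sim 1}$; the resulting $F_{m\to 0\sim 1}$ would therefore not reflect the gradual change from $F_1$ to $F_6$ well.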
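The 6-column example can be checked with a short NumPy sketch. It assumes, as the column-wise example implies, that each weight matrix is applied to its flow element by element and broadcast over the two flow components; the names `fuse_flows` and `one_hot_column_weights` are hypothetical, and the constant-valued flows are placeholders used only to make the origin of each column visible.

```python
import numpy as np

def one_hot_column_weights(height, n_cols):
    """Build W_1..W_n: W_t is 1 in column t and 0 elsewhere."""
    mats = []
    for t in range(n_cols):
        w = np.zeros((height, n_cols))
        w[:, t] = 1.0
        mats.append(w)
    return mats

def fuse_flows(flows, weight_mats):
    """Weighted sum of flows; each (H, W) weight matrix scales the
    matching (H, W, 2) flow element by element (an assumption)."""
    fused = np.zeros_like(flows[0], dtype=float)
    for flow, w in zip(flows, weight_mats):
        fused += w[..., None] * flow
    return fused

# Six placeholder flows F_1..F_6; flow t is filled with the value t.
flows = [np.full((4, 6, 2), float(t)) for t in range(1, 7)]
prelim = fuse_flows(flows, one_hot_column_weights(4, 6))
print(prelim[0, :, 0])  # [1. 2. 3. 4. 5. 6.]
```

Column t of the fused flow comes entirely from $F_t$, which is the behavior described in the paragraph above; softer weight matrices would blend neighboring flows around each view position instead.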
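Note that "related" can also be defined in other ways; for example, the position of the maximum elements of a weight matrix need not coincide strictly with the position of the view of the corresponding flow within the whole view range, and may coincide only roughly.
Consider a scenario in which the image stitching method described above is implemented with a neural network model called the image stitching model, which includes three sub-networks: the optical flow calculation network, the optical flow inversion network, and the mask calculation network. The optical flow calculation network estimates the first and second screenshot optical flows, from which the first and second intermediate optical flows are calculated (see step b); the optical flow inversion network calculates the reverse of the preliminary optical flow to obtain the target optical flow (see step B2); and the mask calculation network calculates the stitching mask, from which the stitched image is calculated (see step D).
The functions of the three sub-networks have been described in detail above. Building on the above embodiments, the following describes a training method for the image stitching model; its possible flow is shown in FIG. 5. The training method may be, but is not limited to being, executed by the electronic device shown in FIG. 8; for the structure of that device, see the later description of FIG. 8. Referring to FIG. 5, the method includes:
Step S210: calculate the stitched image of the first image and the second image with the image stitching model.
Step S220: acquire a first real screenshot optical flow, a second real screenshot optical flow, a real target optical flow, and a real stitched image.
Step S230: calculate an optical flow prediction loss according to the first screenshot optical flow, the second screenshot optical flow, the first real screenshot optical flow, and the second real screenshot optical flow.
Step S240: calculate an optical flow inversion loss according to the target optical flow and the real target optical flow.
Step S250: calculate an image stitching loss according to the stitched image and the real stitched image.
Step S260: calculate a total loss according to the optical flow prediction loss, the optical flow inversion loss, and the image stitching loss, and update the parameters of the optical flow calculation network, the optical flow inversion network, and the mask calculation network according to the total loss.
Step S210 can be implemented with the image stitching method provided by the embodiments of the present application (the relevant steps use the three sub-networks of the image stitching model) and is not described again. Note, however, that because FIG. 5 shows a training method applied in the model training stage, the first image and the second image in step S210 are training images, whereas step S110 does not specify whether the first image and the second image are used for training or for inference.
The first real screenshot optical flow, the second real screenshot optical flow, the real target optical flow, and the real stitched image in step S220 are all supervisory signals. The first and second real screenshot optical flows are used in step S230 to calculate the optical flow prediction loss, which supervises the training of the optical flow calculation network; the real target optical flow is used in step S240 to calculate the optical flow inversion loss, which supervises the training of the optical flow inversion network; and the real stitched image is used in step S250 to calculate the image stitching loss, which supervises the training of the mask calculation network. A loss calculated for a later sub-network of the model may also supervise earlier sub-networks: for example, the optical flow inversion loss may also supervise the optical flow calculation network, and the image stitching loss may also supervise the optical flow calculation network and the optical flow inversion network.
The execution order of steps S220 to S250 is flexible, as explained below.
The three supervisory signals in step S220 can be acquired together or separately; "acquire" here covers both calculating them and reading them directly. When they are acquired separately, the corresponding loss can be calculated immediately after a signal is acquired. For example, once the first and second real screenshot optical flows are acquired, step S230 can be executed to calculate the optical flow prediction loss (provided that the first and second screenshot optical flows have already been calculated), without waiting for all three supervisory signals.
The timing of step S220 is not tied to that of step S210: it can be executed before step S210, after step S210 (the case shown in FIG. 5), or in parallel with step S210.
The timing of the three loss calculations in steps S230 to S250 is not limited. For example, the three losses can be calculated after step S210 has finished; the first screenshot optical flow, the second screenshot optical flow, the target optical flow, and the stitched image have then all been calculated during step S210, so the losses can be calculated, and in any order. Alternatively, the losses can be calculated during step S210: for example, step S230 can be executed as soon as the first and second screenshot optical flows have been calculated (provided that the first and second real screenshot optical flows have already been acquired), without waiting for the stitched image.
In step S260, the total loss can be calculated from the optical flow prediction loss, the optical flow inversion loss, and the image stitching loss by weighted summation, by taking their product, or in other ways. With weighted summation, the total loss can be written as $Loss = \alpha\times Loss_1 + \beta\times Loss_2 + \gamma\times Loss_3$, where $\alpha$, $\beta$, $\gamma$ are weighting coefficients and $Loss_1$, $Loss_2$, $Loss_3$ are the optical flow prediction loss, the optical flow inversion loss, and the image stitching loss respectively. Back-propagating the total loss updates the parameters of the image stitching model.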
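The weighted-sum form of the total loss can be written as a one-line function. This is a minimal sketch: the function name `total_loss` and the default coefficient values of 1.0 are assumptions made here, since the source does not state values for $\alpha$, $\beta$, $\gamma$ or how the three individual losses are defined.

```python
def total_loss(loss_flow, loss_inverse, loss_stitch,
               alpha=1.0, beta=1.0, gamma=1.0):
    """Loss = alpha * Loss1 + beta * Loss2 + gamma * Loss3.

    loss_flow    : optical flow prediction loss (Loss1).
    loss_inverse : optical flow inversion loss (Loss2).
    loss_stitch  : image stitching loss (Loss3).
    The default coefficients are placeholders, not values from the source.
    """
    return alpha * loss_flow + beta * loss_inverse + gamma * loss_stitch
```

In a training framework the three inputs would be scalar tensors, so that back-propagating the returned value updates all three sub-networks at once; setting a coefficient to 0 corresponds to omitting that loss.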
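Because the above training method considers the losses of all three sub-networks together (the optical flow prediction loss, the optical flow inversion loss, and the image stitching loss), training simultaneously improves the accuracy of the intermediate optical flow, the target optical flow, and the stitching mask predicted by the model, so the resulting image stitching model can produce high-quality stitched images.
It should be understood that in some implementations, even if the image stitching model includes all three sub-networks, only one or two of the losses may be calculated rather than all three. For example, because the image stitching loss also supervises the optical flow calculation network and the optical flow inversion network, it is possible to calculate only the image stitching loss. If a loss is not calculated, the corresponding supervisory signal need not be acquired.
It should also be understood that some steps of the image stitching method may be implemented without neural networks, so the image stitching model does not necessarily contain all three sub-networks. For example, flow inversion may be performed by another method, in which case the model contains no optical flow inversion network and the optical flow inversion loss need not be calculated.
In some implementations, a real image can be used as the intermediate image to generate the first image and the second image of step S210. The real image can be a really captured image or an image generated by a computer vision algorithm; in either case the intermediate image is now a known, actually existing image, unlike in the earlier explanation (where, in the description of step S120, the intermediate image was treated as a virtual image). A pair of first and second images is generated as follows:
The first image is calculated from the intermediate image and a homography matrix, and the second image is calculated from the intermediate image and the inverse of that homography matrix. FIG. 6 shows this generation process, where $h$ denotes a homography matrix and $h^{-1}$ its inverse; applying the projective transformations given by $h$ and $h^{-1}$ to the intermediate image gives the first image and the second image respectively.
For a given intermediate image, changing $h$ gives a different pair of first and second images, so this method can "create" a large number of images to be stitched for training from a small number of real images. The homography matrix $h$ can be set in advance or generated on the fly by an algorithm.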
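Further, the three supervisory signals of step S220 can be generated as follows:
(1) Generation of the real screenshot optical flows
A first real intermediate optical flow is calculated from $h$, and the first real screenshot optical flow is calculated from the first real intermediate optical flow; a second real intermediate optical flow is calculated from $h^{-1}$, and the second real screenshot optical flow is calculated from the second real intermediate optical flow. The real screenshot optical flows should be understood as the ideal results of screenshot optical flow estimation.
The projective transformation represented by a homography matrix gives the correspondence between the pixels of two images, so the optical flow between the two images is easy to calculate once the homography matrix is known.
Thus the homography matrix $h$ between the intermediate image and the first image can be used to calculate the optical flow between them, called the first real intermediate optical flow; cropping it (to the image region in which the first image and the second image show common content) gives the first real screenshot optical flow. Similarly, the homography matrix $h^{-1}$ between the intermediate image and the second image can be used to calculate the optical flow between them, called the second real intermediate optical flow; cropping it gives the second real screenshot optical flow.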
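How a dense flow follows from a homography can be sketched in NumPy: every pixel is mapped through the matrix in homogeneous coordinates, and the flow is the displacement between the mapped and the original position. The function name `flow_from_homography`, the `(dx, dy)` channel order, and the convention that the matrix maps pixel coordinates of the intermediate image to coordinates in the other image are assumptions made here; the source does not fix these conventions.

```python
import numpy as np

def flow_from_homography(hmat, height, width):
    """Dense flow (H, W, 2) induced by a 3x3 homography `hmat`.

    Assumes `hmat` maps homogeneous pixel coordinates (x, y, 1) of the
    reference image to coordinates in the other image.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # (H, W, 3)
    mapped = pts @ hmat.T                                  # apply hmat per pixel
    mapped_xy = mapped[..., :2] / mapped[..., 2:3]         # dehomogenize
    return mapped_xy - pts[..., :2]                        # displacement (dx, dy)

# A pure translation by (3, -2) gives a constant flow field.
h = np.array([[1.0, 0.0, 3.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0]])
flow_h = flow_from_homography(h, 4, 5)
flow_h_inv = flow_from_homography(np.linalg.inv(h), 4, 5)
```

Cropping such a flow with the same rectangle used for the screenshots would then give the corresponding real screenshot optical flow.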
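(2) Generation of the real target optical flow
At least one transition matrix is interpolated from $h$ and $h^{-1}$; $h$, the at least one transition matrix, and $h^{-1}$ are fused to obtain a target matrix; and the real target optical flow is calculated from the target matrix. The real target optical flow should be understood as the ideal result of inverting the preliminary optical flow.
The interpolation used here is similar to that used for the transition optical flows (for example, weighted summation with weight values), and the fusion used here is similar to that used for the target optical flow (for example, weighted summation with weight matrices). Although those methods were described for optical flows, homography matrices and optical flows are both matrices mathematically, so the same methods can be applied here. Note that fusing $h$, the at least one transition matrix, and $h^{-1}$ gives the target matrix directly, without calculating an inverse (whereas the target optical flow is obtained by inverting the preliminary optical flow). For the fusion of homography matrices, inversion would have to be applied to the matrices themselves before fusion; but the inverse of $h$ is $h^{-1}$ and the inverse of $h^{-1}$ is $h$, so inversion merely swaps the two, and the inversion step can be omitted.
Because the target matrix can also be regarded as the homography matrix between the intermediate image and the real stitched image (as shown in FIG. 6), the optical flow between the two, that is, the real target optical flow, can be calculated from the target matrix.
In addition, if an optional scheme does not calculate transition optical flows, the transition matrices need not be calculated either, and the target matrix is calculated directly from $h$ and $h^{-1}$.
(3) Generation of the real stitched image
The real stitched image can be calculated from the intermediate image and the target matrix, in the same way as the first image and the second image are calculated. The real stitched image should be understood as the ideal stitching result of the first image and the second image.
In the above implementation, if a pair of images to be stitched (the first image and the second image) together with its supervisory signals is regarded as one training sample, then because the homography matrix can be specified arbitrarily, a large number of training samples can be generated quickly from a small number of real images (which may belong to a training set). Choosing different homography matrices makes the samples cover different scenarios, so that the trained image stitching model generalizes well.
With the generation of training data described, the "middle" in "intermediate image" can be explained more precisely. It can mean the "middle" in the sense of the projective transformation determined by a homography matrix: the image captured at this "middle" position is transformed into the first image by a certain homography matrix and into the second image by the inverse of that matrix. If the training data is generated in another way, the definition of "middle" changes accordingly.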
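Fig. 7 shows the structure of the image stitching apparatus 300 provided by an embodiment of the present application. Referring to Fig. 7, the image stitching apparatus 300 includes:
an image acquisition module 310, configured to acquire a first image and a second image;
an intermediate optical flow calculation module 320, configured to compute a first intermediate optical flow and a second intermediate optical flow from the first image and the second image, where the first intermediate optical flow is the optical flow between an intermediate image and the first image, the second intermediate optical flow is the optical flow between the intermediate image and the second image, the intermediate image is an image whose viewing angle lies between those of the first image and the second image, the size of the first intermediate optical flow is the same as the size of the first image, and the size of the second intermediate optical flow is the same as the size of the second image;
an image stitching module 330, configured to compute a stitched image of the first image and the second image from the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image.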
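In one implementation of the image stitching apparatus 300, the intermediate optical flow calculation module 320 computing the first intermediate optical flow and the second intermediate optical flow from the first image and the second image includes: cropping, from the first image and the second image respectively, the image regions that contain the common content, to obtain a first screenshot and a second screenshot; inputting the first screenshot and the second screenshot into an optical flow calculation network to obtain a first screenshot optical flow and a second screenshot optical flow, where the first screenshot optical flow is the optical flow between the intermediate image and the first screenshot, and the second screenshot optical flow is the optical flow between the intermediate image and the second screenshot; and upsampling the first screenshot optical flow to the size of the first image to obtain the first intermediate optical flow, and upsampling the second screenshot optical flow to the size of the second image to obtain the second intermediate optical flow.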
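In one implementation of the image stitching apparatus 300, the image stitching module 330 computing the stitched image of the first image and the second image from the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image includes: mapping the first image with the first intermediate optical flow to obtain a first intermediate image, and mapping the second image with the second intermediate optical flow to obtain a second intermediate image; computing a target optical flow from the first intermediate optical flow and the second intermediate optical flow; mapping the first intermediate image with the target optical flow to obtain a first stitched image, and mapping the second intermediate image with the target optical flow to obtain a second stitched image; and stitching the first stitched image and the second stitched image to obtain the stitched image.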
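The "mapping" of an image with an optical flow can be illustrated by a nearest-neighbour backward warp. This is a sketch under assumptions not stated in the source: the flow is an (H, W, 2) array of (dx, dy) displacements, each output pixel samples the input at its own position plus the flow, and out-of-range coordinates are clamped to the border. The name `backward_warp` is hypothetical, and a practical implementation would more likely use bilinear sampling.

```python
import numpy as np


def backward_warp(image, flow):
    """Sample `image` at (x + dx, y + dy) for every output pixel (nearest neighbour)."""
    height, width = flow.shape[:2]
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    # Round the source coordinates and clamp them to the valid pixel range.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, image.shape[1] - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, image.shape[0] - 1)
    return image[src_y, src_x]


image = np.arange(12).reshape(3, 4)

# A zero flow leaves the image unchanged.
unchanged = backward_warp(image, np.zeros((3, 4, 2)))

# A flow of (1, 0) makes every pixel read its right-hand neighbour.
shift_flow = np.zeros((3, 4, 2))
shift_flow[..., 0] = 1.0
shifted = backward_warp(image, shift_flow)
```

Under this convention the same routine would serve both mapping steps: first image to first intermediate image, then first intermediate image to first stitched image.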
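In one implementation of the image stitching apparatus 300, the image stitching module 330 computing the target optical flow from the first intermediate optical flow and the second intermediate optical flow includes: computing at least one transition optical flow by interpolation from the first intermediate optical flow and the second intermediate optical flow; and fusing the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow to obtain the target optical flow.
In one implementation of the image stitching apparatus 300, the image stitching module 330 computing the at least one transition optical flow by interpolation from the first intermediate optical flow and the second intermediate optical flow includes: obtaining at least one weight value; and, for each weight value, computing a weighted sum of the first intermediate optical flow and the second intermediate optical flow, to obtain the at least one transition optical flow.
In one implementation of the image stitching apparatus 300, the magnitude of a weight value is related to the viewing-angle position of the transition optical flow corresponding to that weight value.
In one implementation of the image stitching apparatus 300, the weighting coefficient of the first intermediate optical flow and the weighting coefficient of the second intermediate optical flow sum to 1, the weight value is the weighting coefficient of the first intermediate optical flow, and the magnitude of the weight value is positively correlated with how close the viewing-angle position of the corresponding transition optical flow is to the viewing-angle position of the first intermediate optical flow.
In one implementation of the image stitching apparatus 300, the at least one weight value is uniformly distributed in the interval (0, 1).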
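The interpolation of transition optical flows described in the preceding implementations can be sketched as follows. "Uniformly distributed in (0, 1)" is read here as evenly spaced values k/(N+1), which is an interpretation; the function names and the (H, W, 2) flow layout are also assumptions.

```python
import numpy as np


def uniform_weights(count):
    """`count` evenly spaced weight values strictly inside (0, 1)."""
    return [(k + 1) / (count + 1) for k in range(count)]


def transition_flows(flow_first, flow_second, weights):
    """Weighted sums w*flow_first + (1 - w)*flow_second, one per weight value.

    The weight w is the coefficient of the first intermediate optical flow, so a
    larger w gives a transition flow closer to the first intermediate flow.
    """
    return [w * flow_first + (1.0 - w) * flow_second for w in weights]


flow_first = np.ones((2, 3, 2))     # toy first intermediate optical flow
flow_second = np.zeros((2, 3, 2))   # toy second intermediate optical flow
weights = uniform_weights(3)
transitions = transition_flows(flow_first, flow_second, weights)
```

For three transition flows the weights come out as 0.25, 0.5, and 0.75, so the transition flows step evenly from near the second intermediate flow to near the first.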
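In one implementation of the image stitching apparatus 300, the image stitching module 330 computing the target optical flow from the first intermediate optical flow and the second intermediate optical flow includes: computing a preliminary optical flow from the first intermediate optical flow and the second intermediate optical flow; and inputting the preliminary optical flow into an optical flow inversion network to obtain the target optical flow.
In one implementation of the image stitching apparatus 300, the image stitching module 330 fusing the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow to obtain the target optical flow includes: obtaining N+2 weight matrices, where N is the total number of transition optical flows; computing, based on the N+2 weight matrices, a weighted sum of the first intermediate optical flow, the N transition optical flows, and the second intermediate optical flow, to obtain the preliminary optical flow; and either taking the preliminary optical flow as the target optical flow, or inputting the preliminary optical flow into the optical flow inversion network to obtain the target optical flow.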
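The weighted fusion with N+2 weight matrices can be sketched as below. The fusion itself (a per-pixel weighted sum) follows the text; the way the weight matrices are built is purely hypothetical: Gaussian bumps along the image columns whose centres move from left to right with the flow's index, normalised so that the weights at each pixel sum to 1.

```python
import numpy as np


def column_weight_maps(count, height, width, sigma=0.25):
    """Hypothetical weight matrices: one (height, width) map per flow."""
    cols = np.linspace(0.0, 1.0, width)       # normalised column positions
    centers = np.linspace(0.0, 1.0, count)    # one bump centre per flow
    bumps = np.exp(-((cols[None, :] - centers[:, None]) ** 2) / (2.0 * sigma ** 2))
    bumps /= bumps.sum(axis=0, keepdims=True)  # weights at each column sum to 1
    return [np.tile(row, (height, 1)) for row in bumps]


def fuse_flows(flows, weight_maps):
    """Per-pixel weighted sum of (H, W, 2) flows using (H, W) weight matrices."""
    stacked_flows = np.stack(flows)                      # (N+2, H, W, 2)
    stacked_weights = np.stack(weight_maps)[..., None]   # (N+2, H, W, 1)
    return (stacked_weights * stacked_flows).sum(axis=0)


# N = 1 transition flow, so N + 2 = 3 flows and 3 weight matrices.
flow = np.arange(4 * 8 * 2, dtype=float).reshape(4, 8, 2)
maps = column_weight_maps(count=3, height=4, width=8)
preliminary = fuse_flows([flow, flow, flow], maps)
```

The result plays the role of the preliminary optical flow; whether it is used directly or passed through an inversion network is outside this sketch.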
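In one implementation of the image stitching apparatus 300, the position of the maximum element in a weight matrix is related to the viewing-angle position of the optical flow corresponding to that weight matrix.
In one implementation of the image stitching apparatus 300, the image stitching module 330 stitching the first stitched image and the second stitched image to obtain the stitched image includes: inputting the first stitched image and the second stitched image into a mask calculation network to obtain a stitching mask; and stitching the first stitched image and the second stitched image based on the stitching mask to obtain the stitched image.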
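Stitching with a mask can be illustrated by a per-pixel blend. This sketch assumes the stitching mask holds values in [0, 1] that act as the weight of the first stitched image, with the second stitched image receiving the complement; the passage above does not specify the mask's value range or how it is applied, and the name `blend_with_mask` is hypothetical.

```python
import numpy as np


def blend_with_mask(first_stitched, second_stitched, mask):
    """Return mask*first + (1 - mask)*second, pixel by pixel."""
    mask = np.clip(np.asarray(mask, dtype=float), 0.0, 1.0)
    if mask.ndim == first_stitched.ndim - 1:
        mask = mask[..., None]          # broadcast an (H, W) mask over colour channels
    return mask * first_stitched + (1.0 - mask) * second_stitched


first_stitched = np.full((2, 2, 3), 10.0)
second_stitched = np.full((2, 2, 3), 20.0)

# Left column taken from the first image, right column from the second.
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])
stitched = blend_with_mask(first_stitched, second_stitched, mask)
```

A binary mask selects one source per pixel, while intermediate values give a soft transition across the seam.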
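In one implementation of the image stitching apparatus 300, the first intermediate optical flow and the second intermediate optical flow are computed from the first screenshot optical flow and the second screenshot optical flow output by the optical flow calculation network, the target optical flow is computed by the optical flow inversion network, and the stitched image is computed using the stitching mask output by the mask calculation network; the apparatus further includes:
a supervision signal acquisition module, configured to obtain a first real screenshot optical flow, a second real screenshot optical flow, a real target optical flow, and a real stitched image;
a loss calculation module, configured to compute the optical flow prediction loss from the first screenshot optical flow, the second screenshot optical flow, the first real screenshot optical flow, and the second real screenshot optical flow; to compute the optical flow inversion loss from the target optical flow and the real target optical flow; and to compute the image stitching loss from the stitched image and the real stitched image;
a parameter update module, configured to compute the total loss from the optical flow prediction loss, the optical flow inversion loss, and the image stitching loss, and to update the parameters of the optical flow calculation network, the optical flow inversion network, and the mask calculation network according to the total loss.
In one implementation of the image stitching apparatus 300, the target optical flow is obtained by fusing the first intermediate optical flow, at least one transition optical flow, and the second intermediate optical flow, and the image acquisition module 310 acquiring the first image and the second image includes: computing the first image from the intermediate image and a specified homography matrix, and computing the second image from the intermediate image and the inverse of the homography matrix, where the intermediate image is a real image;
the supervision signal acquisition module obtaining the first real screenshot optical flow, the second real screenshot optical flow, the real target optical flow, and the real stitched image includes: computing a first real intermediate optical flow from the homography matrix, and computing the first real screenshot optical flow from the first real intermediate optical flow; computing a second real intermediate optical flow from the inverse of the homography matrix, and computing the second real screenshot optical flow from the second real intermediate optical flow; computing at least one transition matrix by interpolation from the homography matrix and its inverse, fusing the homography matrix, the at least one transition matrix, and the inverse of the homography matrix to obtain a target matrix, and computing the real target optical flow from the target matrix; and computing the real stitched image from the intermediate image and the target matrix.
The implementation principle and technical effects of the image stitching apparatus 300 provided by this embodiment have been described in the foregoing method embodiments. For brevity, where the apparatus embodiment is silent, refer to the corresponding content of the method embodiments.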
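Fig. 8 shows one possible structure of the electronic device 400 provided by an embodiment of the present application. Referring to Fig. 8, the electronic device 400 includes a processor 410, a memory 420, and a communication interface 430, which are interconnected and communicate with one another through a communication bus 440 and/or other forms of connection mechanism (not shown).
The processor 410 comprises one or more processors (only one is shown in the figure) and may be an integrated circuit chip with signal processing capability. The processor 410 may be a general-purpose processor, including a central processing unit (CPU), a micro controller unit (MCU), a network processor (NP), or another conventional processor; it may also be a special-purpose processor, including a graphics processing unit (GPU), a neural-network processing unit (NPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. When there are multiple processors 410, some may be general-purpose processors and the others special-purpose processors.
The memory 420 comprises one or more memories (only one is shown in the figure) and may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like.
The processor 410 and other possible components can access the memory 420 to read and/or write data in it. In particular, the memory 420 may store one or more computer program instructions, and the processor 410 may read and run these instructions to implement the image stitching method provided by the embodiments of the present application.
The communication interface 430 comprises one or more interfaces (only one is shown in the figure) and may be used to communicate directly or indirectly with other devices in order to exchange data. The communication interface 430 may include interfaces for wired and/or wireless communication.
It can be understood that the structure shown in Fig. 8 is only schematic; the electronic device 400 may include more or fewer components than shown in Fig. 8, or have a configuration different from that shown in Fig. 8. The components shown in Fig. 8 may be implemented in hardware, software, or a combination of the two. The electronic device 400 may be a physical device, such as a PC, a laptop, a tablet, a mobile phone, a server, or a smart wearable device, or a virtual device, such as a virtual machine or a virtualized container. Moreover, the electronic device 400 is not limited to a single device; it may be a combination of multiple devices or a cluster made up of a large number of devices.
An embodiment of the present application further provides a computer-readable storage medium storing computer program instructions which, when read and run by a processor of a computer, perform the image stitching method provided by the embodiments of the present application. For example, the computer-readable storage medium may be implemented as the memory 420 of the electronic device 400 in Fig. 8.
The above are merely embodiments of the present application and are not intended to limit its scope of protection. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.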
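The present application provides an image stitching method and apparatus, a storage medium, and an electronic device. The image stitching method may include: acquiring a first image and a second image; computing a first intermediate optical flow and a second intermediate optical flow from the first image and the second image, where the first intermediate optical flow is the optical flow between an intermediate image and the first image, the second intermediate optical flow is the optical flow between the intermediate image and the second image, the intermediate image is an image whose viewing angle lies between those of the first image and the second image, the size of the first intermediate optical flow is the same as the size of the first image, and the size of the second intermediate optical flow is the same as the size of the second image; and computing a stitched image of the first image and the second image from the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image. The above image stitching method has simple steps and completes image stitching without computing multiple homography matrices through complex iterations, so it can improve stitching efficiency and bring the stitching process closer to real time, giving it high practicality.
In addition, it can be understood that the image stitching method and image stitching apparatus of the present application are reproducible and can be used in a variety of industrial applications. For example, the image stitching method and apparatus, storage medium, and electronic device of the present application can be used in the field of image processing.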
Claims (19)
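- An image stitching method, characterized by comprising: acquiring a first image and a second image; computing a first intermediate optical flow and a second intermediate optical flow from the first image and the second image, wherein the first intermediate optical flow is the optical flow between an intermediate image and the first image, the second intermediate optical flow is the optical flow between the intermediate image and the second image, the intermediate image is an image whose viewing angle lies between those of the first image and the second image, the size of the first intermediate optical flow is the same as the size of the first image, and the size of the second intermediate optical flow is the same as the size of the second image; and computing a stitched image of the first image and the second image from the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image.
- The image stitching method according to claim 1, characterized in that computing the first intermediate optical flow and the second intermediate optical flow from the first image and the second image comprises: cropping, from the first image and the second image respectively, the image regions that contain the common content, to obtain a first screenshot and a second screenshot; inputting the first screenshot and the second screenshot into an optical flow calculation network to obtain a first screenshot optical flow and a second screenshot optical flow, wherein the first screenshot optical flow is the optical flow between the intermediate image and the first screenshot, and the second screenshot optical flow is the optical flow between the intermediate image and the second screenshot; and upsampling the first screenshot optical flow to the size of the first image to obtain the first intermediate optical flow, and upsampling the second screenshot optical flow to the size of the second image to obtain the second intermediate optical flow.
- The image stitching method according to claim 1 or 2, characterized in that computing the stitched image of the first image and the second image from the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image comprises: mapping the first image with the first intermediate optical flow to obtain a first intermediate image, and mapping the second image with the second intermediate optical flow to obtain a second intermediate image; computing a target optical flow from the first intermediate optical flow and the second intermediate optical flow; mapping the first intermediate image with the target optical flow to obtain a first stitched image, and mapping the second intermediate image with the target optical flow to obtain a second stitched image; and stitching the first stitched image and the second stitched image to obtain the stitched image.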
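- The image stitching method according to claim 3, characterized in that computing the target optical flow from the first intermediate optical flow and the second intermediate optical flow comprises: computing at least one transition optical flow by interpolation from the first intermediate optical flow and the second intermediate optical flow; and fusing the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow to obtain the target optical flow.
- The image stitching method according to claim 4, characterized in that computing the at least one transition optical flow by interpolation from the first intermediate optical flow and the second intermediate optical flow comprises: obtaining at least one weight value; and, for each weight value, computing a weighted sum of the first intermediate optical flow and the second intermediate optical flow, to obtain the at least one transition optical flow.
- The image stitching method according to claim 5, characterized in that the magnitude of the weight value is related to the viewing-angle position of the transition optical flow corresponding to that weight value.
- The image stitching method according to claim 6, characterized in that the weighting coefficient of the first intermediate optical flow and the weighting coefficient of the second intermediate optical flow sum to 1, the weight value is the weighting coefficient of the first intermediate optical flow, and the magnitude of the weight value is positively correlated with how close the viewing-angle position of the corresponding transition optical flow is to the viewing-angle position of the first intermediate optical flow.
- The image stitching method according to any one of claims 5 to 7, characterized in that the at least one weight value is uniformly distributed in the interval (0, 1).
- The image stitching method according to any one of claims 3 to 8, characterized in that computing the target optical flow from the first intermediate optical flow and the second intermediate optical flow comprises: computing a preliminary optical flow from the first intermediate optical flow and the second intermediate optical flow; and inputting the preliminary optical flow into an optical flow inversion network to obtain the target optical flow.
- The image stitching method according to any one of claims 4 to 8, characterized in that fusing the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow to obtain the target optical flow comprises: obtaining N+2 weight matrices, where N is the total number of transition optical flows; computing, based on the N+2 weight matrices, a weighted sum of the first intermediate optical flow, the N transition optical flows, and the second intermediate optical flow, to obtain a preliminary optical flow; and either taking the preliminary optical flow as the target optical flow, or inputting the preliminary optical flow into an optical flow inversion network to obtain the target optical flow.
- The image stitching method according to claim 10, characterized in that the position of the maximum element in a weight matrix is related to the viewing-angle position of the optical flow corresponding to that weight matrix.
- The image stitching method according to any one of claims 3 to 11, characterized in that stitching the first stitched image and the second stitched image to obtain the stitched image comprises: inputting the first stitched image and the second stitched image into a mask calculation network to obtain a stitching mask; and stitching the first stitched image and the second stitched image based on the stitching mask to obtain the stitched image.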
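- The image stitching method according to any one of claims 3 to 12, characterized in that the first intermediate optical flow and the second intermediate optical flow are computed from a first screenshot optical flow and a second screenshot optical flow output by an optical flow calculation network, the target optical flow is computed by an optical flow inversion network, and the stitched image is computed using a stitching mask output by a mask calculation network; the method further comprises: obtaining a first real screenshot optical flow, a second real screenshot optical flow, a real target optical flow, and a real stitched image; computing an optical flow prediction loss from the first screenshot optical flow, the second screenshot optical flow, the first real screenshot optical flow, and the second real screenshot optical flow; computing an optical flow inversion loss from the target optical flow and the real target optical flow; computing an image stitching loss from the stitched image and the real stitched image; and computing a total loss from the optical flow prediction loss, the optical flow inversion loss, and the image stitching loss, and updating the parameters of the optical flow calculation network, the optical flow inversion network, and the mask calculation network according to the total loss.
- The image stitching method according to claim 13, characterized in that the target optical flow is obtained by fusing the first intermediate optical flow, at least one transition optical flow, and the second intermediate optical flow, and acquiring the first image and the second image comprises: computing the first image from the intermediate image and a homography matrix, and computing the second image from the intermediate image and the inverse of the homography matrix, wherein the intermediate image is a real image; and obtaining the first real screenshot optical flow, the second real screenshot optical flow, the real target optical flow, and the real stitched image comprises: computing a first real intermediate optical flow from the homography matrix, and computing the first real screenshot optical flow from the first real intermediate optical flow; computing a second real intermediate optical flow from the inverse of the homography matrix, and computing the second real screenshot optical flow from the second real intermediate optical flow; computing at least one transition matrix by interpolation from the homography matrix and its inverse, fusing the homography matrix, the at least one transition matrix, and the inverse of the homography matrix to obtain a target matrix, and computing the real target optical flow from the target matrix; and computing the real stitched image from the intermediate image and the target matrix.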
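- An image stitching apparatus, characterized by comprising: an image acquisition module configured to acquire a first image and a second image; an intermediate optical flow calculation module configured to compute a first intermediate optical flow and a second intermediate optical flow from the first image and the second image, wherein the first intermediate optical flow is the optical flow between an intermediate image and the first image, the second intermediate optical flow is the optical flow between the intermediate image and the second image, the intermediate image is an image whose viewing angle lies between those of the first image and the second image, the size of the first intermediate optical flow is the same as the size of the first image, and the size of the second intermediate optical flow is the same as the size of the second image; and an image stitching module configured to compute a stitched image of the first image and the second image from the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image.
- The image stitching apparatus according to claim 15, characterized in that the image stitching module computing the stitched image of the first image and the second image from the first intermediate optical flow, the second intermediate optical flow, the first image, and the second image comprises: mapping the first image with the first intermediate optical flow to obtain a first intermediate image, and mapping the second image with the second intermediate optical flow to obtain a second intermediate image; computing a target optical flow from the first intermediate optical flow and the second intermediate optical flow; mapping the first intermediate image with the target optical flow to obtain a first stitched image, and mapping the second intermediate image with the target optical flow to obtain a second stitched image; and stitching the first stitched image and the second stitched image to obtain the stitched image.
- The image stitching apparatus according to claim 16, characterized in that the image stitching module computing the target optical flow from the first intermediate optical flow and the second intermediate optical flow comprises: computing at least one transition optical flow by interpolation from the first intermediate optical flow and the second intermediate optical flow; and fusing the first intermediate optical flow, the at least one transition optical flow, and the second intermediate optical flow to obtain the target optical flow.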
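- A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program instructions which, when read and run by a processor, perform the method according to any one of claims 1 to 14.
- An electronic device, characterized by comprising a memory and a processor, wherein the memory stores computer program instructions which, when read and run by the processor, perform the method according to any one of claims 1 to 14.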
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110597189.7A CN113469880A (zh) | 2021-05-28 | 2021-05-28 | 图像拼接方法及装置、存储介质及电子设备 |
CN202110597189.7 | 2021-05-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022247394A1 true WO2022247394A1 (zh) | 2022-12-01 |
Family
ID=77871814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/080233 WO2022247394A1 (zh) | 2021-05-28 | 2022-03-10 | 图像拼接方法及装置、存储介质及电子设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113469880A (zh) |
WO (1) | WO2022247394A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113469880A (zh) * | 2021-05-28 | 2021-10-01 | 北京迈格威科技有限公司 | 图像拼接方法及装置、存储介质及电子设备 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106803899A (zh) * | 2015-11-26 | 2017-06-06 | 华为技术有限公司 | 合并图像的方法和装置 |
CN107369129A (zh) * | 2017-06-26 | 2017-11-21 | 深圳岚锋创视网络科技有限公司 | 一种全景图像的拼接方法、装置及便携式终端 |
US20180137633A1 (en) * | 2016-11-14 | 2018-05-17 | Htc Corporation | Method, device, and non-transitory computer readable storage medium for image processing |
US20190068876A1 (en) * | 2017-08-29 | 2019-02-28 | Nokia Technologies Oy | Method Of Image Alignment For Stitching Using A Hybrid Strategy |
CN111696035A (zh) * | 2020-05-21 | 2020-09-22 | 电子科技大学 | 一种基于光流运动估计算法的多帧图像超分辨率重建方法 |
CN112104830A (zh) * | 2020-08-13 | 2020-12-18 | 北京迈格威科技有限公司 | 视频插帧方法、模型训练方法及对应装置 |
CN113469880A (zh) * | 2021-05-28 | 2021-10-01 | 北京迈格威科技有限公司 | 图像拼接方法及装置、存储介质及电子设备 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106997579B (zh) * | 2016-01-26 | 2020-01-03 | 华为技术有限公司 | 图像拼接的方法和装置 |
CN107451952B (zh) * | 2017-08-04 | 2020-11-03 | 追光人动画设计(北京)有限公司 | 一种全景视频的拼接融合方法、设备以及系统 |
-
2021
- 2021-05-28 CN CN202110597189.7A patent/CN113469880A/zh active Pending
-
2022
- 2022-03-10 WO PCT/CN2022/080233 patent/WO2022247394A1/zh unknown
Also Published As
Publication number | Publication date |
---|---|
CN113469880A (zh) | 2021-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Nie et al. | Depth-aware multi-grid deep homography estimation with contextual correlation | |
Wu et al. | Revisiting light field rendering with deep anti-aliasing neural network | |
EP3218870B1 (en) | Parallax tolerant video stitching with spatial-temporal localized warping and seam finding | |
CN110827200A (zh) | 一种图像超分重建方法、图像超分重建装置及移动终端 | |
TW202117611A (zh) | 電腦視覺訓練系統及訓練電腦視覺系統的方法 | |
Joshi et al. | A learning-based method for image super-resolution from zoomed observations | |
WO2017091927A1 (zh) | 图像处理方法和双摄像头系统 | |
CN109934772B (zh) | 一种图像融合方法、装置及便携式终端 | |
US20150170405A1 (en) | High resolution free-view interpolation of planar structure | |
Su et al. | Super-resolution without dense flow | |
US9342873B1 (en) | Tile-based optical flow | |
CN113643414A (zh) | 一种三维图像生成方法、装置、电子设备及存储介质 | |
Dutta | Depth-aware blending of smoothed images for bokeh effect generation | |
Guan et al. | Srdgan: learning the noise prior for super resolution with dual generative adversarial networks | |
CN109767381A (zh) | 一种基于特征选择的形状优化的矩形全景图像构造方法 | |
JP2024526417A (ja) | マルチスペクトルカメラ向けの高速画像レジストレーション方法及び装置 | |
US9860441B1 (en) | Tile-based digital image correspondence | |
WO2022247394A1 (zh) | 图像拼接方法及装置、存储介质及电子设备 | |
CN115564639A (zh) | 背景虚化方法、装置、计算机设备和存储介质 | |
CN117173012A (zh) | 无监督的多视角图像生成方法、装置、设备及存储介质 | |
Lee et al. | Learning local implicit fourier representation for image warping | |
Won et al. | Learning depth from focus in the wild | |
Kang et al. | Facial depth and normal estimation using single dual-pixel camera | |
CN113935934A (zh) | 图像处理方法、装置、电子设备和计算机可读存储介质 | |
Zhang et al. | Pseudo-LiDAR point cloud magnification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22810128 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.04.2024) |