CN118071655A - Fisheye image correction and stitching method and system - Google Patents

Fisheye image correction and stitching method and system

Info

Publication number
CN118071655A
Authority
CN
China
Prior art keywords
image
fisheye
distortion
mapping
blocks
Prior art date
Legal status
Pending
Application number
CN202410464709.0A
Other languages
Chinese (zh)
Inventor
刘寒松
王国强
王永
刘瑞
李越
李贤超
Current Assignee
Sonli Holdings Group Co Ltd
Original Assignee
Sonli Holdings Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Sonli Holdings Group Co Ltd
Priority to CN202410464709.0A
Publication of CN118071655A
Legal status: Pending

Landscapes

  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of image processing and relates to a fisheye image correction and stitching method and system. Fisheye images shot by a fisheye lens are first collected and divided into 2D image blocks. A self-supervised pre-training module then performs self-supervised pre-training to extract a fine-grained distortion characterization of the fisheye image, and a fisheye image correction module learns a full-scale mapping flow to obtain a corrected fisheye image pair. The corrected fisheye image pair is divided into a target image and a reference image, and finally the mapped target image and the reference image are fed into a SiameseMAE network for blending to generate the final panoramic stitched image. The method can adapt to different fisheye image stitching scenes, including different illumination conditions, backgrounds and other factors, which improves its universality and lets it perform better in diversified application environments.

Description

Fisheye image correction and stitching method and system
Technical Field
The invention belongs to the technical field of image processing, and relates to a fisheye image correction and stitching method and system.
Background
The fisheye lens is a wide-angle lens whose design allows the camera to capture a much wider field of view, which alleviates the blind-spot problem of conventional cameras and improves the comprehensiveness of monitoring. Fewer cameras are therefore needed during installation, which greatly reduces installation and maintenance costs. With increasing urban traffic pressure and a rapidly growing number of vehicles, using fisheye cameras at traffic intersections, highways, parking lots and other places can improve the efficiency of traffic supervision, help reduce traffic accidents and improve traffic flow. The wide field of view of the fisheye lens therefore makes it an ideal choice in monitoring systems, provides a more flexible and comprehensive solution for monitoring in various fields, and offers reliable support for the safety and development of society.
Although fisheye lenses have the unique advantage of providing an extremely wide viewing angle, their special optics make the captured image exhibit the characteristics of a spherical projection; such distortion not only affects the appearance of the image but can also lead to misjudgment of the size and shape of objects. Fisheye image correction and stitching are therefore critical techniques for overcoming these potential vision problems. Fisheye image correction can eliminate or minimize the distortion so that the image better matches the actual scene, effectively improving image quality and accuracy, while stitching allows multiple fisheye images to be seamlessly fused into a continuous, complete panoramic image. Through stitching, a user obtains a more comprehensive and complete view without being limited by the viewing angle of a single camera, which plays an important role in monitoring systems, virtual navigation, data acquisition under specific circumstances, and the like.
In summary, against the background of fisheye lens applications, fisheye image correction and stitching deserve attention. To improve the quality and accuracy of fisheye images, researchers continue to explore and develop fisheye image correction techniques, fisheye image stitching techniques, deep learning algorithms and other methods, so as to provide more effective technical means for monitoring in various fields.
Disclosure of Invention
To solve the problem of image distortion and deformation caused by the fisheye lens, the invention provides a fisheye image correction and stitching method and system. Based on self-supervised representation learning, an effective method is introduced for encoding the fine-grained distortion characterization of fisheye images and fine-tuning the fisheye image correction network; a pixel-warping-based method then handles the large-parallax problem, and the corrected fisheye images are stitched into a seamless, coherent panoramic image, so that a more comprehensive and complete field of view is obtained.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, the invention provides a fisheye image correction stitching method, which comprises the following steps:
S1, constructing a fisheye image dataset: collecting a fisheye image shot by a fisheye lens, constructing a fisheye image data set, and dividing the fisheye image into 2D image blocks;
S2, self-supervision pre-training: self-supervision pre-training is carried out by adopting a self-supervision pre-training module, and fine granularity distortion characterization of the fisheye image is extracted;
S3, correcting the fisheye image: using the extracted fine-grained distortion characterization and learning a full-scale mapping flow through a fisheye image correction module to obtain corrected fisheye image pairs;
S4, pixel-level image mapping: dividing the corrected fisheye image pair into a target image and a reference image, repositioning the pixels of the target image onto the reference image plane with mapping vectors, directly estimating a two-dimensional mapping vector for the overlapping region in the target image, and regularizing the loss of pixels in the non-overlapping region to obtain the mapped target image;
S5, stitching fisheye images: the mapped target image and the reference image are fed to a SiameseMAE network and blended to generate the final panoramic stitched image.
As a further technical scheme of the invention, the fisheye image in step S1 has height H and width W and is divided into 2D image blocks of size S×S, so that the number of 2D image blocks is N = HW/S².
As a further technical scheme of the invention, the specific process of the step S2 is as follows:
S21, flattening the 2D image blocks and mapping them to D dimensions by linear projection to obtain the image block embeddings;
S22, designing a distortion position map specific to the fisheye image, wherein each value in the distortion position map represents the degree of distortion of the corresponding image block and is obtained from the radial distance of the image block from the image center;
S23, randomly shuffling the image block embeddings along the first dimension, adding position embeddings to the shuffled image block embeddings to obtain new image block embeddings, and then feeding the new image block embeddings into a vision transformer network to obtain abstract image block representations;
S24, shuffling and reshaping the distortion position map in the same way as step S23 to obtain distortion degree labels, each distortion degree label corresponding to one abstract image block representation;
S25, taking image blocks with the same distortion degree as positive examples and image blocks with different distortion degrees as negative examples, and encoding the locally unique distortion in the different abstract image block representations by contrastive learning; in the contrastive loss, the i-th abstract image block representation is contrasted with its positive index set, namely the indices of the image blocks that share its distortion degree, an indicator function determines whether a term is calculated, and the total contrastive loss is computed over all N abstract image block representations;
S26, performing end-to-end pre-training optimization of the self-supervision pre-training module on the fisheye image dataset, with the contrastive loss as the training target, and extracting the fine-grained distortion characterization of the fisheye image through the pre-trained self-supervision pre-training module.
As a further technical scheme of the invention, the specific process of the step S3 is as follows:
S31, obtaining the image block embeddings of the fisheye image by the same process as step S21 and feeding them directly into the vision transformer network to obtain abstract image block representations;
S32, reshaping the abstract image block representations into a two-dimensional feature map and estimating the weights of adjacent pixels for up-sampling, with the following specific flow:
S321, generating a coarse-scale mapping flow with two channels using two convolution layers, wherein the first channel represents the offset of the image in the x direction and the second channel represents the offset in the y direction;
S322, predicting an up-sampling weight map using two convolution layers and performing a normalization (softmax) operation over the weights of each pixel's neighbourhood to obtain a normalized weight map;
S323, rearranging and combining the coarse-scale mapping flow and the normalized weight map obtained in the above steps into the full-scale mapping flow;
S324, obtaining the corrected image by bilinear sampling, wherein each pixel coordinate of the corrected image is sampled from the fisheye image at the pixel coordinate predicted by the full-scale mapping flow.
As a further technical solution of the present invention, the training objective of the fisheye image correction module in step S3 is the L1 distance between the predicted full-scale mapping flow and the given ground-truth mapping flow, computed over the true effective foreground region of the rectified image.
As a further technical scheme of the invention, the specific process of the step S4 is as follows:
S41, selecting a pair of corrected fisheye images and defining one as the target image and the other as the reference image; a feature encoder first maps the target image and the reference image to dense feature maps at a lower resolution, yielding image features with d = 256 channels, and a context network with the same structure as the feature encoder extracts context features from the target image;
S42, forming a visual correlation volume from the target and reference image features by taking the dot product between all pairs of feature vectors: C(i, j, k, l) = Σ_h F_t(i, j, h) · F_r(k, l, h), wherein (i, j) is the spatial position of a feature in the target image features F_t, (k, l) is the spatial position of a feature in the reference image features F_r, h is the component index of the feature vector, F_t(i, j, h) is the value of the h-th channel of the target feature tensor at position (i, j), and F_r(k, l, h) is the value of the h-th channel of the reference feature tensor at position (k, l);
S43, indexing the context features of the target image with the visual correlation volume to generate a correlation feature map m;
S44, given the current mapping estimate, mapping each pixel of the correlation feature map m to its estimated position in the reference image;
S45, directly estimating a two-dimensional mapping vector for the overlapping region in the target image and regularizing the loss of pixels in the non-overlapping region; with the overlapping and non-overlapping regions defined on the target image and a true pixel-level mapping given, the loss function supervises the predicted mapping against the real mapping on the overlapping region while regularizing the non-overlapping region; the predicted pixel-level mapping is then used to relocate the pixels of the target image, yielding the mapped target image.
As a further aspect of the present invention, the feature encoder in step S41 is composed of 6 residual blocks, with 2 residual blocks at each of three progressively lower resolutions.
As a further technical scheme of the invention, the panoramic stitched image obtained in step S5 is generated by the SiameseMAE network, which is based on a Transformer architecture: the two images are first converted into a series of non-overlapping image blocks, the image blocks are embedded with a linear projection, position embedding information is added, and a [cls] token is appended; the network is encouraged to learn the correlation between the two images through asymmetric masking, in which one image is left unmasked while 90% of the image blocks of the other image are masked; the image blocks are then encoded with a weight-sharing Transformer encoder, the embeddings of one encoded image serve as keys and values, the embeddings of the other as queries, and they are fed into a Transformer decoder to obtain the panoramic stitched image.
In a second aspect, the present invention provides a fisheye image correction stitching system, comprising:
The fisheye image data set construction module is used for collecting fisheye images shot by the fisheye lens and dividing the fisheye images into 2D image blocks;
the self-supervision pre-training module is used for learning fine-granularity distortion characteristic representation of the fisheye image;
the fisheye image correction module is used for correcting the fisheye image pair by using the fine-granularity distortion characteristic representation;
The pixel-level mapping module is used for dividing the fisheye image obtained by the fisheye image correction module into a target image and a reference image, and then repositioning each pixel in the target image to obtain a mapped target image;
and the fish-eye stitching image generation module is used for mixing the mapped target image with the reference image to generate a final panoramic stitching image.
In a third aspect, the invention provides an electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the method of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a fisheye image correction and stitching method and system based on a SiameseMAE network, aiming to solve the problem of image distortion and deformation caused by the fisheye lens and thereby improve the quality and accuracy of fisheye images. The innovation lies mainly in three aspects: a self-supervised distortion perception method based on the distortion characteristics of fisheye images, a pixel-level fisheye image mapping method, and a fisheye stitching method based on the SiameseMAE network, with the following advantages:
(1) The self-supervision method does not need external calibration data, and is more flexible in coping with the distortion problems of different fisheye images based on the fisheye image distortion characteristics.
(2) The pixel-level fisheye image mapping method processes the fisheye image pair at the pixel level, is beneficial to maintaining the detail and accuracy of the image and reducing the image distortion.
(3) The fisheye stitching method based on the SiameseMAE network can adapt to different fisheye image stitching scenes, including different illumination conditions, backgrounds and other factors, which improves the universality of the method so that it performs better in diversified application environments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the present disclosure and do not constitute a limitation on the invention.
Fig. 1 is a schematic flow chart of a fisheye image correction and stitching method provided by the invention.
Fig. 2 is a diagram of the SiameseMAE network structure provided in the present invention.
Fig. 3 is a network structure block diagram of a fisheye image correction splicing system provided by the invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, such as, for example, processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1:
as shown in fig. 1, the embodiment provides a fisheye image correction and stitching method, which includes the following steps:
S1, constructing a fisheye image dataset: collecting the fisheye images shot by a fisheye lens, constructing a fisheye image dataset, and dividing each fisheye image of height H and width W into 2D image blocks of size S×S, so that the number of 2D image blocks is N = HW/S².
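For illustration, a minimal sketch of the block division in step S1 is given below, assuming a PyTorch tensor layout; the function name, the example image size, and the block size of 16 are illustrative assumptions and not taken from the patent.

```python
import torch

def split_into_blocks(fisheye: torch.Tensor, block_size: int) -> torch.Tensor:
    """Split a fisheye image of shape (C, H, W) into N = HW / S^2 flattened blocks of shape (N, C*S*S)."""
    c, h, w = fisheye.shape
    s = block_size
    assert h % s == 0 and w % s == 0, "H and W must be divisible by the block size S"
    # (C, H, W) -> (C, H/S, S, W/S, S) -> (H/S, W/S, C, S, S) -> (N, C*S*S)
    blocks = (fisheye
              .reshape(c, h // s, s, w // s, s)
              .permute(1, 3, 0, 2, 4)
              .reshape((h // s) * (w // s), c * s * s))
    return blocks

# Example: a 640x640 three-channel frame split into 16x16 blocks gives N = 1600 blocks,
# each flattened to 3*16*16 = 768 values, consistent with N = HW/S^2.
image = torch.rand(3, 640, 640)
patches = split_into_blocks(image, 16)
print(patches.shape)  # torch.Size([1600, 768])
```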
S2, self-supervision pre-training: the fisheye image is divided into image blocks and the blocks are embedded; a distortion position map designed around the inherent distortion characteristics of the fisheye image provides the labels; the image block embeddings are mapped to abstract representations; and contrastive learning pulls together abstract representations that share the same distortion pattern while pushing apart the rest, so that different local distortion patterns are distinguished and a fine-grained distortion characterization is extracted. The specific steps are as follows:
S21, flattening the 2D image blocks and mapping them to D dimensions by linear projection to obtain the image block embeddings;
S22, designing a distortion position map specific to the fisheye image, wherein each value in the distortion position map represents the degree of distortion of the corresponding image block and is obtained from the radial distance of the image block from the image center;
S23, after obtaining the image block embeddings and the distortion position map, and in order to avoid a trivial mapping between the position embeddings and specific distortion patterns, randomly shuffling the image block embeddings along the first dimension; position embeddings are added to the shuffled image block embeddings to obtain new image block embeddings, which are then fed into a vision transformer network to obtain abstract image block representations;
S24, shuffling and reshaping the distortion position map in the same way as step S23 to obtain distortion degree labels, each distortion degree label corresponding to one abstract image block representation;
S25, observing fisheye images shows that their distortion is radially distributed and that the degree of distortion increases with distance from the image center; therefore, image blocks with the same distortion degree are regarded as positive examples and image blocks with different distortion degrees as negative examples, and contrastive learning is used to encode the locally unique distortion in the different abstract image block representations (see the sketch after step S26); in the contrastive loss, the i-th abstract image block representation is contrasted with its positive index set, namely the indices of the image blocks that share its distortion degree, an indicator function determines whether a term is calculated, and the total contrastive loss is computed over all N abstract image block representations;
S26, performing end-to-end pre-training optimization of the self-supervision pre-training module on the fisheye image dataset, with the contrastive loss as the training target, and extracting the fine-grained distortion characterization of the fisheye image through the pre-trained self-supervision pre-training module.
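Because the loss expression itself is not reproduced above, the following is only a hedged sketch of the distortion-aware contrastive objective described in steps S22–S25, written in PyTorch; the way distortion degrees are binned from the block radius, the temperature, and all names are illustrative assumptions rather than the patent's exact formulation.

```python
import torch
import torch.nn.functional as F

def distortion_degree_labels(grid_h: int, grid_w: int, num_bins: int = 8) -> torch.Tensor:
    """Assign each image block a discrete distortion degree from its radial distance to the
    image center, mimicking the distortion position map of step S22 (binning is an assumption)."""
    ys, xs = torch.meshgrid(torch.arange(grid_h), torch.arange(grid_w), indexing="ij")
    cy, cx = (grid_h - 1) / 2.0, (grid_w - 1) / 2.0
    radius = torch.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    radius = radius / radius.max()                              # normalize to [0, 1]
    labels = torch.clamp((radius * num_bins).long(), max=num_bins - 1)
    return labels.flatten()                                     # one label per block

def contrastive_distortion_loss(reps: torch.Tensor, labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Supervised-contrastive-style loss: blocks with the same distortion degree are positives,
    all other blocks are negatives (an assumed instantiation of step S25)."""
    z = F.normalize(reps, dim=-1)                               # (N, D) abstract block representations
    sim = z @ z.t() / temperature                               # (N, N) similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))             # the indicator excludes the anchor itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)  # log-softmax over non-anchor entries
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_count
    return loss.mean()                                          # averaged over all N representations
```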
S3, correcting the fisheye image: after self-supervision pre-training, the network provides a good fine-grained distortion characterization of the fisheye image. In the following fisheye image correction module, the pre-trained weights are used for initialization, so the network model benefits from the pre-trained model; because the pre-trained weights were trained on a large dataset, the network directly contains rich fine-grained distortion information. This step helps the model converge faster, reduces training time, and allows the network to generalize better to new data when data are scarce. The specific steps are as follows:
S31, obtaining the image block embeddings of the fisheye image by the same process as step S21 and feeding them directly into the vision transformer network to obtain abstract image block representations;
S32, reshaping the abstract image block representations into a two-dimensional feature map and estimating the weights of adjacent pixels for up-sampling, with the following specific flow:
S321, generating a coarse-scale mapping flow with two channels using two convolution layers, wherein the first channel represents the offset of the image in the x direction and the second channel represents the offset in the y direction;
S322, predicting an up-sampling weight map using two convolution layers and performing a normalization (softmax) operation over the weights of each pixel's neighbourhood to obtain a normalized weight map;
S323, rearranging and combining the coarse-scale mapping flow and the normalized weight map obtained in the above steps into the full-scale mapping flow;
S324, obtaining the corrected image by bilinear sampling, wherein each pixel coordinate of the corrected image is sampled from the fisheye image at the pixel coordinate predicted by the full-scale mapping flow. In step S3, the training objective of the fisheye image correction module is the L1 distance between the predicted full-scale mapping flow and the given ground-truth mapping flow, computed over the true effective foreground region of the rectified image.
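Steps S321–S324 are reproduced above only in outline, so the sketch below shows one plausible reading under stated assumptions: a coarse two-channel mapping flow is up-sampled with softmax-normalized weights over each pixel's 3×3 neighbourhood (RAFT-style convex up-sampling), and the corrected image is then obtained by bilinear sampling with grid_sample. The up-sampling factor of 8, the neighbourhood size, and the offset scaling are assumptions, not the patent's exact layout.

```python
import torch
import torch.nn.functional as F

def convex_upsample(coarse_flow: torch.Tensor, weights: torch.Tensor, factor: int = 8) -> torch.Tensor:
    """Up-sample a coarse 2-channel mapping flow (B, 2, h, w) to full resolution using
    softmax-normalized weights over each pixel's 3x3 neighbourhood (B, 9*factor*factor, h, w)."""
    b, _, h, w = coarse_flow.shape
    weights = weights.view(b, 1, 9, factor, factor, h, w)
    weights = torch.softmax(weights, dim=2)                       # normalize the 9 neighbour weights
    # scale offsets to full-resolution pixel units (an assumption) and gather 3x3 neighbourhoods
    neighbours = F.unfold(factor * coarse_flow, kernel_size=3, padding=1)     # (B, 2*9, h*w)
    neighbours = neighbours.view(b, 2, 9, 1, 1, h, w)
    up = (weights * neighbours).sum(dim=2)                        # (B, 2, factor, factor, h, w)
    up = up.permute(0, 1, 4, 2, 5, 3).reshape(b, 2, factor * h, factor * w)
    return up

def rectify_by_sampling(fisheye: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample the fisheye image (B, C, H, W) at the coordinates predicted by the
    full-scale mapping flow (B, 2, H, W), yielding the corrected image."""
    b, _, hh, ww = flow.shape
    ys, xs = torch.meshgrid(torch.arange(hh, device=flow.device),
                            torch.arange(ww, device=flow.device), indexing="ij")
    grid_x = xs[None] + flow[:, 0]                                # predicted x-coordinates
    grid_y = ys[None] + flow[:, 1]                                # predicted y-coordinates
    # normalize coordinates to [-1, 1] as required by grid_sample
    grid = torch.stack((2 * grid_x / (ww - 1) - 1, 2 * grid_y / (hh - 1) - 1), dim=-1)
    return F.grid_sample(fisheye, grid, mode="bilinear", align_corners=True)
```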
S4, mapping at a pixel level: after correction, the two fisheye images come from different views of the same scene and inevitably share an overlapping region. Because the two images come from the same scene, stitching them with a conventional shared affine matrix would introduce ghosting and degrade the quality of the stitching result. Therefore, for the characteristics of fisheye image stitching, no shared transformation function is used; instead, a two-dimensional mapping vector is estimated directly for every pixel to obtain the warped target image. Specifically:
S41, selecting a pair of corrected fisheye images and defining one as the target image and the other as the reference image; a feature encoder first maps the target image and the reference image to dense feature maps at a lower resolution, yielding image features with d = 256 channels, and a context network with the same structure as the feature encoder extracts context features from the target image; the feature encoder consists of 6 residual blocks, with 2 residual blocks at each of three progressively lower resolutions;
S42, forming a visual correlation volume from the target and reference image features by taking the dot product between all pairs of feature vectors: C(i, j, k, l) = Σ_h F_t(i, j, h) · F_r(k, l, h), wherein (i, j) is the spatial position of a feature in the target image features F_t, (k, l) is the spatial position of a feature in the reference image features F_r, h is the component index of the feature vector, F_t(i, j, h) is the value of the h-th channel of the target feature tensor at position (i, j), and F_r(k, l, h) is the value of the h-th channel of the reference feature tensor at position (k, l);
S43, indexing the context features of the target image with the visual correlation volume to generate a correlation feature map m;
S44, given the current mapping estimate, mapping each pixel of the correlation feature map m to its estimated position in the reference image;
S45, directly estimating a two-dimensional mapping vector for the overlapping region in the target image; because correspondences are missing, mapping the pixels of the non-overlapping region is an ill-posed problem, so the loss of the non-overlapping region is regularized to account for this. With the overlapping and non-overlapping regions defined on the target image and a true pixel-level mapping given, the loss function supervises the predicted mapping against the real mapping on the overlapping region while regularizing the non-overlapping region. The predicted pixel-level mapping is then used to relocate the pixels of the target image, yielding the mapped target image.
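A hedged sketch of the all-pairs correlation volume of step S42 and the pixel-level mapping loss of step S45 follows. Since the exact treatment of the non-overlapping region is not reproduced above, the smoothness regularizer, its weight, and the feature scaling are assumptions for illustration only.

```python
import torch

def correlation_volume(feat_t: torch.Tensor, feat_r: torch.Tensor) -> torch.Tensor:
    """All-pairs dot product between target features (B, D, H, W) and reference features
    (B, D, H, W): C[b, i, j, k, l] = sum_h feat_t[b, h, i, j] * feat_r[b, h, k, l]."""
    b, d, h, w = feat_t.shape
    ft = feat_t.view(b, d, h * w)
    fr = feat_r.view(b, d, h * w)
    corr = torch.einsum("bdn,bdm->bnm", ft, fr)            # (B, H*W, H*W)
    return corr.reshape(b, h, w, h, w) / d ** 0.5          # scaling by sqrt(d) is an assumption

def pixel_mapping_loss(pred_map: torch.Tensor, true_map: torch.Tensor,
                       overlap_mask: torch.Tensor, reg_weight: float = 0.1) -> torch.Tensor:
    """Supervise the predicted 2D mapping vectors (B, 2, H, W) against the ground truth on
    the overlapping region; regularize (here: smoothness, an assumption) elsewhere."""
    mask = overlap_mask.unsqueeze(1).float()               # (B, 1, H, W), 1 inside the overlap
    supervised = (mask * (pred_map - true_map).abs()).sum() / mask.sum().clamp(min=1.0)
    # simple smoothness regularization of the mapping in the non-overlapping region
    dx = (pred_map[..., :, 1:] - pred_map[..., :, :-1]).abs()
    dy = (pred_map[..., 1:, :] - pred_map[..., :-1, :]).abs()
    non_overlap = 1.0 - mask
    reg = (non_overlap[..., :, 1:] * dx).mean() + (non_overlap[..., 1:, :] * dy).mean()
    return supervised + reg_weight * reg
```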
S5, generating the fisheye stitched image: the remapped target image and the reference image are fed to a SiameseMAE network to generate the stitching result. Specifically, the SiameseMAE network is based on a Transformer architecture, as shown in fig. 2: the two images are converted into a series of non-overlapping image blocks, the image blocks are embedded with a linear projection, position embedding information is added, and a [cls] token is appended. The network is encouraged to learn the correlation between the two images through asymmetric masking, in which one image is left unmasked while 90% of the image blocks of the other image are masked. The image blocks are then encoded with a weight-sharing Transformer encoder; the embeddings of one encoded image serve as keys and values, the embeddings of the other as queries, and they are fed into a Transformer decoder to obtain the panoramic stitched image.
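As a rough illustration only, the toy module below sketches the asymmetric-masking idea of the SiameseMAE stitching in step S5: both images are patch-embedded with shared weights, most blocks of one image are masked, and a Transformer decoder queries the masked stream against the unmasked stream's keys and values. All dimensions, the choice of which image is masked, and the reconstruction head are assumptions; this is not the patent's exact network (which also uses position embeddings and a [cls] token, omitted here for brevity).

```python
import torch
import torch.nn as nn

class TinySiameseMAE(nn.Module):
    """Toy illustration of weight-shared encoding plus cross-attention decoding."""
    def __init__(self, patch_dim: int = 768, dim: int = 256, heads: int = 8):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)                   # shared linear projection
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)   # weight-shared encoder
        dec_layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.head = nn.Linear(dim, patch_dim)                    # toy reconstruction head

    def forward(self, target_patches: torch.Tensor, reference_patches: torch.Tensor,
                mask_ratio: float = 0.9) -> torch.Tensor:
        # embed both images with the same weights
        tgt = self.embed(target_patches)                         # (B, N, dim)
        ref = self.embed(reference_patches)                      # (B, N, dim)
        # asymmetric masking: keep only (1 - mask_ratio) of the target blocks
        n = tgt.size(1)
        keep = max(1, int(n * (1.0 - mask_ratio)))
        idx = torch.randperm(n, device=tgt.device)[:keep]
        tgt_visible = tgt[:, idx]
        # weight-shared encoding of both streams
        tgt_enc = self.encoder(tgt_visible)
        ref_enc = self.encoder(ref)
        # the masked stream queries the unmasked stream (keys/values)
        fused = self.decoder(tgt_enc, ref_enc)
        return self.head(fused)                                  # (B, keep, patch_dim)
```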
Example 2:
the embodiment provides a fisheye image correction stitching system, as shown in fig. 3, including:
The fisheye image data set construction module is used for collecting fisheye images shot by the fisheye lens and dividing the fisheye images into 2D image blocks;
the self-supervision pre-training module is used for learning fine-granularity distortion characteristic representation of the fisheye image;
the fisheye image correction module is used for correcting the fisheye image pair by using the fine-granularity distortion characteristic representation;
and the pixel-level mapping module is used for dividing the fisheye image obtained by the fisheye image correction module into a target image and a reference image, and then repositioning each pixel in the target image to obtain a mapped target image.
And the fish-eye stitching image generation module is used for mixing the mapped target image with the reference image to generate a final panoramic stitching image.
Here, it should be noted that the above-mentioned modules correspond to steps S1 to S5 in embodiment 1; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to the content disclosed in embodiment 1. It should also be noted that the modules described above may be implemented as part of a system in a computer system, for example as a set of computer-executable instructions.
In further embodiments, there is also provided:
An electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the method described in embodiment 1. For brevity, the description is omitted here.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate array FPGA or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.
The method in embodiment 1 may be directly embodied as a hardware processor executing or executed with a combination of hardware and software modules in the processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
Those of ordinary skill in the art will appreciate that the elements of the various examples described in connection with the present embodiments, i.e., the algorithm steps, can be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (10)

1. A fisheye image correction and stitching method, characterized by comprising the following steps:
S1, constructing a fisheye image dataset: collecting a fisheye image shot by a fisheye lens, constructing a fisheye image data set, and dividing the fisheye image into 2D image blocks;
S2, self-supervision pre-training: self-supervision pre-training is carried out by adopting a self-supervision pre-training module, and fine granularity distortion characterization of the fisheye image is extracted;
S3, correcting the fisheye image: using the extracted fine-grained distortion characterization and learning a full-scale mapping flow through a fisheye image correction module to obtain corrected fisheye image pairs;
S4, pixel-level image mapping: dividing the corrected fisheye image pair into a target image and a reference image, repositioning the pixels of the target image onto the reference image plane with mapping vectors, directly estimating a two-dimensional mapping vector for the overlapping region in the target image, and regularizing the loss of pixels in the non-overlapping region to obtain the mapped target image;
S5, stitching fisheye images: the mapped target image and the reference image are fed to a SiameseMAE network and blended to generate the final panoramic stitched image.
2. The fisheye image correction and stitching method according to claim 1, wherein the fisheye image in step S1 has height H and width W and is divided into 2D image blocks of size S×S, so that the number of 2D image blocks is N = HW/S².
3. The fisheye image correction and stitching method according to claim 2, wherein the specific process of step S2 is as follows:
S21, flattening the 2D image blocks and mapping them to D dimensions by linear projection to obtain the image block embeddings;
S22, designing a distortion position map specific to the fisheye image, wherein each value in the distortion position map represents the degree of distortion of the corresponding image block and is obtained from the radial distance of the image block from the image center;
S23, randomly shuffling the image block embeddings along the first dimension, adding position embeddings to the shuffled image block embeddings to obtain new image block embeddings, and then feeding the new image block embeddings into a vision transformer network to obtain abstract image block representations;
S24, shuffling and reshaping the distortion position map in the same way as step S23 to obtain distortion degree labels, each distortion degree label corresponding to one abstract image block representation;
S25, taking image blocks with the same distortion degree as positive examples and image blocks with different distortion degrees as negative examples, and encoding the locally unique distortion in the different abstract image block representations by contrastive learning; in the contrastive loss, the i-th abstract image block representation is contrasted with its positive index set, namely the indices of the image blocks that share its distortion degree, an indicator function determines whether a term is calculated, and the total contrastive loss is computed over all N abstract image block representations;
S26, performing end-to-end pre-training optimization of the self-supervision pre-training module on the fisheye image dataset, with the contrastive loss as the training target, and extracting the fine-grained distortion characterization of the fisheye image through the pre-trained self-supervision pre-training module.
4. The fisheye image correction and stitching method according to claim 3, wherein the specific process of step S3 is as follows:
S31, obtaining the image block embeddings of the fisheye image by the same process as step S21 and feeding them directly into the vision transformer network to obtain abstract image block representations;
S32, reshaping the abstract image block representations into a two-dimensional feature map and estimating the weights of adjacent pixels for up-sampling, with the following specific flow:
S321, generating a coarse-scale mapping flow with two channels using two convolution layers, wherein the first channel represents the offset of the image in the x direction and the second channel represents the offset in the y direction;
S322, predicting an up-sampling weight map using two convolution layers and performing a normalization operation over the weights of each pixel's neighbourhood to obtain a normalized weight map;
S323, rearranging and combining the coarse-scale mapping flow and the normalized weight map obtained in the above steps into the full-scale mapping flow;
S324, obtaining the corrected image by bilinear sampling, wherein each pixel coordinate of the corrected image is sampled from the fisheye image at the pixel coordinate predicted by the full-scale mapping flow.
5. The fisheye image correction and stitching method according to claim 4, wherein the training objective of the fisheye image correction module in step S3 is the L1 distance between the predicted full-scale mapping flow and the given ground-truth mapping flow, computed over the true effective foreground region of the rectified image.
6. The fisheye image correction and stitching method according to claim 5, wherein the specific process of step S4 is:
S41, selecting a pair of corrected fisheye images and defining one as the target image and the other as the reference image; a feature encoder first maps the target image and the reference image to dense feature maps at a lower resolution, yielding image features with d = 256 channels, and a context network with the same structure as the feature encoder extracts context features from the target image; the feature encoder consists of 6 residual blocks, with 2 residual blocks at each of three progressively lower resolutions;
S42, forming a visual correlation volume from the target and reference image features by taking the dot product between all pairs of feature vectors: C(i, j, k, l) = Σ_h F_t(i, j, h) · F_r(k, l, h), wherein (i, j) is the spatial position of a feature in the target image features F_t, (k, l) is the spatial position of a feature in the reference image features F_r, h is the component index of the feature vector, F_t(i, j, h) is the value of the h-th channel of the target feature tensor at position (i, j), and F_r(k, l, h) is the value of the h-th channel of the reference feature tensor at position (k, l);
S43, indexing the context features of the target image with the visual correlation volume to generate a correlation feature map m;
S44, given the current mapping estimate, mapping each pixel of the correlation feature map m to its estimated position in the reference image;
S45, directly estimating a two-dimensional mapping vector for the overlapping region in the target image and regularizing the loss of pixels in the non-overlapping region; with the overlapping and non-overlapping regions defined on the target image and a true pixel-level mapping given, the loss function supervises the predicted mapping against the real mapping on the overlapping region while regularizing the non-overlapping region; the predicted pixel-level mapping is then used to relocate the pixels of the target image, yielding the mapped target image.
7. The fisheye image correction and stitching method according to claim 6, wherein the panoramic stitched image obtained in step S5 is generated by the SiameseMAE network, which is based on a Transformer architecture: the two images are first converted into a series of non-overlapping image blocks, the image blocks are embedded with a linear projection, position embedding information is added, and a [cls] token is appended; the network is encouraged to learn the correlation between the two images through asymmetric masking, in which one image is left unmasked while 90% of the image blocks of the other image are masked; the image blocks are then encoded with a weight-sharing Transformer encoder, the embeddings of one encoded image serve as keys and values, the embeddings of the other as queries, and they are fed into a Transformer decoder to obtain the panoramic stitched image.
8. A fisheye image correction stitching system capable of performing the method of any one of claims 1-7, comprising:
The fisheye image data set construction module is used for collecting fisheye images shot by the fisheye lens and dividing the fisheye images into 2D image blocks;
the self-supervision pre-training module is used for learning fine-granularity distortion characteristic representation of the fisheye image;
the fisheye image correction module is used for correcting the fisheye image pair by using the fine-granularity distortion characteristic representation;
The pixel-level mapping module is used for dividing the fisheye image obtained by the fisheye image correction module into a target image and a reference image, and then repositioning each pixel in the target image to obtain a mapped target image;
and the fish-eye stitching image generation module is used for mixing the mapped target image with the reference image to generate a final panoramic stitching image.
9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of any of claims 1-7.
CN202410464709.0A 2024-04-18 2024-04-18 Fisheye image correction and stitching method and system Pending CN118071655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410464709.0A CN118071655A (en) 2024-04-18 2024-04-18 Fisheye image correction and stitching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410464709.0A CN118071655A (en) 2024-04-18 2024-04-18 Fisheye image correction and stitching method and system

Publications (1)

Publication Number Publication Date
CN118071655A true CN118071655A (en) 2024-05-24

Family

ID=91097534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410464709.0A Pending CN118071655A (en) 2024-04-18 2024-04-18 Fisheye image correction and stitching method and system

Country Status (1)

Country Link
CN (1) CN118071655A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106683045A (en) * 2016-09-28 2017-05-17 深圳市优象计算技术有限公司 Binocular camera-based panoramic image splicing method
CN110099220A (en) * 2019-06-17 2019-08-06 广东中星微电子有限公司 A kind of panorama mosaic method and device
US20190325580A1 (en) * 2019-06-26 2019-10-24 Intel Corporation Surround camera system with seamless stitching for arbitrary viewpoint selection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAO FENG ET AL: "SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning", 《2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》, 17 August 2023 (2023-08-17) *
林亚 (Lin Ya): "Research on Video Stitching Methods for Dual Fisheye Cameras" [双鱼眼摄像头视频拼接方法研究], China Master's Theses Full-text Database (Information Science and Technology), 15 June 2020 (2020-06-15)

Similar Documents

Publication Publication Date Title
CN111325797B (en) Pose estimation method based on self-supervision learning
Wang et al. SFNet-N: An improved SFNet algorithm for semantic segmentation of low-light autonomous driving road scenes
CN110782490B (en) Video depth map estimation method and device with space-time consistency
CN111325794A (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
CN105046649A (en) Panorama stitching method for removing moving object in moving video
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN111354030A (en) Method for generating unsupervised monocular image depth map embedded into SENET unit
CN115376024A (en) Semantic segmentation method for power accessory of power transmission line
CN113436130A (en) Intelligent sensing system and device for unstructured light field
CN113436254B (en) Cascade decoupling pose estimation method
Liu et al. Deep learning enables parallel camera with enhanced-resolution and computational zoom imaging
CN112561979B (en) Self-supervision monocular depth estimation method based on deep learning
CN118071655A (en) Fisheye image correction and stitching method and system
CN110738696B (en) Driving blind area perspective video generation method and driving blind area view perspective system
Nie et al. Context and detail interaction network for stereo rain streak and raindrop removal
CN110766732A (en) Robust single-camera depth map estimation method
CN115330935A (en) Three-dimensional reconstruction method and system based on deep learning
CN116258740A (en) Vehicle-mounted forward-looking multi-target tracking method based on multi-camera pixel fusion
CN108876755A (en) A kind of construction method of the color background of improved gray level image
CN115272450A (en) Target positioning method based on panoramic segmentation
Zhang et al. 3D Object Detection Based on Multi-view Adaptive Fusion
Li et al. Hybrid Feature based Pyramid Network for Nighttime Semantic Segmentation.
Zhang et al. SkipcrossNets: Adaptive Skip-cross Fusion for Road Detection
CN118096534B (en) Infrared image super-resolution reconstruction method based on complementary reference
Pittner et al. LaneCPP: Continuous 3D Lane Detection using Physical Priors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination