CN116402687A - Image stitching method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116402687A
CN116402687A (application CN202310382389.XA)
Authority
CN
China
Prior art keywords
images
pixel matching
adjacent
image
pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310382389.XA
Other languages
Chinese (zh)
Inventor
晋忠孝
陈赢峰
范长杰
胡志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN202310382389.XA
Publication of CN116402687A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image stitching method and device, an electronic device, and a storage medium, applied in the field of computer vision. The method comprises: acquiring a plurality of images of a target scene captured by a plurality of cameras, where the images correspond to different shooting angles and adjacent images among them share a shooting overlap region; sequentially inputting the adjacent images into a pixel matching model and obtaining pixel matching points between the adjacent images through the model; determining a pose calibration result between adjacent cameras according to the positions of the pixel matching points on the adjacent images, the adjacent images corresponding to the adjacent cameras; and stitching the images in sequence according to the pose calibration result to obtain a panoramic image of the target scene. The method can provide a panoramic image that fully reflects scene information without a dedicated calibration site, improving the efficiency with which operators receive environmental information.

Description

Image stitching method and device, electronic equipment and storage medium
Technical Field
The application relates to the field of computer vision, in particular to an image stitching method, an image stitching device, electronic equipment and a storage medium.
Background
Engineering vehicles such as excavators and cranes are large and work in complex environments, so to ensure safe driving, operators need to know in real time how the surroundings of the vehicle change during construction. How to provide operators with real-time information about the vehicle's surroundings therefore becomes critical.
In the prior art, a common approach is to capture images in real time with cameras and display them in separate windows, presenting the surrounding environment to the operator in real time. Alternatively, surround-view monitoring of the vehicle is achieved through a dedicated calibration site.
However, the window-display approach requires the operator to receive and integrate information from several windows at the same time, so information is received inefficiently. The approach that achieves surround-view monitoring through a dedicated calibration site and calibration objects depends on that site and is therefore unsuitable for harsh working environments such as chemical plants and mining sites. How to fully provide operators with real-time information about the environment around the vehicle body, without depending on a dedicated calibration site, has thus become a new requirement.
Disclosure of Invention
The application provides an image stitching method and device, an electronic device, and a storage medium. Images of the surroundings of an engineering vehicle are collected by a plurality of cameras; adjacent images among them are input into a pixel matching model to obtain pixel matching points; pose calibration results of the cameras are obtained from the pixel matching points; and the images are stitched according to the pose calibration results to obtain a panoramic image reflecting all of the environmental information around the engineering vehicle.
A first aspect of the embodiments of the present application provides an image stitching method, the method including:
acquiring a plurality of images shot by a plurality of cameras aiming at a target scene; wherein, the shooting visual angles corresponding to the plurality of images are different; a shooting overlapping area exists between adjacent images in the plurality of images;
sequentially inputting adjacent images into a pixel matching model, and obtaining pixel matching points between the adjacent images through the pixel matching model;
determining pose calibration results between adjacent cameras according to positions of pixel matching points on adjacent images; the adjacent image corresponds to an adjacent camera;
and sequentially splicing the images according to the pose calibration result to obtain the panoramic image corresponding to the target scene.
A second aspect of the embodiments of the present application provides an image stitching apparatus, including:
an acquisition unit configured to acquire a plurality of images for a target scene captured by a plurality of cameras; wherein, the shooting visual angles corresponding to the plurality of images are different; a shooting overlapping area exists between adjacent images in the plurality of images;
the matching unit is used for sequentially inputting the adjacent images into the pixel matching model, and obtaining pixel matching points between the adjacent images through the pixel matching model;
The determining unit is used for determining pose calibration results between adjacent cameras according to the positions of the pixel matching points on the adjacent images; the adjacent image corresponds to an adjacent camera;
and the splicing unit is used for splicing the plurality of images in sequence according to the pose calibration result to obtain a panoramic image corresponding to the target scene.
A third aspect of the embodiments of the present application provides an electronic device, including a memory and a processor coupled to each other, where the memory is configured to store one or more computer instructions, and the processor is configured to execute the one or more computer instructions to implement the method described above.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium having stored thereon one or more computer instructions that can be executed by a processor to implement the method described above.
The embodiments of the application thus provide an image stitching method for generating panoramic images for engineering vehicles. A plurality of images of a target scene are captured from different viewing angles by a plurality of cameras, with every two adjacent images sharing a shooting overlap region. The adjacent images are then input into a pixel matching model to obtain pixel matching points. A pose calibration result of the cameras corresponding to the adjacent images is obtained from the pixel matching points, and the images are stitched according to the pose calibration results of all cameras to obtain a panoramic image of the target scene.
This image stitching method removes the dependence on a dedicated calibration site: the pose calibration result of each camera is obtained from actual scene images, and the panoramic image is stitched according to that result. In addition, even if the construction environment of the engineering vehicle is harsh and the image background is monotonous, the method obtains accurate pixel matching points through the pixel matching model, which improves the stitching quality of the panoramic image and makes its conveyance of environmental information accurate and complete.
Drawings
Fig. 1 is a schematic flow chart of an image stitching method provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for multi-band fusion projection of an overlapping region image according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for model training according to pixel matching points according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a method for training a model according to pose calibration results according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image stitching device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present application, the present application is described clearly and completely below with reference to the accompanying drawings of its embodiments. The application is not limited to the particular embodiments disclosed; it is intended to cover all embodiments falling within the scope of the appended claims.
It should be noted that the terms "first," "second," "third," and the like in the claims, specification, and drawings herein are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. The data so used may be interchanged where appropriate to facilitate the embodiments of the present application described herein, and may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and their variants are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Engineering vehicles such as excavators and cranes are often applied to scenes with complex environments such as chemical plants, mine sites and construction sites, and in the working environments, if operators cannot receive environmental information around the working sites in time, safety accidents are easily caused. In this regard, a dedicated supervisory person or dedicated supervisory equipment is generally required for the work of the engineering vehicle to timely feed back the operating environment of the construction site to the operator.
In the existing supervision method adopted by the supervision equipment, one is to shoot images of the surrounding environment of the vehicle in real time through a camera, and display a plurality of images shot in real time to an operator in a split screen or paging mode. And the other is to establish a special calibration place aiming at the work place, calibrate the external parameters of the camera through the calibration objects of the calibration place, and optimize each image according to the external parameters so as to monitor the scene.
However, in the method of separately displaying the captured images, the operator needs to view several images while operating the machine and integrate the information about the vehicle's surroundings by himself, so information is received too slowly. The method of monitoring the scene through a calibration site depends on a dedicated site; for complex and harsh working environments such as chemical plants and mining sites, establishing such a site is difficult and offers low economic benefit, so the method is not applicable.
To address these shortcomings of the prior art, the embodiments of the application provide an image stitching method that does not depend on a dedicated calibration site: adjacent images are pixel-matched using the computing capability of a pixel matching model, the pose calibration result of each camera is obtained from the pixel matching points, and the plurality of images are then stitched according to the pose calibration results to obtain a panoramic image containing all the environmental information, so that the operator can efficiently receive information about the surroundings of the vehicle.
The following describes an image stitching method provided in this embodiment with reference to fig. 1, and fig. 1 is a schematic flow chart of the image stitching method provided in this embodiment of the present application. It should be noted that the steps illustrated in the flowchart may be performed in a computer system, such as a set of computer-executable instructions, and in some cases, the steps illustrated may be performed in a different logical order than illustrated in the flowchart.
As shown in fig. 1, the method includes steps S101-S104, specifically as follows:
s101, acquiring a plurality of images of a target scene shot by a plurality of cameras.
First, a plurality of images for a target scene are acquired by a plurality of cameras. The shooting visual angles corresponding to the images are different, and shooting overlapping areas exist between adjacent images in the images.
Specifically, the target scene is photographed simultaneously by the plurality of cameras, so that a plurality of images corresponding to the plurality of cameras at a target moment are acquired. The plurality of images may include all of, or only part of, the environmental information of the target scene at the shooting time. For example, if four cameras capture images in the front, rear, left, and right directions of an excavator, the plurality of images includes all of the environmental information around the vehicle body at the target moment. If only three cameras capture images in the rear, left, and right directions of the excavator, the plurality of images includes only part of the environmental information around the vehicle body at the target moment.
The environmental information includes all image information of the target scene at the target time, such as an image of a fixed object, such as a tree, a stone, or an image of a movable object, such as a person, a vehicle, etc., and is not limited herein, based on the photographing result in the actual implementation process.
The cameras are used to acquire images of the target scene. They may be installed on the engineering vehicle, in which case they move with the vehicle, or they may be installed at fixed positions in the field, in which case they do not move with the vehicle. The specific installation manner can be chosen according to different requirements and is not limited herein.
In addition, when the camera acquires an image, the image can be shot according to a fixed frequency, and then a plurality of images corresponding to each shooting time can be acquired. Alternatively, the video may be recorded by a plurality of cameras, and the images of the cameras at the same time may be extracted from the video. For example, 4 cameras are provided, images in the front, back, left and right directions of the excavator can be respectively shot, and when the images are acquired, all the cameras are made to shoot images every 1 second, so that all the images corresponding to each second are obtained. Or each camera records a video with the length of 5 seconds, and after storing, the image frames of the video corresponding to each camera at the same moment are extracted and used as a plurality of images at the same moment. The specific image acquisition mode can be changed differently according to different requirements of application scenes, and is not limited herein.
In addition, a plurality of images shot by a plurality of cameras can be optimized, so that the plurality of images can be more convenient for matching of pixel points, and the optimized images are used as a plurality of acquired images. For example, when the image is optimized, distortion correction, image contrast adjustment, exposure processing, and the like may be performed on the initial images captured by the plurality of cameras, which is not limited herein.
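By way of illustration of this acquisition and preprocessing step, the following sketch (in Python with OpenCV; the camera indices and the intrinsic matrix K and distortion coefficients dist are assumptions, taken from a prior per-camera intrinsic calibration rather than from this disclosure) grabs one frame per camera at the target moment and applies distortion correction:

```python
import cv2
import numpy as np

# Hypothetical per-camera intrinsics and distortion coefficients (assumed known).
K = np.array([[800.0, 0, 960], [0, 800.0, 540], [0, 0, 1]])
dist = np.array([-0.30, 0.10, 0.0, 0.0, 0.0])

def grab_synchronized_frames(camera_ids=(0, 1, 2, 3)):
    """Capture one frame per camera at (approximately) the same target moment."""
    captures = [cv2.VideoCapture(i) for i in camera_ids]
    frames = []
    for cap in captures:
        ok, frame = cap.read()
        if not ok:
            raise RuntimeError("camera read failed")
        # Distortion correction so pixel matching operates on rectified images.
        frames.append(cv2.undistort(frame, K, dist))
    for cap in captures:
        cap.release()
    return frames
```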
Once the plurality of images has been obtained, adjacent images among them are input into a pixel matching model for pixel matching, as described in the following step.
S102, sequentially inputting adjacent images into a pixel matching model, and obtaining pixel matching points between the adjacent images through the pixel matching model.
After a plurality of images of a target scene shot by a plurality of cameras are acquired, extracting adjacent images from the plurality of images, and inputting the adjacent images into a pixel matching model for pixel matching.
The identification of the adjacent images can be performed through the camera identifications corresponding to the images. Specifically, when the camera shoots an image, a camera identification corresponding to the shot image is given, and then the adjacent images are determined by the adjacent relation of the camera identifications. For example, the acquired multiple images for the target scene include A, B, C, D images, the camera identifications corresponding to the respective images are identified, and the images A, B, C, D are determined to be the images of the front, rear, left and right cameras, so as to obtain four groups of adjacent images, respectively AB, BC, CD and DA.
After the adjacent images are acquired, they are input into the pixel matching model. The adjacent image pairs may be fed into the pixel matching model one at a time, so that the model processes them one by one, or all pairs may be input at the same time, with the pixel matching model deciding whether to process them individually or together. That is, the adjacent images may be input into the pixel matching model for single-process pixel matching, or multi-process pixel matching may be performed by the pixel matching model simultaneously, which is not limited herein.
And obtaining pixel matching points between adjacent images through a pixel matching model. It is to be understood that the obtained pixel matching points are present in pairs, and the obtained pixel matching points include positional information on adjacent images.
Illustratively, the adjacent images include an image a and an image B, and the pixel matching points a and B are obtained by the pixel matching model, which indicates that the point a in the image a and the point B in the image B are the same point in the actual scene. And when the pixel matching points a and B are acquired, the position coordinate information of the point a in the image A and the position coordinate information of the point B in the image B are acquired at the same time.
In addition, if the images are preprocessed, the input adjacent images are the preprocessed adjacent images. Taking distortion correction as an example, the preprocessing is as follows: distortion correction is first performed on each of the plurality of images, and sequentially inputting the adjacent images into the pixel matching model then means inputting the preprocessed adjacent images into the pixel matching model.
After the pixel matching points are obtained, camera calibration is performed according to the position information of the pixel matching points.
S103, determining pose calibration results between adjacent cameras according to positions of pixel matching points on adjacent images.
Because the adjacent images correspond to adjacent cameras, a homography matrix can be computed from the position coordinates of the pixel matching points on the two images; from it, the pose parameters of the adjacent cameras corresponding to the adjacent images are obtained, yielding the pose calibration result between the adjacent cameras.
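A possible sketch of this calibration step (Python/OpenCV; the shared intrinsic matrix K and the RANSAC threshold are assumptions, and selecting the physically valid decomposition is left to the application):

```python
import cv2
import numpy as np

def calibrate_adjacent_pair(pts_a, pts_b, K):
    """Estimate the relative pose between two adjacent cameras from pixel matches.

    pts_a, pts_b: (N, 2) arrays of matched pixel coordinates on images A and B.
    K: shared 3x3 intrinsic matrix (assumed known).
    """
    # Robust homography estimation from the pixel matching points.
    H, inlier_mask = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 3.0)
    # Decompose the homography into candidate rotations and translations;
    # the valid solution among the candidates must be chosen downstream.
    n_solutions, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
    return H, Rs, ts, inlier_mask
```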
In addition, when adjacent images are extracted, each image may belong to two or more adjacent-image pairs. In that case, once the pose calibration result of a camera has been determined, it is not determined again. For example, the pose calibration result of the camera corresponding to image B can be determined from either the adjacent-image pair AB or the pair BC. During calculation, the existing calibration results can be checked: if the camera has already been calibrated from the pair AB, the pose calibration result of the camera corresponding to image B is not re-determined when calibrating from the pair BC.
Alternatively, a combination of adjacent images that is not subjected to repeated calibration may be selected among the combinations of all the adjacent images. For example, for the adjacent image combination AB, BC, CD, DA, only the pixel matching points corresponding to AB and CD may be selected to determine the pose calibration result, or only the pixel matching points corresponding to BC and DA may be selected to determine the pose calibration result.
Or, camera parameter calibration can be performed based on pixel matching points corresponding to all adjacent image combinations, and comprehensive consideration is performed on the camera pose calibration results obtained repeatedly. For example, the pose calibration result of the camera corresponding to the image B can be obtained according to the image group AB and the image group BC. When determining, firstly determining pose calibration results according to the image group AB to obtain pose calibration results B1 of the camera corresponding to the image B, determining pose calibration results B2 of the camera corresponding to the image B when calibrating the camera according to the image group BC, and carrying out mean value processing on the B1 and the B2 to obtain B3 as camera parameters of the camera corresponding to the image B.
The specific method of determining the camera parameters may also be other, depending on the requirements of the implementation process, and is not limited herein. After the pose calibration results of all cameras are determined, the next step is carried out.
And S104, sequentially splicing the images according to the pose calibration result to obtain a panoramic image corresponding to the target scene.
After the pose calibration results of all the cameras are obtained in the previous step, the plurality of images are stitched according to the pose calibration results corresponding to the cameras to obtain the panoramic image of the target scene.
The stitching process is specifically to project a plurality of images onto the same imaging plane according to the pose calibration result, and obtain an initial projection image. The initial projection image has a projection overlapping region, and the projection overlapping region corresponds to a shooting overlapping region between adjacent images. And carrying out fusion processing on the projection overlapping area in the initial projection image.
That is, according to the pose calibration result corresponding to each camera, a plurality of images corresponding to each camera are projected onto the same projection surface. Because the pose calibration result is obtained according to the pixel matching points, a projection coincidence area corresponding to the shooting coincidence area between adjacent images exists in the projection image after the pose calibration result, and two points corresponding to the pixel matching points are coincident. And then fusion processing is carried out on the overlapped area so as to enable the transition of the image of the overlapped area to be smoother and enable the transmission effect of the environment information to be better.
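A minimal sketch of this projection step (Python/OpenCV, under the assumption that each image's pose calibration result has already been reduced to a 3×3 homography mapping it onto the common imaging plane):

```python
import cv2
import numpy as np

def project_to_common_plane(images, homographies, canvas_size):
    """Warp each image onto one shared imaging plane to form the initial projection image."""
    canvas = np.zeros((canvas_size[1], canvas_size[0], 3), dtype=np.uint8)
    for img, H in zip(images, homographies):
        warped = cv2.warpPerspective(img, H, canvas_size)
        # Simple layering: later images overwrite earlier ones inside the
        # projection overlap region; fusion of that region is handled afterwards.
        mask = warped.sum(axis=2) > 0
        canvas[mask] = warped[mask]
    return canvas
```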
In addition, when the same imaging plane is selected, different imaging planes can be selected, so that the panorama obtained after the splicing of a plurality of images can achieve better visual effect, and the information transmission efficiency is improved. For example, spherical projection may be employed such that an aerial image overlooked from the top down is presented after the image projection to enhance the spatial perception of the panoramic image. Alternatively, other imaging planes such as curved, single plane, etc. may be used to achieve different effects, without limitation.
In addition, the determined pose calibration result can be further optimized to obtain a more accurate pose calibration result. For example, a beam adjustment method (Bundle adjustment) may be used to optimize the pose calibration result of the camera, or according to a pixel matching point corresponding to another moment, the pose calibration result is optimized, and the specific manner is not limited herein.
After the pose calibration result is optimized, image stitching is performed according to the corrected pose calibration result, and a beam adjustment method is taken as an example, and the specific implementation process is as follows: and correcting the pose calibration result according to a beam adjustment method. At this time, splicing the plurality of images in sequence according to the pose calibration result includes: and splicing the images in sequence according to the corrected pose calibration result.
The image stitching method provided by the embodiments of the application has been briefly introduced above. As the basic method of this embodiment, it is characterized in that a pixel matching model is used to perform pixel matching on adjacent images to obtain pixel matching points, the pose calibration result of each camera is obtained from the pixel matching points, and the plurality of images are then stitched according to the camera pose calibration results to obtain a panoramic image expressing all the environmental information. The method is therefore suitable for application scenarios with harsh environments and removes the dependence on a dedicated calibration site. The method may be modified for different application scenarios; further description is given below on the basis of the above method to provide a fuller understanding of the technology.
In step S102 of the above method, pixel matching points between adjacent images need to be obtained through a pixel matching model. In practice, the choice of pixel matching model directly affects the quality of the matching. In the application scenario of engineering vehicles, the background of the captured images is often monotonous; if a model that first extracts feature points and then matches them is adopted, the resulting pixel matching points tend to be severely degraded. For this reason, the present embodiment introduces LoFTR (Local Feature TRansformer, a model that performs detector-free local feature matching based on a Transformer self-attention network) to obtain accurate pixel matching results.
The implementation method comprises the following steps: and sequentially inputting the adjacent images into the LoFTR model, and obtaining pixel matching points between the adjacent images through the LoFTR model.
After the images are input into the LoFTR model, the model traverses the images to perform pixel matching. The LoFTR model performs four stages: feature map extraction, Transformer-based feature encoding, coarse-grained matching, and fine-grained matching.
Specifically, after the first target image and the second target image of an adjacent-image pair are input into the LoFTR model, the model first extracts a feature map from each image, then adds positional encoding to each feature map, and performs self-attention and cross-attention calculations to obtain reconstructed features. A reconstructed feature contains both the relationships among the points within the image itself and the correlations with the matching object, i.e. with points in the other image. Coarse-grained matching is performed on the reconstructed features to obtain a number of mutually matched pixel regions of the first and second target images. Fine-grained matching is then performed within these matched regions to find the best-matching pixels, i.e. the pixel matching points.
For example, for images A and B, after they are input into the LoFTR model, the model extracts features from both images to obtain a feature map A1 corresponding to image A and a feature map B1 corresponding to image B. It then fuses the semantic information and positional information of the pixels in A1 and B1 to obtain feature representations A2 and B2 carrying positional information. Self-attention is computed on A2 to obtain the relationships among the points within A2, and cross-attention is computed to obtain the relationships between the points in A2 and the points in B2; after several rounds of self-attention and cross-attention, a reconstructed feature A3 is obtained. The same self-attention and cross-attention calculations are performed on B2 to obtain a reconstructed feature B3. Coarse-grained matching of A3 and B3 yields 256 matched pixel regions, and fine-grained matching within these regions yields 256 pixel matching points, which are output as the result.
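A matching step of this kind could be sketched with a publicly available LoFTR implementation such as the one in the kornia library (the exact API below is an assumption and may differ between library versions; the "outdoor" pretrained weights are assumed suitable for work-site scenes):

```python
import cv2
import torch
import kornia.feature as KF

matcher = KF.LoFTR(pretrained="outdoor").eval()  # pretrained weights assumed available

def to_tensor(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return torch.from_numpy(gray)[None, None].float() / 255.0  # (1, 1, H, W) in [0, 1]

def match_adjacent_pair(path_a, path_b):
    with torch.no_grad():
        out = matcher({"image0": to_tensor(path_a), "image1": to_tensor(path_b)})
    # Matched pixel coordinates on each image plus a per-match confidence score.
    return out["keypoints0"], out["keypoints1"], out["confidence"]
```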
Because the LoFTR model traverses all pixels of the images when matching, the result is more comprehensive and more reliable. Moreover, when matching points on adjacent images, the LoFTR model considers both the global information within each image, i.e. the interrelationships among its own points, and the cross-image information, i.e. the relationships with points in the other image. The matching process does not detect feature points and match them based on their isolated descriptors; instead it matches pixels by combining positional relationships with global context, so for the monotonous-background scene images obtained in real time during construction it can consistently produce high-accuracy matches.
In addition, obtaining pixel matching points between adjacent images through the pixel matching model involves a large amount of computation. To reduce this matching load, each of the adjacent images can be split into several image blocks, and only the target image blocks corresponding to the shooting overlap region are input into the pixel matching model to obtain the pixel matching points between the adjacent images, thereby reducing the computation of the pixel matching model and improving efficiency. Specifically:
the method further comprises, prior to sequentially inputting adjacent images to the pixel matching model: and respectively segmenting the adjacent images to obtain target image blocks corresponding to each image in the adjacent images, wherein the target image blocks are image blocks containing shooting overlapping areas. Sequentially inputting adjacent images into a pixel matching model, including: and inputting the target image block corresponding to each image into the pixel matching model.
For example, suppose the adjacent images A and B each have a pixel size of 1800×1800 and the right side of image A adjoins the left side of image B. Before inputting them into the pixel matching model, each image is split in half: the right half of image A is taken as the target image block of image A, and the left half of image B is taken as the target image block of image B. The input images corresponding to A and B then have a pixel size of 900×1800, which greatly reduces the computation of the pixel matching model.
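A sketch of this block-splitting step (Python, under the assumption of the layout above, i.e. image A's right edge adjoins image B's left edge):

```python
def overlap_blocks(img_a, img_b, overlap_fraction=0.5):
    """Keep only the halves that face the shooting overlap region.

    img_a, img_b: H x W x C arrays; A's right edge adjoins B's left edge.
    """
    w = img_a.shape[1]
    split = int(w * (1 - overlap_fraction))
    block_a = img_a[:, split:]       # right part of A
    block_b = img_b[:, :w - split]   # left part of B
    # Return the column offset of block_a so matched coordinates can be
    # mapped back into full-image coordinates afterwards.
    return block_a, block_b, split
```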
In addition, to ensure fast, real-time operation, a GPU computing unit can be used in the system to accelerate the model, or other acceleration methods may be adopted, which is not limited herein.
In addition, when adjacent images are input into the pixel matching model, the model outputs many pixel matching points for the two images. These can be further screened, and a smaller set of better matches used as the basis for determining the pose calibration result between adjacent cameras in step S103. The preferred pixel matching points can be determined by confidence-based screening: the confidence indicates how trustworthy a pixel matching point is, and the higher the confidence, the more likely the two corresponding points are the same physical point, i.e. the better the match.
Specifically, when the pixel matching model outputs the pixel matching points, the confidence of each group of pixel matching points is correspondingly output. The method for screening the pixel matching points specifically comprises the following steps: and screening the pixel matching points output by the pixel matching model according to the confidence coefficient of the pixel matching points to obtain target pixel matching points. According to the position of the pixel matching point on the adjacent images, determining the pose calibration result between the adjacent cameras comprises the following steps: and determining pose calibration results between adjacent cameras according to the positions of the target pixel matching points on the adjacent images.
For example, for images A and B, the pixel matching model outputs 256 pairs of pixel matching points, each pair with its position coordinates in images A and B and a corresponding confidence. The 256 pairs are sorted by confidence from high to low and the first 20 pairs are selected as target pixel matching points. When determining the pose calibration result, the pose calibration result between the adjacent cameras is determined according to the positions of the target pixel matching points on the adjacent images.
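This screening step might look like the following sketch (Python/NumPy; the top-k value of 20 follows the example above and is an illustrative choice):

```python
import numpy as np

def select_target_matches(pts_a, pts_b, confidence, top_k=20):
    """Keep the top_k most confident pixel matching points as target matches."""
    order = np.argsort(-np.asarray(confidence))[:top_k]
    return np.asarray(pts_a)[order], np.asarray(pts_b)[order]
```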
In addition, when step S104 is executed, the plurality of images first need to be projected onto the same imaging plane according to the pose calibration results to obtain the initial projection image. In this process the images may be projected one by one in sequence or all at the same time, which is not limited herein. Moreover, the overlapping portions of the projected images may be layered in different orders; for example, one of the adjacent images may be used as the bottom layer and the other as the top layer, in which case the overlapping portion displays the overlapping scene of the top-layer image.
After the initial projection image is obtained, a plurality of projection overlapping areas are obtained, and fusion processing is carried out on the projection overlapping areas in the initial projection image. Among them, there are many methods of fusion processing, and several possible methods are provided below:
First, pixel weighted average based method
When fusing, according to the gray information of the projection overlap region of the adjacent images, weights are assigned to the gray values of the two images at each pixel, and the weighted sum gives the gray value of the fused region. For a color image, the operation is repeated on each of the three primary-color channels to obtain the fused values of the three channels.
Second, maximum (Max)/minimum (Min) based image fusion method
In the fusion processing, the gray values of pixels at corresponding positions in adjacent images are compared, and pixels with larger (or smaller) gray values are used as the pixels of the fused images.
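The two simple rules above might be sketched as follows (Python/NumPy, operating on the two aligned overlap images; the fixed weight w1 is an illustrative assumption):

```python
import numpy as np

def weighted_average_fuse(img1, img2, w1=0.5):
    """Pixel weighted average: per-pixel weighted sum of the two gray/colour values."""
    return (w1 * img1.astype(np.float32)
            + (1 - w1) * img2.astype(np.float32)).astype(np.uint8)

def max_fuse(img1, img2):
    """Max-based fusion: keep the larger gray value at each corresponding pixel."""
    return np.maximum(img1, img2)
```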
Both of the above methods can fuse the projection overlap region of the initial projection image, but they cannot simultaneously account for local details near the seam and the large-scale macroscopic features of the two pictures. For this, the third method below can be adopted, as shown in fig. 2, which is a flowchart of a method for multi-band fusion of the overlap-region images, comprising steps S201 to S204.
Third, multi-band fusion method
S201, a first projection image and a second projection image corresponding to the projection overlapping area are obtained.
First, when fusing a projection overlap region, the first projection image and the second projection image corresponding to the overlap region to be fused are determined; they are the portions of the projections of the corresponding adjacent images that fall within the projection overlap region. For example, after images A and B are projected, region image c of the projection of image A coincides with region image d of the projection of image B; the projection overlap region is e, and region images c and d both have the same size as e.
S202, macro characteristic points and detail characteristic points in the first projection image and the second projection image are determined.
After the first projection image and the second projection image are acquired, the macroscopic feature points and detail feature points in them are determined. Specifically, each of the two projection images can be decomposed by frequency into a Gaussian pyramid, from which a Laplacian pyramid is obtained; different levels of the pyramid correspond to different frequency bands, and pixels at different frequencies are distinguished, so that macroscopic feature points and detail feature points can be separated.
S203, carrying out weighted superposition on macro feature points in the first projection image and the second projection image according to first weights corresponding to the macro feature points, and carrying out weighted superposition on detail feature points in the first projection image and the second projection image according to second weights corresponding to the detail feature points.
For pixels at different frequencies, the pyramid levels can be divided into high-frequency and low-frequency according to a preset frequency threshold, thereby distinguishing macroscopic feature points from detail feature points; the macroscopic feature points are weighted and superimposed according to the first weight, and the detail feature points are weighted and superimposed according to the second weight. The pixels at each level of the pyramid may also be weighted and superimposed with different weights.
S204, carrying out addition operation on the macro feature points and the detail feature points after weighted superposition.
The weighted and superimposed macroscopic feature points and detail feature points are summed again to obtain the final fusion result, and the results of all the weighted and superimposed frequency bands are superimposed in turn to obtain the final panoramic image.
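A compact sketch of this multi-band fusion (Python/OpenCV, assuming two grayscale overlap images of equal size; the per-band blend mask plays the role of the first and second weights, and the number of pyramid levels is an illustrative choice):

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    gp = [img.astype(np.float32)]
    for _ in range(levels):
        gp.append(cv2.pyrDown(gp[-1]))
    lp = []
    for i in range(levels):
        up = cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
        lp.append(gp[i] - up)          # band-pass (detail) layers
    lp.append(gp[-1])                  # coarsest (macroscopic) layer
    return lp

def multiband_blend(img1, img2, mask, levels=4):
    """Blend two overlap images band by band; mask in [0, 1] weights img1."""
    lp1, lp2 = laplacian_pyramid(img1, levels), laplacian_pyramid(img2, levels)
    gp_mask = [mask.astype(np.float32)]
    for _ in range(levels):
        gp_mask.append(cv2.pyrDown(gp_mask[-1]))
    blended = [m * a + (1 - m) * b for a, b, m in zip(lp1, lp2, gp_mask)]
    # Collapse the pyramid: successively upsample and add the band results.
    out = blended[-1]
    for layer in reversed(blended[:-1]):
        out = cv2.pyrUp(out, dstsize=(layer.shape[1], layer.shape[0])) + layer
    return np.clip(out, 0, 255).astype(np.uint8)
```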
In addition, there are other possible methods of fusing the projection overlap regions, and the method is not limited thereto.
In addition, after the panoramic image is obtained, the panoramic image can be displayed, and environmental information is displayed for an operator. The panoramic image can also be transmitted to a computer system as the basis of automatic driving.
Specifically, after the panoramic image is obtained, the panoramic image is transmitted to a display device in a wired or wireless mode, and environmental information of corresponding moments of a plurality of images is displayed to an operator.
The display device can be arranged in the cab of the engineering vehicle or can be arranged as a stand-alone device to be applied to a remote operation scene. And different interactive keys can be arranged on the display device, so that an operator can perform operations such as shrinking, enlarging and the like on the display screen. There is no limitation in this regard.
The panoramic image may also be provided to an autopilot system as a basis for autopilot, in particular, a plurality of cameras are fixedly mounted on the work vehicle, and at this time, acquiring a plurality of images for a target scene captured by the plurality of cameras includes: and controlling a plurality of cameras to shoot the target scene at the same time to obtain a plurality of images. And planning a motion trail according to the panoramic image corresponding to the target scene after the panoramic image is obtained through an image stitching method. And then controlling the engineering vehicle to move according to the movement track.
In addition, in the basic method for image stitching provided above, after step S103, pose calibration results of each camera are obtained. And after the camera is installed, if the installation mode of the camera is not changed, the pose calibration result of the camera is not changed. If the installation mode is changed, the pose calibration result of the camera is correspondingly changed. Aiming at the condition that the installation mode is not changed, after the pose calibration result at the target moment is determined, the image at the subsequent moment can be used for the determined pose calibration result. And for the condition that the installation mode is changed, after the pose calibration result is determined according to the image at the target moment, the pose calibration result needs to be re-determined according to the image at the subsequent moment. Specifically, the following is described.
1. For the situation that the pose of the camera is not changed.
Because the camera pose does not change, once the pose corresponding to each camera has been determined, subsequent images can be stitched directly using it without re-determining the pose.
Specifically, after a first panoramic image corresponding to a first moment is obtained according to a plurality of images at the first moment, a first pose calibration result corresponding to each camera is determined. And for the plurality of images at the second moment, the plurality of images at the second moment can be spliced directly according to the first pose calibration result to obtain a second panoramic image corresponding to the second moment.
2. For the case of camera pose changes.
Because the camera pose changes, after the pose corresponding to each camera has been determined it must be re-determined, and image stitching is then performed according to the newly determined pose.
Specifically, according to a plurality of images shot by each camera at a first moment, inputting adjacent images corresponding to the first moment into a pixel matching model, obtaining corresponding pixel matching points, further obtaining a first pose calibration result corresponding to the first moment, and performing image stitching according to the first pose calibration result to obtain a first panoramic image. When the second moment is reached, according to a plurality of images shot by each camera at the second moment, inputting adjacent images corresponding to the second moment into the pixel matching model, obtaining corresponding pixel matching points, further obtaining a second pose calibration result corresponding to the second moment, and performing image stitching according to the second pose calibration result to obtain a second panoramic image.
In addition, whether the camera pose has changed can be judged in different ways.
First, the judgment can be made by comparing the pose calibration results obtained at different moments. Specifically, the pose calibration result of each camera at the first moment is obtained when the adjacent images of the first moment are input into the pixel matching model. At the second moment, the adjacent images of the second moment are input into the pixel matching model to obtain the pose calibration result of each camera at the second moment. The two results are compared: if the difference is within a threshold range, the camera pose is considered unchanged and the pose calibration result of the first moment is used for image projection at subsequent moments; if the difference is outside the threshold range, the camera pose is considered to have changed, the pose calibration result is computed for each subsequent moment, and image stitching is performed with the result of the corresponding moment to generate the panoramic image.
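A sketch of this comparison-based check (Python/NumPy; the pose calibration result is represented here simply as a rotation matrix and translation vector, and the thresholds are illustrative assumptions):

```python
import numpy as np

def pose_changed(R1, t1, R2, t2, rot_thresh_deg=1.0, trans_thresh=0.05):
    """Return True if the pose at the second moment differs from the first beyond the thresholds."""
    # Rotation difference expressed as an angle in degrees.
    cos_angle = (np.trace(R1.T @ R2) - 1.0) / 2.0
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    trans_diff = np.linalg.norm(np.asarray(t1) - np.asarray(t2))
    return angle > rot_thresh_deg or trans_diff > trans_thresh
```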
Second, whether the camera pose has changed can be judged from a pose adjustment instruction. Specifically, before an instruction to adjust the camera pose is received, the camera pose is considered unchanged; after such an instruction is received, the pose is considered changed and the camera calibration result is re-determined. The pose adjustment instruction may target a specific camera, i.e. it may indicate pose adjustment of a target camera; if such an instruction is received, only the calibration result of that target camera needs to be re-determined.
There are other possible ways of determining whether the pose of the camera is changed, and no limitation is made in this regard.
In addition, camera calibration can be performed for different time periods to stitch images. Specifically, in a first period, image stitching is performed on images at all times in the first period according to camera calibration results obtained from a plurality of images corresponding to a first time in the first period. When the image acquisition time enters the second period, image stitching is carried out on the images at all times in the second period according to camera calibration results obtained by a plurality of images corresponding to the first time in the second period.
In addition, instead of dividing the period to determine different camera calibration results for image stitching, other manners may be adopted to determine the camera calibration results that meet the expectations, such as manually issuing an instruction to recalibrate all camera poses, where the specific implementation is dependent on the requirements in the implementation process, and the specific implementation is not limited.
The image stitching method provided by the embodiment of the application is described in detail above, in which the pixel matching model is a trained model, and the following description is given for the training method of the pixel matching model.
The direct output of the pixel matching model is pixel matching points, and the purpose of obtaining pixel matching points is to further obtain the camera parameters, i.e. the pose calibration result; moreover, determining the pose calibration result from the pixel matching points incurs essentially no information loss. Correspondingly, this embodiment provides two different methods for training the pixel matching model.
The first training method is a method for training according to pixel matching points, and the adopted training samples comprise collected image samples and marked pixel matching points. Fig. 3 is a schematic flow chart of a method for training a model according to a pixel matching point according to the embodiment, specifically, steps S301-S304 are shown in fig. 3.
S301, acquiring a first training sample, wherein the first training sample is an adjacent image sample acquired by a plurality of cameras and a pixel matching point marked for the adjacent image sample.
S302, inputting the first training sample into a pixel matching model, and obtaining output pixel matching points corresponding to the adjacent image samples through the pixel matching model.
S303, determining the loss of the pixel matching result between the output pixel matching point and the marked pixel matching point.
S304, adjusting model parameters of the pixel matching model according to the pixel matching result loss.
First, a first training sample is acquired; it consists of image pairs acquired by the plurality of cameras and the pixel matching points labeled for them. Images are sampled by the cameras, adjacent images captured at the same moment are taken as adjacent image samples, and pixel matching points are labeled for the adjacent image samples manually or automatically; an adjacent image sample together with its labeled pixel matching points forms one group of the first training sample.
After the first training sample is input to the pixel matching model, the pixel matching model performs corresponding pixel matching, so that corresponding output pixel matching points are obtained. And carrying out result loss calculation on the output pixel matching points and the marked pixel matching points, and obtaining the position difference of the output pixel matching points and the marked pixel matching points, so that model parameters of the pixel matching model are adjusted according to the pixel matching result loss.
Continued training gradually reduces the result loss until the training condition is finally met. Specifically, when the first training condition is reached, training of the pixel matching model ends. The first training condition is that a preset number of training iterations has been reached, or that the pixel matching result loss between the output pixel matching points and the labeled pixel matching points is smaller than a preset loss threshold.
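A minimal sketch of this first training method (PyTorch; the model interface, data-loader format, learning rate, and thresholds are assumptions standing in for any differentiable pixel matching model that returns predicted match coordinates):

```python
import torch

def train_on_matches(model, loader, epochs=10, loss_threshold=1e-3, lr=1e-4):
    """First training method: supervise with labeled pixel matching points."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):                      # first condition: iteration budget
        for img_a, img_b, labeled_pts in loader:     # one group of the first training sample
            pred_pts = model(img_a, img_b)           # output pixel matching points
            # Pixel matching result loss: position difference between the
            # output matching points and the labeled matching points.
            loss = torch.nn.functional.mse_loss(pred_pts, labeled_pts)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss.item() < loss_threshold:             # alternative first condition
            return
```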
The second training method trains the model according to the pose calibration result, and the training samples comprise collected image samples and standard poses. Fig. 4 is a schematic flow chart of the method for training the model according to the pose calibration result provided by this embodiment; specifically, steps S401-S405 are shown.
S401, acquiring a second training sample, wherein the second training sample is an adjacent standard image sample acquired by a plurality of cameras under the standard pose.
S402, inputting the second training sample into a pixel matching model, and obtaining output pixel matching points corresponding to the adjacent standard image samples through the pixel matching model.
S403, determining pose calibration output results between adjacent cameras according to positions of output pixel matching points corresponding to adjacent standard image samples on the adjacent standard image samples, wherein the adjacent standard image samples correspond to the adjacent cameras.
S404, determining the camera pose result loss between the pose calibration output result and the standard pose of the adjacent camera.
S405, adjusting model parameters of the pixel matching model according to the camera pose result loss.
First, the second training sample is acquired, where the training sample is a plurality of images acquired by the plurality of cameras under a set standard pose. Adjacent images among the plurality of images are taken as adjacent standard image samples, which together with the labeled pose of each camera constitute one group of the second training samples.
After the second training sample is input to the pixel matching model, the model performs pixel matching on the adjacent standard image samples and outputs corresponding pixel matching points, from which the pose calibration output results between adjacent cameras are determined. The result loss between the pose calibration output result and the standard poses of the adjacent cameras is then calculated to obtain the camera pose result loss, and the model parameters of the pixel matching model are adjusted according to this loss.
Likewise, continued training gradually reduces the result loss until the training condition is satisfied. Specifically, when the second training condition is reached, training of the pixel matching model ends. The second training condition is that a preset number of training iterations is reached, or that the camera pose result loss between the pose calibration output result and the standard pose of the adjacent camera is smaller than a preset loss threshold.
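For illustration only, the second training method requires back-propagating through the step that converts matching points into a pose. The patent does not specify how this is done; the sketch below uses a differentiable DLT homography estimate as a stand-in for the pose calibration output and supervises it against a homography derived offline from the standard poses. All function names, the homography formulation, and the settings are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def dlt_homography(matches):
    """Differentiable DLT estimate of a homography from N >= 4 matching points.
    matches: (N, 4) tensor of (xa, ya, xb, yb) coordinates."""
    xa, ya, xb, yb = matches.unbind(dim=1)
    zeros, ones = torch.zeros_like(xa), torch.ones_like(xa)
    rows1 = torch.stack([xa, ya, ones, zeros, zeros, zeros, -xb * xa, -xb * ya, -xb], dim=1)
    rows2 = torch.stack([zeros, zeros, zeros, xa, ya, ones, -yb * xa, -yb * ya, -yb], dim=1)
    A = torch.cat([rows1, rows2], dim=0)        # (2N, 9) DLT system
    _, _, vh = torch.linalg.svd(A)
    H = vh[-1].reshape(3, 3)                    # null-space vector as homography
    return H / H[2, 2]

def train_on_standard_pose(model, data_loader, max_epochs=100, lr=1e-4, loss_threshold=1e-3):
    """Sketch of the second training method: supervise through the pose calibration
    output, here represented by a homography derived offline from the standard pose."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()

    for epoch in range(max_epochs):
        for img_a, img_b, h_standard in data_loader:      # one adjacent pair per batch item
            matches = model(img_a, img_b).squeeze(0)      # (N, 4) output pixel matching points
            h_output = dlt_homography(matches)            # pose calibration output result
            loss = criterion(h_output, h_standard.squeeze(0))  # camera pose result loss

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # second training condition: preset number of iterations reached or loss below threshold
        if loss.item() < loss_threshold:
            break
    return model
```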
In addition, the two methods may be combined when training the model, virtual samples may be simulated for training, or samples may be collected in other manners for training, which is not limited here.
The image stitching method provided in the embodiment of the present application has been described in detail above. The image stitching device is described below with reference to fig. 5, where fig. 5 is a schematic structural diagram of an image stitching device 500 provided in the embodiment of the present application. As shown in fig. 5, the device includes:
an acquiring unit 501 configured to acquire a plurality of images for a target scene captured by a plurality of cameras; wherein, the shooting visual angles corresponding to the plurality of images are different; a shooting overlapping area exists between adjacent images in the plurality of images;
the matching unit 502 is configured to sequentially input adjacent images into a pixel matching model, and obtain pixel matching points between the adjacent images through the pixel matching model;
a determining unit 503, configured to determine pose calibration results between adjacent cameras according to positions of pixel matching points on adjacent images; the adjacent image corresponds to an adjacent camera;
and the stitching unit 504 is configured to stitch the plurality of images in sequence according to the pose calibration result, so as to obtain a panoramic image corresponding to the target scene.
In a possible implementation, a plurality of cameras are fixedly mounted on the engineering vehicle, and the acquiring unit 501 is further configured to control the plurality of cameras to shoot the target scene at the same time, so as to obtain a plurality of images.
The image stitching device 500 further includes a control unit 505, configured to plan a motion trail according to a panoramic image corresponding to the target scene;
and is further configured to control the engineering vehicle to move according to the motion trail.
In a possible implementation manner, the stitching unit 504 is further configured to project the plurality of images onto the same imaging plane according to the pose calibration result, so as to obtain an initial projection image; a projection overlapping area exists in the initial projection image, and the projection overlapping area corresponds to a shooting overlapping area between adjacent images;
and is further configured to perform fusion processing on the projection overlapping area in the initial projection image.
The stitching unit 504 is specifically configured to acquire a first projection image and a second projection image corresponding to the projection overlapping area;
determining macro feature points and detail feature points in the first projection image and the second projection image;
perform weighted superposition on the macro feature points in the first projection image and the second projection image according to first weights corresponding to the macro feature points, and perform weighted superposition on the detail feature points in the first projection image and the second projection image according to second weights corresponding to the detail feature points;
and add the macro feature points and detail feature points after weighted superposition.
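For illustration only, one way to realize this weighted superposition is to split each projection into a macro (low-frequency) and a detail (high-frequency) component and blend the components with separate weights. The Gaussian decomposition, the weight values, and the function names below are assumptions of this sketch; the patent only states that macro and detail feature points are weighted separately and then added.

```python
import cv2
import numpy as np

def fuse_overlap(proj_a, proj_b, w_macro=0.5, w_detail=0.6, sigma=5.0):
    """Sketch of the fusion of the projection overlapping area: split each projection
    into a macro (low-frequency) and a detail (high-frequency) component, superpose
    the components with separate weights, then add the two results."""
    a = proj_a.astype(np.float32)
    b = proj_b.astype(np.float32)

    # macro component: low-frequency structure of each projection image
    macro_a = cv2.GaussianBlur(a, (0, 0), sigma)
    macro_b = cv2.GaussianBlur(b, (0, 0), sigma)

    # detail component: high-frequency residual of each projection image
    detail_a = a - macro_a
    detail_b = b - macro_b

    # weighted superposition with the first and second weights, then addition
    macro = w_macro * macro_a + (1.0 - w_macro) * macro_b
    detail = w_detail * detail_a + (1.0 - w_detail) * detail_b
    return np.clip(macro + detail, 0, 255).astype(np.uint8)
```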
The matching unit 502 is further configured to segment the adjacent images respectively, and obtain a target image block corresponding to each image in the adjacent images; the target image block is an image block containing a shooting overlapping area;
and inputting the target image block corresponding to each image into the pixel matching model.
The matching unit 502 is further configured to screen the pixel matching points output by the pixel matching model according to their confidence to obtain target pixel matching points;
the determining unit 503 is further configured to determine a pose calibration result between adjacent cameras according to the position of the target pixel matching point on the adjacent images.
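For illustration only, the confidence screening and the determination of the pose calibration result between adjacent cameras might be sketched as follows; the essential-matrix formulation, the threshold value, and the function names are assumptions of this sketch rather than the claimed method.

```python
import cv2
import numpy as np

def pose_from_matches(points_a, points_b, confidences, K, conf_threshold=0.8):
    """Sketch: screen the output pixel matching points by confidence, then recover
    the relative pose (R, t) of the adjacent cameras from the retained points."""
    keep = confidences >= conf_threshold          # keep target pixel matching points only
    pts_a = points_a[keep].astype(np.float64)
    pts_b = points_b[keep].astype(np.float64)

    # essential matrix from the screened matching points (K: shared intrinsic matrix)
    E, inliers = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    # relative rotation R and translation direction t between the adjacent cameras
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=inliers)
    return R, t
```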
The determining unit 503 is further configured to correct the pose calibration result according to a beam adjustment method;
and the stitching unit 504 is further configured to stitch the plurality of images in sequence according to the corrected pose calibration result.
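For illustration only, the beam adjustment correction can be sketched as a joint least-squares refinement of the per-camera poses; the rotation-only parameterization and the residual definition below are simplifying assumptions, since the patent does not detail the adjustment variables.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_rotations(initial_rotations, pair_matches, K):
    """Sketch of a beam-adjustment-style correction: jointly refine per-camera
    rotations so that matched pixels re-project consistently across all pairs.
    pair_matches: list of (i, j, pts_i, pts_j) matched pixel arrays between cameras i and j."""
    n = len(initial_rotations)
    x0 = np.concatenate([Rotation.from_matrix(R).as_rotvec() for R in initial_rotations])
    K_inv = np.linalg.inv(K)

    def residuals(x):
        rots = [Rotation.from_rotvec(x[3 * i:3 * i + 3]).as_matrix() for i in range(n)]
        errs = []
        for i, j, pts_i, pts_j in pair_matches:
            # rotation-only homography from camera i to camera j
            H = K @ rots[j] @ rots[i].T @ K_inv
            p = (H @ np.c_[pts_i, np.ones(len(pts_i))].T).T
            p = p[:, :2] / p[:, 2:3]
            errs.append((p - pts_j).ravel())      # reprojection error in pixels
        return np.concatenate(errs)

    result = least_squares(residuals, x0, method="lm")
    return [Rotation.from_rotvec(result.x[3 * i:3 * i + 3]).as_matrix() for i in range(n)]
```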
In a possible embodiment, the obtaining unit 501 is further configured to perform distortion preprocessing on the plurality of images respectively;
the matching unit 502 is further configured to input the preprocessed adjacent images into the pixel matching model.
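For illustration only, the distortion preprocessing step might look like the following sketch, assuming the intrinsic matrix and distortion coefficients of each camera are known from a prior calibration.

```python
import cv2

def undistort_images(images, K, dist_coeffs):
    """Sketch of the distortion preprocessing applied to each captured image before
    it is fed to the pixel matching model."""
    undistorted = []
    for img in images:
        h, w = img.shape[:2]
        # refine the camera matrix so the undistorted image keeps only valid pixels
        new_K, _ = cv2.getOptimalNewCameraMatrix(K, dist_coeffs, (w, h), alpha=0)
        undistorted.append(cv2.undistort(img, K, dist_coeffs, None, new_K))
    return undistorted
```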
In one possible embodiment, the image stitching device 500 further includes a training unit 506,
The training unit 506 is configured to acquire a first training sample, wherein the first training sample is an adjacent image sample acquired by a plurality of cameras and a pixel matching point marked for the adjacent image sample;
inputting the first training sample into a pixel matching model, and obtaining output pixel matching points corresponding to adjacent image samples through the pixel matching model;
determining the pixel matching result loss between the output pixel matching point and the marked pixel matching point;
and adjusting model parameters of the pixel matching model according to the pixel matching result loss.
The training unit 506 is further configured to end training of the pixel matching model when the first training condition is reached;
the first training condition is that the preset training times are reached or the loss of the pixel matching result between the output pixel matching point and the marked pixel matching point is smaller than a preset loss threshold value.
In a possible implementation manner, the training unit 506 is further configured to obtain a second training sample, where the second training sample is an adjacent standard image sample collected by the plurality of cameras under the standard pose;
inputting the second training sample into a pixel matching model, and obtaining output pixel matching points corresponding to the adjacent standard image samples through the pixel matching model;
determining pose calibration output results between adjacent cameras according to positions of output pixel matching points corresponding to adjacent standard image samples on the adjacent standard image samples; the adjacent standard image samples correspond to adjacent cameras;
Determining loss of camera pose results between the pose calibration output result and the standard pose of the adjacent camera;
and adjusting model parameters of the pixel matching model according to the loss of the pose result of the camera.
The training unit 506 is further configured to end training of the pixel matching model when the second training condition is reached;
the second training condition is that a preset number of training iterations is reached, or that the camera pose result loss between the pose calibration output result and the standard pose of the adjacent camera is smaller than a preset loss threshold.
In addition, this embodiment also provides an electronic device, as shown in fig. 6, configured to implement any one of the possible technical solutions in the embodiments of the present application. In the embodiment shown in fig. 6, the electronic device comprises a processor 601, a memory 600, a bus 602 and a communication interface 603, where the processor 601, the communication interface 603 and the memory 600 are connected by the bus 602.
The memory 600 may include a high-speed random access memory (Random Access Memory, RAM), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is implemented via at least one communication interface 603 (which may be wired or wireless), and may use the internet, a wide area network, a local network, a metropolitan area network, etc. The bus 602 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one rectangular bar is shown in fig. 6, but this does not mean that there is only one bus or one type of bus.
The processor 601 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be performed by integrated hardware logic circuits in the processor 601 or by instructions in the form of software. The processor 601 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be embodied directly in hardware in a decoding processor, or in a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 600, and the processor 601 reads the information in the memory and, in combination with its hardware, performs the steps of the method of the foregoing embodiments.
The embodiment of the application also provides a computer readable storage medium storing computer instructions that, when executed by a processor, implement any one of the feasible technical solutions in the embodiments of the present application.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties; the collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to authorize or refuse.
While the invention has been described in terms of preferred embodiments, it is not intended to be limiting, but rather, it will be apparent to those skilled in the art that various changes and modifications can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (15)

1. A method of image stitching, the method comprising:
Acquiring a plurality of images shot by a plurality of cameras aiming at a target scene; wherein, the shooting visual angles corresponding to the plurality of images are different; a shooting overlapping area exists between adjacent images in the plurality of images;
sequentially inputting the adjacent images into a pixel matching model, and obtaining pixel matching points between the adjacent images through the pixel matching model;
determining pose calibration results between adjacent cameras according to the positions of the pixel matching points on the adjacent images; the adjacent image corresponds to the adjacent camera;
and sequentially splicing the plurality of images according to the pose calibration result to obtain a panoramic image corresponding to the target scene.
2. The method of claim 1, wherein the plurality of cameras are fixedly mounted on an engineering vehicle, and the acquiring a plurality of images of the target scene taken by the plurality of cameras comprises:
controlling the cameras to shoot the target scene at the same moment to obtain the images;
the method further comprises the steps of:
planning a motion trail according to the panoramic image corresponding to the target scene;
and controlling the engineering vehicle to move according to the movement track.
3. The method according to any one of claims 1 to 2, wherein the sequentially stitching the plurality of images according to the pose calibration result includes:
projecting the plurality of images onto the same imaging plane according to the pose calibration result to obtain an initial projection image; a projection overlapping area exists in the initial projection image, and the projection overlapping area corresponds to a shooting overlapping area between the adjacent images;
and carrying out fusion processing on the projection overlapping area in the initial projection image.
4. A method according to claim 3, wherein said fusing of projection overlap regions in said initial projection image comprises:
acquiring a first projection image and a second projection image corresponding to the projection overlapping region;
determining macro feature points and detail feature points in the first projection image and the second projection image;
the macro feature points in the first projection image and the second projection image are weighted and overlapped according to the first weights corresponding to the macro feature points, and the detail feature points in the first projection image and the second projection image are weighted and overlapped according to the second weights corresponding to the detail feature points;
And carrying out addition operation on the macro feature points and the detail feature points after weighted superposition.
5. The method of claim 4, wherein prior to sequentially inputting the adjacent images to a pixel matching model, the method further comprises:
respectively segmenting the adjacent images to obtain target image blocks corresponding to each image in the adjacent images; the target image block is an image block containing the shooting overlapping area;
sequentially inputting the adjacent images into a pixel matching model, wherein the method comprises the following steps of:
and inputting the target image block corresponding to each image into the pixel matching model.
6. The method according to claim 5, further comprising:
screening the pixel matching points output by the pixel matching model according to the confidence coefficient of the pixel matching points to obtain target pixel matching points;
the determining the pose calibration result between adjacent cameras according to the positions of the pixel matching points on the adjacent images comprises the following steps:
and determining the pose calibration result between the adjacent cameras according to the positions of the target pixel matching points on the adjacent images.
7. The method of claim 6, wherein the method further comprises:
correcting the pose calibration result according to a beam adjustment method;
the splicing of the plurality of images in sequence according to the pose calibration result comprises the following steps:
and splicing the images in sequence according to the corrected pose calibration result.
8. The method of claim 7, wherein the method further comprises:
respectively carrying out distortion preprocessing on the plurality of images;
the sequentially inputting the adjacent images into a pixel matching model comprises the following steps:
and inputting the preprocessed adjacent images into the pixel matching model.
9. The method according to claim 1, wherein the method further comprises:
acquiring a first training sample, wherein the first training sample is an adjacent image sample acquired by the cameras and a pixel matching point marked for the adjacent image sample;
inputting the first training sample into the pixel matching model, and obtaining output pixel matching points corresponding to the adjacent image samples through the pixel matching model;
determining the pixel matching result loss between the output pixel matching point and the marked pixel matching point;
And adjusting model parameters of the pixel matching model according to the pixel matching result loss.
10. The method of claim 9, wherein the method further comprises:
when the first training condition is reached, finishing training the pixel matching model;
the first training condition is that the preset training times are reached or the loss of the pixel matching result between the output pixel matching point and the marked pixel matching point is smaller than a preset loss threshold value.
11. The method according to claim 1, wherein the method further comprises:
acquiring a second training sample, wherein the second training sample is an adjacent standard image sample acquired by the cameras under the standard pose;
inputting the second training sample into the pixel matching model, and obtaining output pixel matching points corresponding to the adjacent standard image samples through the pixel matching model;
determining pose calibration output results between adjacent cameras according to positions of output pixel matching points corresponding to the adjacent standard image samples on the adjacent standard image samples; the adjacent standard image samples correspond to the adjacent cameras;
Determining a camera pose result loss between the pose calibration output result and the standard pose of the adjacent camera;
and adjusting model parameters of the pixel matching model according to the camera pose result loss.
12. The method of claim 11, further comprising:
when the second training condition is reached, finishing training the pixel matching model;
and the second training condition is that the preset training times are reached or the loss of the camera pose result between the pose calibration output result and the standard pose of the adjacent camera is smaller than a preset loss threshold value.
13. An image stitching device, comprising:
an acquisition unit configured to acquire a plurality of images for a target scene captured by a plurality of cameras; wherein, the shooting visual angles corresponding to the plurality of images are different; a shooting overlapping area exists between adjacent images in the plurality of images;
the matching unit is used for sequentially inputting the adjacent images into a pixel matching model, and obtaining pixel matching points between the adjacent images through the pixel matching model;
the determining unit is used for determining pose calibration results between adjacent cameras according to the positions of the pixel matching points on the adjacent images; the adjacent image corresponds to the adjacent camera;
And the splicing unit is used for splicing the images in sequence according to the pose calibration result to obtain a panoramic image corresponding to the target scene.
14. An electronic device, comprising: a memory and a processor, the memory and the processor coupled;
the memory is used for storing one or more computer instructions;
the processor is configured to execute the one or more computer instructions to implement the method of any of claims 1-12.
15. A computer readable storage medium having stored thereon one or more computer instructions executable by a processor to implement the method of any of claims 1-12.
CN202310382389.XA 2023-04-11 2023-04-11 Image stitching method and device, electronic equipment and storage medium Pending CN116402687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310382389.XA CN116402687A (en) 2023-04-11 2023-04-11 Image stitching method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310382389.XA CN116402687A (en) 2023-04-11 2023-04-11 Image stitching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116402687A (en) 2023-07-07

Family

ID=87007078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310382389.XA Pending CN116402687A (en) 2023-04-11 2023-04-11 Image stitching method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116402687A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination