CN117291804A

CN117291804A - Binocular image real-time splicing method, device and equipment based on weighted fusion strategy

Info

Publication number: CN117291804A
Application number: CN202311280265.7A
Authority: CN
Inventors: 陈辉; 熊章; 张智; 张青军; 胡国湖; 杜沛力
Original assignee: Wuhan Xingxun Intelligent Technology Co ltd
Current assignee: Wuhan Xingxun Intelligent Technology Co ltd
Priority date: 2023-09-28
Filing date: 2023-09-28
Publication date: 2023-12-26
Anticipated expiration: 2043-09-28
Also published as: CN117291804B

Abstract

The invention relates to the technical field of image stitching, solves the problem that color inconsistency cannot be simply and effectively solved in the prior art, and realizes seamless real-time stitching of binocular images, and provides a method, a device and equipment for real-time stitching of binocular images based on a weighted fusion strategy. The method comprises the following steps: acquiring a first real-time image of a first visual angle and a second real-time image of a second visual angle of an infant care scene; performing color correction on the first real-time image, and outputting a first target image; performing matching analysis on the first target image and the second real-time image, and outputting an overlapping region; acquiring a first depth map and a second depth map, and outputting a first weight map and a second weight map; and outputting the fused image as a spliced image according to the first weight map and the second weight map. The invention provides the benefits of real-time monitoring, consistent color information, overlapping area visual information, depth perception, panoramic stitching and the like for infant care.

Description

Binocular image real-time splicing method, device and equipment based on weighted fusion strategy

Technical Field

The invention relates to the technical field of image stitching, in particular to a binocular image real-time stitching method, device and equipment based on a weighted fusion strategy.

Background

With the continuous development of camera technology, binocular cameras are widely used in fields such as safety monitoring, unmanned aerial vehicle navigation, virtual reality and augmented reality.

For example, binocular image stitching has a variety of applications in the area of infant care: by using the binocular camera splicing technology, a wider visual field range can be provided so as to monitor the activities and safety conditions of infants; the technology can be used for monitoring the position, the posture and the sleeping state of an infant on a bed, so that parents or caregivers can be helped to find any unusual situation in time, and a binocular camera splicing technology can be used for providing more accurate positioning and navigation functions in places such as infant rooms or infant gardens; by identifying the characteristics of furniture, doors and windows and the like in a room, the system can help a caretaker to quickly locate the position of an infant and provide navigation guidance, so that the caretaking process is more convenient and efficient; binocular image stitching can be used for analyzing facial expressions and eye nerves of infants, so that emotion recognition and emotion monitoring are realized; through the analysis of the facial expressions of the infants, the system can automatically judge the emotional states of the infants, such as happiness, drowsiness, anxiety and the like, so that a caretaker is helped to better know and meet the requirements of the infants; binocular image stitching techniques may also be used to perform developmental assessment and early diagnosis of infants. By analyzing the infant's movements, eye concentration and behavioral patterns, the system can provide objective data and metrics to help doctors or professionals assess infant development and discover any potential development problems or diseases early. However, in the application of the practical infant care field, the binocular camera stitching algorithm faces the problems of inconsistent colors, obvious stitching edges and the like, and the problems may cause poor image quality after stitching, and influence the visual effect and the subsequent processing process. In order to solve the above problems, some binocular camera stitching methods are proposed in the prior art. However, these methods still have shortcomings in solving the problems of color inconsistency and noticeable splice edges. For example, some methods ignore depth information when fusing images, resulting in an undesirable stitching effect. Furthermore, the prior art generally relies on linear transformations alone or global feature based methods that may have limited effectiveness in handling complex color differences between cameras when performing color correction.

The prior Chinese patent CN111062873A discloses a parallax image stitching and visualizing method based on a plurality of pairs of binocular cameras, which comprises the following steps: calculating a homography matrix H by combining internal parameters and external parameters of the binocular cameras, the placement angle among the cameras and the scene plane distance d; the value range of d is 8-15 m; calculating an image overlapping region ROI by using the homography matrix H between the obtained images, and modeling the overlapping region; transforming an image coordinate system of the parallax image by utilizing the homography matrix H; seamless splicing is carried out on the optimal suture line in the step 5); when the binocular camera is larger than two, a parallax image of a wider field angle is obtained. However, the above-mentioned patent uses a plurality of pairs of binocular cameras for image stitching, which requires a plurality of cameras, and involves a combination of internal parameters and external parameters of the cameras, adjustment of the camera placement angle, calculation of homography matrix, and the like, which increases the cost and complexity of the system, requiring more equipment and computing resources. Meanwhile, the modeling of the homography matrix and the overlapping area is calculated by the method, and the homography matrix and the overlapping area are calculated by the method, which means that the method has certain limitation on specific conditions of a scene, such as the determination of the camera placement angle and the limitation of the scene distance range. In other scenarios or angles, it may be necessary to readjust the parameters or recalculate the homography matrix, resulting in limited applications.

Therefore, how to simply and effectively solve the problem of inconsistent colors and realize seamless real-time splicing of binocular images is a problem to be solved urgently.

Disclosure of Invention

In view of the above, the invention provides a binocular image real-time splicing method, device and equipment based on a weighted fusion strategy, which are used for solving the problem that color inconsistency cannot be simply and effectively solved in the prior art, and realizing seamless binocular image real-time splicing.

The technical scheme adopted by the invention is as follows:

in a first aspect, the present invention provides a binocular image real-time stitching method based on a weighted fusion strategy, which is characterized in that the method includes:

s1: acquiring a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle of an infant care scene;

s2: performing color correction on the first real-time image, and outputting a first target image with the same color as the second real-time image;

s3: performing matching analysis on the first target image and the second real-time image, and outputting an overlapping region;

s4: acquiring a first depth map corresponding to a first target image and a second depth map corresponding to a second real-time image, carrying out weighted averaging processing on the first depth map and the second depth map, and outputting a first weight map and a second weight map;

S5: and carrying out weighted fusion processing on the first target image and the second real-time image according to the first weight graph and the second weight graph, and outputting the fused image as a spliced image.

Preferably, the S2 includes:

s21: acquiring a training image set related to infant care, wherein the training image set comprises a first training image at the first visual angle and a second training image at the second visual angle;

s22: acquiring an image with the color corrected first training image as a second target image, and outputting a loss function according to the color difference between the second target image and the second training image;

s23: training the deep learning model according to the loss function, and outputting the trained deep learning model as a color correction model when the loss function is smaller than a preset threshold value;

s24: and inputting the first real-time image into the color correction model, and outputting the first target image.

Preferably, the S22 includes:

s221: acquiring a channel difference value between the second target image and the second training image;

s222: squaring the channel difference value to obtain the square error of each channel;

S223: and accumulating and summing the square errors of all the channels, averaging, and outputting the loss function.

Preferably, the S3 includes:

s31: infant characteristic point detection is carried out on the first target image and the second real-time image, and a first characteristic point set in the first target image and a second characteristic point set in the second real-time image are obtained, wherein the infant characteristic points at least comprise: nose key points and mouth key points related to infant face shielding judgment;

s32: performing feature description on the first feature point set and the second feature point set, and outputting a first description subset and a second description subset;

s33: performing feature matching on the first descriptor set and the second descriptor set to obtain matching point pairs;

s34: outputting a homography matrix between the first target image and the second real-time image according to the matching point pairs;

s35: and mapping the first target image according to the homography matrix, and taking the intersection area of the mapping image and the second real-time image as the overlapping area.

Preferably, the S4 includes:

s41: acquiring a first depth map corresponding to a first target image and a second depth map corresponding to a second real-time image by utilizing a parallax estimation algorithm;

S42: respectively calculating pixel gradients in the first depth map and pixel gradients in the second depth map to obtain a first gradient map and a second gradient map;

s43: normalizing the first gradient map and the second gradient map to obtain a first normalized depth map and a second normalized depth map;

s44: and carrying out weighted averaging on the first normalized gradient map and the second normalized gradient map to obtain the first weight map and the second weight map.

Preferably, the S41 includes:

s411: acquiring a first dense depth map corresponding to a first target image and a second dense depth map corresponding to a second real-time image by utilizing a parallax estimation algorithm;

s412: and carrying out normalization processing on the first dense depth map and the second dense depth map, mapping a first parallax value corresponding to the first dense depth map and a parallax value corresponding to the second dense depth map into a preset parallax range, and outputting the first depth map and the second depth map.

Preferably, the S44 includes:

s441: acquiring a preset first weight parameter and a preset second weight parameter;

s442, weighting the first depth map and the first normalized gradient map according to the first weight parameter, and outputting the first weight map;

S443: and carrying out weighting processing on the second depth map and the second normalized gradient map according to the second weight parameter, and outputting the second weight map.

In a second aspect, the present invention provides a binocular image real-time stitching device based on a weighted fusion strategy, where the device includes:

the image acquisition module is used for acquiring a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle under the infant care scene;

the color correction module is used for performing color correction on the first real-time image and outputting a first target image with the same color as the second real-time image;

the matching analysis module is used for carrying out matching analysis on the first target image and the second real-time image and outputting an overlapping area;

the weight map acquisition module is used for acquiring a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image, carrying out weighted average processing on the first depth map and the second depth map, and outputting the first weight map and the second weight map;

and the image fusion module is used for carrying out weighted fusion processing on the first target image and the second real-time image according to the first weight image and the second weight image, and outputting the fused image as a spliced image.

In a third aspect, an embodiment of the present invention further provides an electronic device, including: at least one processor, at least one memory and computer program instructions stored in the memory, which when executed by the processor, implement the method as in the first aspect of the embodiments described above.

In a fourth aspect, embodiments of the present invention also provide a storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as in the first aspect of the embodiments described above.

In summary, the beneficial effects of the invention are as follows:

the invention provides a binocular image real-time splicing method, device and equipment based on a weighted fusion strategy, wherein the method comprises the following steps: acquiring a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle of an infant care scene; performing color correction on the first real-time image, and outputting a first target image with the same color as the second real-time image; performing matching analysis on the first target image and the second real-time image, and outputting an overlapping region; acquiring a first depth map corresponding to a first target image and a second depth map corresponding to a second real-time image, carrying out weighted averaging processing on the first depth map and the second depth map, and outputting a first weight map and a second weight map; and carrying out weighted fusion processing on the first target image and the second real-time image according to the first weight graph and the second weight graph, and outputting the fused image as a spliced image. According to the invention, by acquiring the first real-time image of the first visual angle and the second real-time image different from the first visual angle, the real-time monitoring of the infant care scene is realized, the caretaker can observe the dynamic condition of the infant at any time, discover the abnormal condition in time and take corresponding measures; the first real-time image is subjected to color correction, a first target image with the same color as the second real-time image is output, so that images under two visual angles can have consistent color information, more real and accurate visual information can be provided by the color consistency, and a caretaker can observe and judge infant scenes conveniently; by carrying out matching analysis on the first target image and the second real-time image, an overlapping area between the first target image and the second real-time image can be found, and the overlapping area can provide more comprehensive visual information, so that a caretaker can better know the movement condition of an infant under different visual angles; according to the first weight map and the second weight map, the first target image and the second real-time image are subjected to weighted fusion processing, and the fused spliced image is output, so that the spliced image provides scene information with more panorama and more detail, a caretaker can more comprehensively observe the infant activity condition, and the nursing efficiency and the nursing accuracy are improved. In summary, the scheme provides real-time monitoring, consistent color information, overlapping area visual information, depth perception, panoramic stitching and other benefits for infant care, and the advantages are helpful for improving the reliability, monitoring efficiency and care quality of a care system, so that better guarantee is provided for infant safety and health.

Drawings

In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described, and it is within the scope of the present invention to obtain other drawings according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flow chart of overall operation of a binocular image real-time stitching method based on a weighted fusion strategy in embodiment 1 of the present invention;

fig. 2 is a schematic flow chart of performing color correction on the first real-time image in embodiment 1 of the present invention;

FIG. 3 is a flow chart of the acquisition loss function in embodiment 1 of the present invention;

fig. 4 is a flow chart of acquiring an overlapping area in embodiment 1 of the present invention;

FIG. 5 is a flowchart of the method for obtaining a weight map in embodiment 1 of the present invention;

FIG. 6 is a schematic flow chart of obtaining a depth map in embodiment 1 of the present invention;

FIG. 7 is a schematic flow chart of the weighting process in embodiment 1 of the present invention;

fig. 8 is a block diagram of a binocular image real-time stitching device based on a weighted fusion strategy in embodiment 2 of the present invention;

fig. 9 is a schematic structural diagram of an electronic device in embodiment 3 of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. In the description of the present invention, it should be understood that the terms "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate description of the present application and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element. If not conflicting, the embodiments of the present invention and the features of the embodiments may be combined with each other, which are all within the protection scope of the present invention.

Example 1

Referring to fig. 1, embodiment 1 of the invention discloses a binocular image real-time stitching method based on a weighted fusion strategy, which comprises the following steps:

specifically, under a common infant care scene, acquiring a first real-time image acquired by a care camera at a first visual angle and a second real-time image acquired by a care camera at a second visual angle, wherein the second visual angle is different from the first visual angle; by obtaining images from different perspectives, a more comprehensive monitoring and observation is provided. The first view is typically mounted in a fixed location, such as a bedside or a corner of a room, by a camera to capture an image of the front view of the infant. This view angle may provide direct facial features and expression information. The second view is obtained by the other camera being mounted at a different position or angle. For example, a second camera may be placed on the other side of the room or near the crib at an oblique angle to provide a different viewing angle. Additional monitoring coverage, such as different postures or activities of the infant in the bed, may be obtained. By simultaneously acquiring real-time images of the first and second viewing angles, a caregiver or parent can more fully understand the status, behavior, and safety of the infant. Such an image acquisition arrangement helps to provide more viewing angles for monitoring to meet the care needs of infants.

in particular, in the described scene, the first live image is color corrected in order to have the same color characteristics as the second live image. This ensures that the images from the two perspectives remain consistent in color, providing a more accurate and consistent visual experience, wherein color correction is an image processing technique that can adjust the color and hue of the images to match the colors between the different images. Outputting a first target image with the same color as the second real-time image by performing color correction on the first real-time image; the color difference of different cameras, illumination conditions or sensors can be eliminated, so that the images of two visual angles are more consistent in color, and more real and reliable visual information is provided. This helps reduce the impact of color deviation on infant care and provides more accurate image information for caretakers or parents to reference and judge.

In one embodiment, referring to fig. 2, the step S2 includes:

s21: acquiring a training image set, wherein the training image set comprises a first training image under the first visual angle and a second training image under the second visual angle;

Specifically, a pre-prepared training image set is obtained, including a first training image at a first viewing angle and a second training image at a second viewing angle, which will be used to train a model or algorithm to learn color mappings and other features between images. The first training image under the first view angle refers to an image sample from a camera under the first view angle, is used for training a model or an algorithm, and usually comes from actual shooting of an infant care site, and captures facial expression, posture or other relevant information of the infant; the second training image at the second view angle refers to an image sample from the second view angle camera for training a model or algorithm, and the images are also taken from the infant care site, providing image information at different view angles and observation angles, and creating an effective training data set for the color correction task by collecting such training image sets and pairing the images at the first view angle and the second view angle.

S22: taking the image after the color correction of the first training image as a second target image, and acquiring a loss function according to the difference between the second target image and the second training image;

Specifically, a loss function L is predefined and used for measuring the color difference between the second target image after the color correction of the first training image and the second training image; by taking the loss function, the difference between the second target image and the second training image can be quantified for training and optimizing the color correction model or algorithm.

In one embodiment, referring to fig. 3, the step S22 includes:

specifically, by subtracting the second target image and the second training image pixel by pixel on each channel, three difference images are obtained, and the difference between the second target image and the second training image on each channel is specifically calculated as follows:

ΔR＝I 1'_i(:,:,0)-I 2_i(:,:,0)

ΔG＝I 1'_i(:,:,1)-I 2_i(:,:,1)

ΔB＝I 1'_i(:,:,2)–I 2_i(:,:,2)

where I1' represents the second target image, I2 represents the second training image, (:) represents three channels of the image. 0,1,2 represent red, green, blue channels, respectively, Δr represents the difference in red channels, Δg represents the difference in green channels, and Δb represents the difference in blue channels.

Specifically, for each channel, the square of the difference image is calculated, resulting in three square error images: se_ R, SE _g and se_b, the square error on each channel is calculated specifically as follows:

SE_R＝ΔR^2

SE_G＝ΔG^2

SE_B＝ΔB^2

where se_r represents the square error of the red channel, se_g represents the square error of the green channel, and se_b represents the square error of the blue channel.

Specifically, three square error images are summed at each pixel location to obtain a total square error image. The sum of all pixels is then divided by the number of pixels of the image (n×h×w), resulting in a final MSE loss:

L＝1/(N*H*W)*Σ(SE_R+SE_G+SE_B)

where N represents the number of image pairs, H represents the height of the image, and W represents the width of the image. By summing the square errors of the pixels of all images and averaging, a scalar value can be obtained that measures the accuracy of the color matching. A smaller loss value indicates a better color matching effect, while a larger loss value indicates a larger color difference.

S23: training the deep learning model, and taking the trained deep learning model as a color correction model when the loss function is smaller than a preset threshold value;

Specifically, by calculating the loss function, the difference between the second target image and the second training image can be quantified for training and optimizing a color correction model using a U-Net neural network model. The parameters of the U-Net neural network model are updated by using a random gradient descent method, so that a loss function is minimized, the minimized standard is that the loss function is smaller than a preset threshold value, so that the color characteristics of the first training image after color correction can be as close as possible to the color characteristics of the second training image, and visual consistency and accurate color correction are realized, wherein the random gradient descent method is a common optimization algorithm used for updating the parameters of the neural network model.

specifically, the first target image and the second real-time image are subjected to matching analysis, an overlapping region is output, and the output overlapping region can be used for subsequent image stitching or fusion operation to generate a stitched image or a synthesized image, so that image display or information extraction of a wider field of view is realized.

In one embodiment, referring to fig. 4, the step S3 includes:

specifically, firstly, an infant image is processed by using a YOLOV8S target detection model, and an infant face area in the image is detected; in the infant face area obtained by target detection, detecting nose key points and mouth key points by using a special key point detection algorithm, such as a face key point detection model or a characteristic point detection algorithm; the detected nose and mouth keypoints are grouped into feature point sets that will be used for subsequent analysis and care scene applications. The YOLOV8S is an efficient target detection model, and can detect infant face areas in images in real time, so that an interested area can be rapidly determined, the calculation complexity of subsequent processing is reduced, the YOLOV8S can detect multiple targets simultaneously, which means that the model can detect the face areas of multiple infants in one image, is suitable for a multi-infant nursing scene, can accurately detect nose key points and mouth key points through a special key point detection algorithm, accurately acquires head gestures, facial expressions and position information of the nose and the mouth of the infants, and can realize automation of infant nursing by combining the YOLOV8S target detection and key point detection. Through analyzing the detected characteristic points, the system can judge the state and emotion of the infant, timely find abnormal conditions and automatically inform a nursing staff to process, the infant characteristic points can provide more comprehensive and accurate information, and the nursing staff can know the state and condition of the infant comprehensively, so that the nursing efficiency and the nursing safety are improved.

specifically, the first and second descriptor sets of feature point sets K1 and K2 are calculated using an ORB descriptor algorithm. The ORB descriptor algorithm can process the condition of rotation change in an image by introducing rotation invariance on the basis of FAST corner detection, and the ORB descriptor algorithm enables the descriptor to have certain rotation invariance by calculating the direction of each characteristic point.

specifically, a violent matching method and a nearest neighbor ratio testing method are adopted to match the first descriptor and the second descriptor. The brute force matching method simply calculates the distances between all pairs of feature points and selects the nearest matching pair. The nearest neighbor ratio test method further screens matching point pairs, namely, on the basis of violent matching, descriptors of each feature point are ordered according to distance, and then only two matching pairs closest to the feature point are reserved. The nearest neighbor ratio test is then applied, and is considered a valid match when the distance of the first match pair is significantly smaller than the distance of the second match pair. The violent matching method and the nearest neighbor ratio testing method can provide a preliminary matching result in the feature matching, and further exclude a part of mismatching, so that the accuracy of extracting the overlapping area is improved.

specifically, with the RANSAC method, the homography matrix H between the first target image and the second real-time image is estimated from the matching point pair. The RANSAC method is an iterative parameter estimation algorithm, and a homography matrix with the maximum number of inner points is finally selected as an estimation result by randomly selecting subsets of matching point pairs and calculating the homography matrix according to the subsets.

Specifically, mapping the first target image into a new image i1_new according to the homography matrix H, setting an out-of-range area as black, and calculating an intersection area of the new image i1_new and the second real-time image, wherein the area is a required overlapping area C; this overlapping area is a common area between the two images that can be used for subsequent image fusion, alignment or other related tasks, while pixels beyond the first target image range will be set to black or other background color during the mapping process. This ensures that only valid pixels common to both images are contained in the overlap region C, avoiding the effects of invalid or noisy pixels in the overlap region.

in one embodiment, referring to fig. 5, the step S4 includes:

specifically, a first depth map corresponding to a first target image and a second depth map corresponding to a second real-time image are obtained by using a parallax estimation algorithm. The parallax estimation is to estimate distance information of objects in a scene by analyzing displacement differences between corresponding pixel points in two images. Distance information of objects in the scene can be estimated by the first depth map and the second depth map.

In one embodiment, referring to fig. 6, the step S41 includes:

specifically, a BM algorithm is mainly used to calculate and obtain a dense depth map: preprocessing the first target image and the second real-time image, including operations such as graying and denoising, and converting the image into a gray image can reduce the calculated amount, and denoising can improve the accuracy of depth estimation; dividing the two images into blocks with equal sizes, wherein each block is called a search area; for the pixels in each search area, searching the most similar pixels in the second real-time image, and calculating the matching cost of the pixels, wherein the common matching cost comprises pixel difference, gray level difference and the like; for each pixel, the matching cost in the surrounding neighborhood is aggregated, and aggregation can be performed in a cumulative sum, average value and other modes; and selecting the pixel with the smallest disparity value as the final disparity value based on the matching cost. The disparity value represents the horizontal displacement of a pixel in the first target image relative to a pixel in the second real-time image; and combining the internal and external parameters of the camera to convert the parallax value into a depth value. By means of triangulation and the like, a first dense depth map corresponding to the first target image and a second dense depth map corresponding to the second real-time image can be calculated. Depth information of each pixel point in the image can be known through the first and second dense depth maps,

Specifically, after the first dense depth map and the second dense depth map are calculated by using the BM algorithm, the range of depth values may be generally arbitrary, but for convenience of use and representation, the depth values are mapped to a range between 0 and 1, so as to output the normalized first depth map and second depth map. By mapping the depth values into a fixed range, subsequent splicing processing and analysis are facilitated.

specifically, the gradient map may be calculated by a Sobe l operator or other edge detection operator. For each pixel position (x, y), a lateral gradient Gx and a longitudinal gradient Gy can be calculated separately, and the gradient magnitude G (x, y) can be calculated from these two components. Taking the Sobe l operator as an example, the gradient calculation process is as follows: applying a Sobe l operator to the depth map D1, and calculating a lateral gradient Gx1 and a longitudinal gradient Gy1:

Gx1(x,y)＝D1(x+1,y)-D1(x-1,y)；

Gy1(x,y)＝D1(x,y+1)-D1(x,y-1)；

Applying a Sobe l operator to the depth map D2, and calculating a lateral gradient Gx2 and a longitudinal gradient Gy2:

Gx2(x,y)＝D2(x+1,y)-D2(x-1,y)；

Gy2(x,y)＝D2(x,y+1)-D2(x,y-1)；

calculating gradient amplitude G (x, y): g (x, y) =sqrt (Gx (x, y)/(2+gy (x, y)/(2));

wherein Gx (x, y) =gx1 (x, y) +gx2 (x, y), gy (x, y) =gy 1 (x, y) +gy2 (x, y);

the gradient map reflects the change rate or edge intensity of each pixel point in the depth map, and can be used for subsequent tasks such as edge detection, feature extraction, image segmentation and the like. By calculating the gradient amplitude, the gradient intensity information of each pixel point can be obtained, and then the structure and the edge information of the image are analyzed.

specifically, the minimum and maximum values of the gradient maps G1 and G2 are acquired, and are denoted as min_g1 and max_g1, and min_g2 and max_g2; the gradient map G1 is normalized and its value range is mapped to [0,1]:

G1_norm＝(G1-min_G1)/(max_G1-min_G1)

the gradient map G2 is normalized and its value range is mapped to [0,1]:

G2_norm＝(G2-min_G2)/(max_G2-min_G2)

normalized gradient maps G1_norm and G2_norm will have a range of values between [0,1] so that the intensity information of the gradients can be more conveniently compared and analyzed.

Specifically, the first normalized gradient map g1_norm and the second normalized gradient map g2_norm are weighted and averaged, and the respective contribution degrees can be controlled by setting weights because the process of weighted and averaged takes the gradient information of the two images into consideration. The final first weight map and second weight map may be used in subsequent image fusion or processing to better preserve important gradient information.

In one embodiment, referring to fig. 7, the step S44 includes:

Specifically, a first weight parameter α1 and a second weight parameter α2 are defined, wherein 0< = α1, α2< = 1, for balancing the contributions of the depth information and the gradient information. Then, for each pixel position (x, y), a first weight map W1 and a second weight map W2 are calculated:

W1(x,y)＝α*D1_norm(x,y)+(1-α1)*G1_norm(x,y)

W2(x,y)＝α*D2_norm(x,y)+(1-α2)*G2_norm(x,y)

wherein d1_norm and d2_norm represent normalized depth maps D1 and D2, respectively.

Specifically, the first weight map is applied to the first target image, the second weight map is applied to the second real-time image, and the overlapping areas are subjected to weighted fusion, wherein the weighted fusion refers to the following calculation method:

I_new(x,y)＝W1(x,y)*I 3(x，y)+W2(x,y)*I4(x,y)

wherein I3 (x, y) is a pixel value in the first target image, I4 (x, y) is a pixel value in the second real-time image, and i_new (x, y) is the fused stitched image. The method can flexibly control the fusion result according to the distribution of the weight graphs through weighted fusion, reserve important information, realize smooth transition and adapt to the requirements of different scenes, thereby improving the quality and expressive force of the output spliced images.

Example 2

Referring to fig. 8, embodiment 2 of the present invention further provides a binocular image real-time stitching device based on a weighted fusion policy, where the device includes:

Specifically, the binocular image real-time splicing device based on the weighted fusion strategy provided by the embodiment of the invention comprises the following steps: the image acquisition module is used for acquiring a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle under the infant care scene; the color correction module is used for performing color correction on the first real-time image and outputting a first target image with the same color as the second real-time image; the matching analysis module is used for carrying out matching analysis on the first target image and the second real-time image and outputting an overlapping area; the weight map acquisition module is used for acquiring a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image, carrying out weighted average processing on the first depth map and the second depth map, and outputting the first weight map and the second weight map; and the image fusion module is used for carrying out weighted fusion processing on the first target image and the second real-time image according to the first weight image and the second weight image, and outputting the fused image as a spliced image. The device realizes the real-time monitoring of the infant nursing scene by acquiring the first real-time image of the first visual angle and the second real-time image different from the first visual angle, and a nursing staff can observe the dynamic condition of the infant at any time, discover the abnormal condition in time and take corresponding measures; the first real-time image is subjected to color correction, a first target image with the same color as the second real-time image is output, so that images under two visual angles can have consistent color information, more real and accurate visual information can be provided by the color consistency, and a caretaker can observe and judge infant scenes conveniently; by carrying out matching analysis on the first target image and the second real-time image, an overlapping area between the first target image and the second real-time image can be found, and the overlapping area can provide more comprehensive visual information, so that a caretaker can better know the movement condition of an infant under different visual angles; according to the first weight map and the second weight map, the first target image and the second real-time image are subjected to weighted fusion processing, and the fused spliced image is output, so that the spliced image provides scene information with more panorama and more detail, a caretaker can more comprehensively observe the infant activity condition, and the nursing efficiency and the nursing accuracy are improved. In summary, the scheme provides real-time monitoring, consistent color information, overlapping area visual information, depth perception, panoramic stitching and other benefits for infant care, and the advantages are helpful for improving the reliability, monitoring efficiency and care quality of a care system, so that better guarantee is provided for infant safety and health.

Example 3

In addition, the binocular image real-time stitching method based on the weighted fusion strategy of the embodiment 1 of the present invention described in connection with fig. 1 may be implemented by an electronic device. Fig. 9 shows a schematic hardware structure of an electronic device according to embodiment 3 of the present invention.

The electronic device may include a processor and memory storing computer program instructions.

In particular, the processor may comprise a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present invention.

The memory may include mass storage for data or instructions. By way of example, and not limitation, the memory may comprise a Hard Disk Drive (HDD), floppy Disk Drive, flash memory, optical Disk, magneto-optical Disk, magnetic tape, or universal serial bus (Universal Serial Bus, USB) Drive, or a combination of two or more of the foregoing. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a non-volatile solid state memory. In a particular embodiment, the memory includes Read Only Memory (ROM). The ROM may be mask programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically Erasable PROM (EEPROM), electrically rewritable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate.

The processor reads and executes the computer program instructions stored in the memory to implement any of the binocular image real-time stitching methods based on the weighted fusion strategy in the above embodiments.

In one example, the electronic device may also include a communication interface and a bus. The processor, the memory, and the communication interface are connected by a bus and complete communication with each other, as shown in fig. 9.

The communication interface is mainly used for realizing communication among the modules, the devices, the units and/or the equipment in the embodiment of the invention.

The bus includes hardware, software, or both that couple the components of the device to each other. By way of example, and not limitation, the buses may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an infiniband interconnect, a Low Pin Count (LPC) bus, a memory bus, a micro channel architecture (MCa) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a video electronics standards association local (VLB) bus, or other suitable bus, or a combination of two or more of the above. The bus may include one or more buses, where appropriate. Although embodiments of the invention have been described and illustrated with respect to a particular bus, the invention contemplates any suitable bus or interconnect.

Example 4

In addition, in combination with the method for splicing binocular images based on the weighted fusion policy in the above embodiment 1, embodiment 4 of the present invention may also provide a computer readable storage medium for implementation. The computer readable storage medium has stored thereon computer program instructions; the computer program instructions, when executed by the processor, implement any of the weighted fusion policy-based binocular image real-time stitching methods of the above embodiments.

In summary, the embodiment of the invention provides a binocular image real-time splicing method, device and equipment based on a weighted fusion strategy.

It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present invention.

The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.

It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.

In the foregoing, only the specific embodiments of the present invention are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present invention is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and they should be included in the scope of the present invention.

Claims

1. The binocular image real-time splicing method based on the weighted fusion strategy is characterized by comprising the following steps of:

2. The method for real-time stitching of binocular images based on weighted fusion strategy according to claim 1, wherein S2 comprises:

3. The method for real-time stitching of binocular images based on weighted fusion strategy according to claim 2, wherein S22 comprises:

4. The method for real-time stitching of binocular images based on weighted fusion strategy according to claim 1, wherein S3 comprises:

5. The method for real-time stitching of binocular images based on weighted fusion strategy according to claim 1, wherein S4 comprises:

6. The method for real-time stitching of binocular images based on weighted fusion strategy according to claim 5, wherein S41 comprises:

7. The method for real-time stitching of binocular images based on weighted fusion strategy according to claim 5, wherein S44 comprises:

8. A binocular image real-time stitching device based on a weighted fusion strategy, the device comprising:

9. An electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the method of any one of claims 1-7.

10. A storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1-7.