CN117291804A - Binocular image real-time splicing method, device and equipment based on weighted fusion strategy - Google Patents
Binocular image real-time splicing method, device and equipment based on weighted fusion strategy
- Publication number
- CN117291804A, CN202311280265.7A
- Authority
- CN
- China
- Prior art keywords
- image
- real
- time
- depth map
- outputting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
- G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T7/90: Determination of colour characteristics
- G06V10/16: Image acquisition using multiple overlapping images; Image stitching
- G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V40/161: Human faces; Detection; Localisation; Normalisation
- G06V40/172: Human faces; Classification, e.g. identification
- G06T2200/32: Indexing scheme for image data processing or generation, in general involving image mosaicing
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06T2207/20221: Image fusion; Image merging
- Y02T10/40: Engine management systems
Abstract
The invention relates to the technical field of image stitching and provides a method, a device and equipment for real-time stitching of binocular images based on a weighted fusion strategy. It addresses the problem that the prior art cannot resolve color inconsistency simply and effectively, and realizes seamless real-time stitching of binocular images. The method comprises the following steps: acquiring a first real-time image at a first visual angle and a second real-time image at a second visual angle of an infant care scene; performing color correction on the first real-time image, and outputting a first target image; performing matching analysis on the first target image and the second real-time image, and outputting an overlapping region; acquiring a first depth map and a second depth map, and outputting a first weight map and a second weight map; and outputting the fused image as the stitched image according to the first weight map and the second weight map. For infant care, the invention provides real-time monitoring, consistent color information, visual information in the overlapping region, depth perception and panoramic stitching.
Description
Technical Field
The invention relates to the technical field of image stitching, in particular to a binocular image real-time stitching method, device and equipment based on a weighted fusion strategy.
Background
With the continuous development of camera technology, binocular cameras are widely used in fields such as safety monitoring, unmanned aerial vehicle navigation, virtual reality and augmented reality.
For example, binocular image stitching has a variety of applications in the area of infant care. Binocular camera stitching provides a wider field of view for monitoring the activities and safety of infants. It can be used to monitor the position, posture and sleeping state of an infant in bed, helping parents or caregivers notice any unusual situation in time. It can also provide more accurate positioning and navigation in places such as infant rooms or kindergartens: by identifying features such as furniture, doors and windows in a room, the system can help a caregiver quickly locate an infant and provide navigation guidance, making the care process more convenient and efficient. Binocular image stitching can further be used to analyze the facial expressions and gaze of infants for emotion recognition and emotion monitoring: by analyzing facial expressions, the system can automatically judge an infant's emotional state, such as happiness, drowsiness or anxiety, helping a caregiver better understand and meet the infant's needs. Finally, binocular image stitching may be used for developmental assessment and early diagnosis. By analyzing an infant's movements, eye concentration and behavioral patterns, the system can provide objective data and metrics that help doctors or other professionals assess infant development and discover potential developmental problems or diseases early.
However, in practical infant care applications, binocular camera stitching algorithms face problems such as inconsistent colors and visible stitching edges. These problems may degrade the quality of the stitched image and affect both the visual effect and subsequent processing. Some binocular camera stitching methods have been proposed in the prior art to address these problems, but they still fall short in resolving color inconsistency and visible stitching edges. For example, some methods ignore depth information when fusing images, resulting in an undesirable stitching effect. Furthermore, when performing color correction, the prior art generally relies on linear transformations alone or on methods based on global features, which may have limited effectiveness in handling complex color differences between cameras.
The prior Chinese patent CN111062873A discloses a parallax image stitching and visualization method based on a plurality of pairs of binocular cameras, which comprises the following steps: calculating a homography matrix H from the internal and external parameters of the binocular cameras, the placement angle between the cameras and the scene plane distance d, where d ranges from 8 m to 15 m; calculating an image overlapping region ROI by using the homography matrix H between the images, and modeling the overlapping region; transforming the image coordinate system of the parallax image by using the homography matrix H; performing seamless stitching along the optimal seam line of step 5); and, when there are more than two binocular cameras, obtaining a parallax image with a wider field angle. However, that patent stitches images from a plurality of pairs of binocular cameras. It therefore requires multiple cameras and involves combining the internal and external parameters of the cameras, adjusting the camera placement angle, calculating the homography matrix and so on, which increases the cost and complexity of the system and requires more equipment and computing resources. Meanwhile, that method calculates the homography matrix and models the overlapping region under specific scene conditions, such as a determined camera placement angle and a limited scene distance range. In other scenes or at other angles, it may be necessary to readjust the parameters or recalculate the homography matrix, which limits its applications.
Therefore, how to simply and effectively solve the problem of inconsistent colors and realize seamless real-time splicing of binocular images is a problem to be solved urgently.
Disclosure of Invention
In view of the above, the invention provides a binocular image real-time splicing method, device and equipment based on a weighted fusion strategy, which are used for solving the problem that color inconsistency cannot be simply and effectively solved in the prior art, and realizing seamless binocular image real-time splicing.
The technical scheme adopted by the invention is as follows:
In a first aspect, the present invention provides a binocular image real-time stitching method based on a weighted fusion strategy, the method including:
S1: acquiring a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle of an infant care scene;
S2: performing color correction on the first real-time image, and outputting a first target image with the same color as the second real-time image;
S3: performing matching analysis on the first target image and the second real-time image, and outputting an overlapping region;
S4: acquiring a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image, carrying out weighted averaging processing on the first depth map and the second depth map, and outputting a first weight map and a second weight map;
S5: carrying out weighted fusion processing on the first target image and the second real-time image according to the first weight map and the second weight map, and outputting the fused image as a stitched image.
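As an illustrative sketch only, the weighted fusion of step S5 can be written in Python as follows. The function name `fuse_pixels`, the representation of images and weight maps as equally sized 2-D lists of floats, and the per-pixel renormalization of the two weights are assumptions made for illustration; the text above does not specify them.

```python
def fuse_pixels(img1, img2, w1, w2, eps=1e-8):
    """Per-pixel weighted fusion of two aligned single-channel images.

    img1, img2: aligned images as equally sized 2-D lists of floats.
    w1, w2:     the corresponding weight maps (same shape).
    Assumption: the two weights are renormalized per pixel to sum to one.
    """
    fused = []
    for row1, row2, wrow1, wrow2 in zip(img1, img2, w1, w2):
        out_row = []
        for p1, p2, a, b in zip(row1, row2, wrow1, wrow2):
            total = a + b
            if total < eps:
                # Neither weight map prefers a source: use a plain average.
                out_row.append(0.5 * (p1 + p2))
            else:
                out_row.append((a * p1 + b * p2) / total)
        fused.append(out_row)
    return fused
```

Renormalizing the weights keeps the fused intensity within the range spanned by the two source pixels, which avoids brightness jumps where the weight maps do not sum to one.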
Preferably, the S2 includes:
S21: acquiring a training image set related to infant care, wherein the training image set comprises a first training image at the first visual angle and a second training image at the second visual angle;
S22: acquiring the color-corrected first training image as a second target image, and outputting a loss function according to the color difference between the second target image and the second training image;
S23: training the deep learning model according to the loss function, and outputting the trained deep learning model as a color correction model when the loss function is smaller than a preset threshold value;
S24: inputting the first real-time image into the color correction model, and outputting the first target image.
Preferably, the S22 includes:
S221: acquiring a channel difference value between the second target image and the second training image;
S222: squaring the channel difference value to obtain the square error of each channel;
S223: accumulating and summing the square errors of all the channels, averaging, and outputting the loss function.
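Steps S221 to S223 amount to a mean squared error over color channels. A minimal Python sketch is given below; the function name `color_mse_loss`, the image layout (a list of pixels, each a tuple of channel values) and the choice to average over all pixels and channels are assumptions for illustration.

```python
def color_mse_loss(pred, target):
    """Mean of squared per-channel differences between two images.

    pred, target: lists of pixels; each pixel is a tuple of channel values.
    Assumption: the average is taken over every pixel and every channel.
    """
    total = 0.0
    count = 0
    for pred_pixel, target_pixel in zip(pred, target):
        for pc, tc in zip(pred_pixel, target_pixel):
            diff = pc - tc          # S221: channel difference
            total += diff * diff    # S222: squared error, accumulated (S223)
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one channel value")
    return total / count            # S223: average
```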
Preferably, the S3 includes:
S31: performing infant feature point detection on the first target image and the second real-time image to obtain a first feature point set in the first target image and a second feature point set in the second real-time image, wherein the infant feature points at least comprise: nose key points and mouth key points related to judging whether the infant's face is occluded;
S32: performing feature description on the first feature point set and the second feature point set, and outputting a first descriptor set and a second descriptor set;
S33: performing feature matching on the first descriptor set and the second descriptor set to obtain matching point pairs;
S34: outputting a homography matrix between the first target image and the second real-time image according to the matching point pairs;
S35: mapping the first target image according to the homography matrix, and taking the intersection area of the mapped image and the second real-time image as the overlapping area.
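Step S35 can be illustrated with the following Python sketch, which assumes the homography matrix of step S34 is already available. The helper names `apply_homography` and `overlap_bbox` are hypothetical, and the overlap is approximated by an axis-aligned bounding box rather than the exact intersection polygon, which is a simplification not taken from the text above.

```python
def apply_homography(H, x, y):
    """Map point (x, y) through a 3x3 homography H given as nested lists."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return xh / w, yh / w


def overlap_bbox(H, size1, size2):
    """Bounding box (left, top, right, bottom) of the overlap, or None.

    size1: (width, height) of the first image, which is warped by H.
    size2: (width, height) of the second image, whose frame is the target.
    Simplification: the warped quadrilateral is replaced by its bounding box.
    """
    w1, h1 = size1
    w2, h2 = size2
    corners = [apply_homography(H, x, y)
               for x, y in ((0, 0), (w1, 0), (w1, h1), (0, h1))]
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    left, top = max(min(xs), 0.0), max(min(ys), 0.0)
    right, bottom = min(max(xs), float(w2)), min(max(ys), float(h2))
    if left >= right or top >= bottom:
        return None  # the mapped image does not intersect the second image
    return left, top, right, bottom
```

For a pure horizontal shift of 60 pixels between two 100x50 images, the overlap box is the 40-pixel-wide strip shared by both views.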
Preferably, the S4 includes:
S41: acquiring a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image by utilizing a parallax estimation algorithm;
S42: respectively calculating pixel gradients in the first depth map and pixel gradients in the second depth map to obtain a first gradient map and a second gradient map;
S43: normalizing the first gradient map and the second gradient map to obtain a first normalized gradient map and a second normalized gradient map;
S44: carrying out weighted averaging on the first normalized gradient map and the second normalized gradient map to obtain the first weight map and the second weight map.
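Steps S42 and S43 can be sketched in Python as follows. The gradient operator is not specified above, so a forward difference with replicated borders and an L1 magnitude is assumed here, together with min-max normalization; the function names `gradient_magnitude` and `min_max_normalize` are hypothetical.

```python
def gradient_magnitude(depth):
    """S42 sketch: |dx| + |dy| using forward differences on a 2-D list.

    Assumption: borders are replicated, so the last row and column have a
    zero difference in the corresponding direction.
    """
    rows, cols = len(depth), len(depth[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dx = depth[r][min(c + 1, cols - 1)] - depth[r][c]
            dy = depth[min(r + 1, rows - 1)][c] - depth[r][c]
            grad[r][c] = abs(dx) + abs(dy)
    return grad


def min_max_normalize(grid):
    """S43 sketch: rescale all values of a 2-D list into [0, 1]."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no gradient information.
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / (hi - lo) for v in row] for row in grid]
```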
Preferably, the S41 includes:
S411: acquiring a first dense depth map corresponding to the first target image and a second dense depth map corresponding to the second real-time image by utilizing a parallax estimation algorithm;
S412: carrying out normalization processing on the first dense depth map and the second dense depth map, mapping a first parallax value corresponding to the first dense depth map and a second parallax value corresponding to the second dense depth map into a preset parallax range, and outputting the first depth map and the second depth map.
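One way to read step S412 is as a linear rescaling of disparity values into the preset range. The Python sketch below assumes that reading; the function name `rescale_disparity` and the parameters `d_min` and `d_max` (the bounds of the preset parallax range) are hypothetical.

```python
def rescale_disparity(disparity, d_min, d_max):
    """Linearly map the values of a 2-D disparity map into [d_min, d_max].

    Assumption: the preset parallax range is given by its two bounds and the
    mapping is linear; the text above does not state the mapping function.
    """
    flat = [v for row in disparity for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map has no spread to rescale; pin it to the lower bound.
        return [[float(d_min) for _ in row] for row in disparity]
    scale = (d_max - d_min) / (hi - lo)
    return [[d_min + (v - lo) * scale for v in row] for row in disparity]
```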
Preferably, the S44 includes:
S441: acquiring a preset first weight parameter and a preset second weight parameter;
S442: weighting the first depth map and the first normalized gradient map according to the first weight parameter, and outputting the first weight map;
S443: weighting the second depth map and the second normalized gradient map according to the second weight parameter, and outputting the second weight map.
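The exact weighting formula of steps S442 and S443 is not given above. The Python sketch below assumes a per-pixel convex combination of the depth map and the normalized gradient map controlled by a weight parameter `alpha` in [0, 1]; both the formula and the function name `weight_map` are assumptions for illustration.

```python
def weight_map(depth, norm_grad, alpha):
    """Combine a depth map and a normalized gradient map into a weight map.

    Assumed formula (not from the source): alpha * depth + (1 - alpha) * grad,
    applied per pixel to equally sized 2-D lists of floats.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * d + (1.0 - alpha) * g for d, g in zip(drow, grow)]
        for drow, grow in zip(depth, norm_grad)
    ]
```

Calling the function once with the first weight parameter and once with the second yields the first and second weight maps, respectively.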
In a second aspect, the present invention provides a binocular image real-time stitching device based on a weighted fusion strategy, where the device includes:
the image acquisition module is used for acquiring a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle under the infant care scene;
the color correction module is used for performing color correction on the first real-time image and outputting a first target image with the same color as the second real-time image;
the matching analysis module is used for carrying out matching analysis on the first target image and the second real-time image and outputting an overlapping area;
the weight map acquisition module is used for acquiring a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image, carrying out weighted average processing on the first depth map and the second depth map, and outputting the first weight map and the second weight map;
and the image fusion module is used for carrying out weighted fusion processing on the first target image and the second real-time image according to the first weight map and the second weight map, and outputting the fused image as a stitched image.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: at least one processor, at least one memory and computer program instructions stored in the memory, which when executed by the processor, implement the method as in the first aspect of the embodiments described above.
In a fourth aspect, embodiments of the present invention also provide a storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as in the first aspect of the embodiments described above.
In summary, the beneficial effects of the invention are as follows:
The invention provides a binocular image real-time stitching method, device and equipment based on a weighted fusion strategy, wherein the method comprises the following steps: acquiring a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle of an infant care scene; performing color correction on the first real-time image, and outputting a first target image with the same color as the second real-time image; performing matching analysis on the first target image and the second real-time image, and outputting an overlapping region; acquiring a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image, carrying out weighted averaging processing on the first depth map and the second depth map, and outputting a first weight map and a second weight map; and carrying out weighted fusion processing on the first target image and the second real-time image according to the first weight map and the second weight map, and outputting the fused image as a stitched image.
By acquiring the first real-time image of the first visual angle and the second real-time image of the second visual angle different from the first visual angle, the invention realizes real-time monitoring of the infant care scene, so that a caregiver can observe the infant at any time, discover abnormal conditions promptly and take corresponding measures. By performing color correction on the first real-time image and outputting a first target image with the same color as the second real-time image, the images at the two visual angles carry consistent color information, which provides more realistic and accurate visual information and makes it easier for a caregiver to observe and judge the infant scene. By performing matching analysis on the first target image and the second real-time image, the overlapping area between them is found; the overlapping area provides more comprehensive visual information, so that a caregiver can better understand the infant's movement at different visual angles. By carrying out weighted fusion processing on the first target image and the second real-time image according to the first weight map and the second weight map and outputting the fused stitched image, the stitched image provides more panoramic and more detailed scene information, so that a caregiver can observe the infant's activity more comprehensively, improving care efficiency and accuracy. In summary, the scheme provides real-time monitoring, consistent color information, visual information in the overlapping area, depth perception and panoramic stitching for infant care. These advantages help improve the reliability, monitoring efficiency and care quality of a care system, and thus better safeguard infant safety and health.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required by the embodiments are briefly described below. A person skilled in the art can obtain other drawings from these drawings without inventive effort, and such drawings are also within the scope of the present invention.
Fig. 1 is a flow chart of the overall operation of the binocular image real-time stitching method based on a weighted fusion strategy in embodiment 1 of the present invention;
Fig. 2 is a schematic flow chart of performing color correction on the first real-time image in embodiment 1 of the present invention;
Fig. 3 is a flow chart of acquiring the loss function in embodiment 1 of the present invention;
Fig. 4 is a flow chart of acquiring the overlapping area in embodiment 1 of the present invention;
Fig. 5 is a flow chart of obtaining the weight maps in embodiment 1 of the present invention;
Fig. 6 is a schematic flow chart of obtaining the depth maps in embodiment 1 of the present invention;
Fig. 7 is a schematic flow chart of the weighting process in embodiment 1 of the present invention;
Fig. 8 is a block diagram of the binocular image real-time stitching device based on a weighted fusion strategy in embodiment 2 of the present invention;
Fig. 9 is a schematic structural diagram of the electronic device in embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. In the description of the present invention, it should be understood that the terms "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate description of the present application and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element. 
If not conflicting, the embodiments of the present invention and the features of the embodiments may be combined with each other, which are all within the protection scope of the present invention.
Example 1
Referring to fig. 1, embodiment 1 of the invention discloses a binocular image real-time stitching method based on a weighted fusion strategy, which comprises the following steps:
S1: acquiring a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle of an infant care scene;
Specifically, in a common infant care scene, a first real-time image is acquired by a care camera at a first visual angle and a second real-time image is acquired by a care camera at a second visual angle, the second visual angle being different from the first. Acquiring images from different visual angles provides more comprehensive monitoring and observation. The first visual angle is typically obtained from a camera mounted in a fixed location, such as a bedside or a corner of a room, to capture a front view of the infant; this visual angle can provide direct facial features and expression information. The second visual angle is obtained from another camera mounted at a different position or angle. For example, the second camera may be placed on the other side of the room, or near the crib at an oblique angle, to provide a different visual angle and additional monitoring coverage, such as different postures or activities of the infant in bed. By simultaneously acquiring real-time images at the first and second visual angles, a caregiver or parent can more fully understand the status, behavior and safety of the infant. Such an image acquisition arrangement provides more visual angles for monitoring and helps meet the care needs of infants.
S2: performing color correction on the first real-time image, and outputting a first target image with the same color as the second real-time image;
Specifically, the first real-time image is color corrected so that it has the same color characteristics as the second real-time image, and the result is output as the first target image. Color correction is an image processing technique that adjusts the color and hue of an image so that colors match across different images. It removes color differences caused by different cameras, illumination conditions or sensors, so that the images from the two visual angles are consistent in color and give more realistic and reliable visual information. This reduces the impact of color deviation on infant care and gives caregivers or parents more accurate image information to reference and judge.
In one embodiment, referring to fig. 2, the step S2 includes:
S21: acquiring a training image set, wherein the training image set comprises a first training image under the first visual angle and a second training image under the second visual angle;
Specifically, a pre-prepared training image set is obtained, including a first training image at the first visual angle and a second training image at the second visual angle. These images are used to train a model or algorithm to learn the color mapping and other features between images. The first training image is an image sample from the camera at the first visual angle; it usually comes from actual shooting at an infant care site and captures the facial expression, posture or other relevant information of the infant. The second training image is an image sample from the camera at the second visual angle; it is also taken at the infant care site and provides image information from a different observation angle. Collecting such images and pairing the images at the first visual angle with those at the second visual angle creates an effective training data set for the color correction task.
S22: taking the image after the color correction of the first training image as a second target image, and acquiring a loss function according to the difference between the second target image and the second training image;
Specifically, a loss function L is predefined and used for measuring the color difference between the second target image after the color correction of the first training image and the second training image; by taking the loss function, the difference between the second target image and the second training image can be quantified for training and optimizing the color correction model or algorithm.
In one embodiment, referring to fig. 3, the step S22 includes:
S221: acquiring a channel difference value between the second target image and the second training image;
Specifically, the second target image and the second training image are subtracted pixel by pixel on each channel, which yields three difference images. The difference between the two images on each channel is calculated as follows:
ΔR = I1'_i(:,:,0) - I2_i(:,:,0)
ΔG = I1'_i(:,:,1) - I2_i(:,:,1)
ΔB = I1'_i(:,:,2) - I2_i(:,:,2)
where I1'_i represents the i-th second target image, I2_i represents the i-th second training image, and (:,:,c) selects all pixels of channel c. The indices 0, 1 and 2 represent the red, green and blue channels, respectively; ΔR represents the difference in the red channel, ΔG the difference in the green channel, and ΔB the difference in the blue channel.
S222: squaring the channel difference value to obtain the square error of each channel;
Specifically, for each channel, the square of the difference image is calculated, resulting in three square error images SE_R, SE_G and SE_B. The square error on each channel is calculated as follows:
SE_R = ΔR^2
SE_G = ΔG^2
SE_B = ΔB^2
where SE_R represents the square error of the red channel, SE_G represents the square error of the green channel, and SE_B represents the square error of the blue channel.
S223: and accumulating and summing the square errors of all the channels, averaging, and outputting the loss function.
Specifically, the three square error images are summed at each pixel location to obtain a total square error image. The sum over all pixels is then divided by the total number of pixels (N×H×W), resulting in the final MSE loss:
L=1/(N*H*W)*Σ(SE_R+SE_G+SE_B)
where N represents the number of image pairs, H represents the height of the image, and W represents the width of the image. By summing the square errors of the pixels of all images and averaging, a scalar value can be obtained that measures the accuracy of the color matching. A smaller loss value indicates a better color matching effect, while a larger loss value indicates a larger color difference.
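The loss of steps S221 to S223 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `color_mse_loss` is hypothetical, and the images are assumed to be stacked as arrays of shape (N, H, W, 3) in red, green, blue channel order.

```python
import numpy as np

def color_mse_loss(corrected, reference):
    """L = 1/(N*H*W) * sum(SE_R + SE_G + SE_B) for image stacks of shape (N, H, W, 3)."""
    corrected = np.asarray(corrected, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if corrected.shape != reference.shape or corrected.shape[-1] != 3:
        raise ValueError("expected two arrays of identical shape (N, H, W, 3)")
    diff = corrected - reference        # S221: delta R, G, B stacked on the last axis
    sq_err = diff ** 2                  # S222: SE_R, SE_G, SE_B
    n, h, w, _ = corrected.shape
    return float(sq_err.sum() / (n * h * w))   # S223: accumulate and average
```

Because the three channel errors are summed before averaging over N×H×W, a uniform error of 1 in every channel gives a loss of 3 rather than 1.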
S23: training the deep learning model, and taking the trained deep learning model as a color correction model when the loss function is smaller than a preset threshold value;
Specifically, the loss function quantifies the difference between the second target image and the second training image and is used to train and optimize the color correction model, which uses a U-Net neural network. The parameters of the U-Net model are updated by stochastic gradient descent, a common optimization algorithm for updating neural network parameters, so as to minimize the loss function. Training stops when the loss function is smaller than a preset threshold value, at which point the color characteristics of the color-corrected first training image are as close as possible to those of the second training image, achieving visual consistency and accurate color correction.
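The stopping rule of S23 (keep updating parameters by stochastic gradient descent until the loss drops below a preset threshold) can be illustrated with a deliberately small stand-in model. Note the assumption: instead of the U-Net named above, the sketch fits one gain and one offset per color channel, and the names `train_color_model`, `threshold`, `lr` and `max_steps` are hypothetical. Only the training loop structure is meant to carry over.

```python
import numpy as np

def mse_loss(pred, target):
    """Loss of S223 for stacks of shape (N, H, W, 3)."""
    n, h, w, _ = pred.shape
    return float(((pred - target) ** 2).sum() / (n * h * w))

def train_color_model(first, second, threshold=1e-4, lr=0.3, max_steps=5000, seed=0):
    """Stand-in for the U-Net: per-channel gain and offset trained with SGD."""
    rng = np.random.default_rng(seed)
    gain, offset = np.ones(3), np.zeros(3)
    loss = mse_loss(first * gain + offset, second)
    steps = 0
    while loss >= threshold and steps < max_steps:
        i = rng.integers(first.shape[0])          # pick one image pair at random
        x, y = first[i], second[i]
        diff = x * gain + offset - y              # per-pixel, per-channel error
        pixels = x.shape[0] * x.shape[1]
        gain -= lr * 2.0 * (diff * x).sum(axis=(0, 1)) / pixels
        offset -= lr * 2.0 * diff.sum(axis=(0, 1)) / pixels
        loss = mse_loss(first * gain + offset, second)   # loss on the full set
        steps += 1
    return gain, offset, loss, steps
```

With a real network the gradient step would come from a deep learning framework, but the loop would have the same shape: update, re-evaluate the loss, and stop once it is below the threshold.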
S24: and inputting the first real-time image into the color correction model, and outputting the first target image.
S3: performing matching analysis on the first target image and the second real-time image, and outputting an overlapping region;
specifically, the first target image and the second real-time image are subjected to matching analysis, an overlapping region is output, and the output overlapping region can be used for subsequent image stitching or fusion operation to generate a stitched image or a synthesized image, so that image display or information extraction of a wider field of view is realized.
In one embodiment, referring to fig. 4, the step S3 includes:
S31: performing infant characteristic point detection on the first target image and the second real-time image to obtain a first characteristic point set in the first target image and a second characteristic point set in the second real-time image, wherein the infant characteristic points at least comprise: nose key points and mouth key points related to infant face shielding judgment;
Specifically, the infant image is first processed with a YOLOv8s target detection model to detect the infant face area in the image. Within the detected face area, nose key points and mouth key points are then detected with a dedicated key point detection algorithm, such as a face key point detection model or a characteristic point detection algorithm. The detected nose and mouth key points are grouped into characteristic point sets that are used for subsequent analysis and care scene applications. YOLOv8s is an efficient target detection model that can detect infant face areas in real time, so the region of interest is determined quickly and the computational complexity of subsequent processing is reduced. It can also detect multiple targets simultaneously, that is, the face areas of several infants in one image, which suits multi-infant care scenes. The dedicated key point detection algorithm locates the nose and mouth key points accurately and thereby provides the head pose, facial expression and nose and mouth positions of the infant. Combining YOLOv8s target detection with key point detection allows infant care to be automated: by analyzing the detected characteristic points, the system can judge the state and emotion of the infant, find abnormal conditions in time and automatically notify a caregiver. The infant characteristic points give the caregiver more comprehensive and accurate information about the state of the infant, which improves care efficiency and safety.
S32: performing feature description on the first feature point set and the second feature point set, and outputting a first descriptor set and a second descriptor set;
Specifically, the first descriptor set and the second descriptor set are calculated for the feature point sets K1 and K2 using the ORB descriptor algorithm. ORB builds on FAST corner detection and adds rotation invariance: it computes an orientation for each feature point so that the resulting descriptor is, to a certain degree, invariant to rotation changes in the image.
S33: performing feature matching on the first descriptor set and the second descriptor set to obtain matching point pairs;
Specifically, the first descriptor set and the second descriptor set are matched using brute-force matching followed by the nearest neighbor ratio test. Brute-force matching computes the distance between every pair of descriptors and selects the nearest candidates. The nearest neighbor ratio test then screens the matching point pairs: for each feature point, the candidate descriptors are sorted by distance and only the two closest are kept; the match is considered valid only when the distance of the closest candidate is significantly smaller than the distance of the second closest. Brute-force matching provides a preliminary matching result, and the ratio test excludes part of the mismatches, which improves the accuracy of extracting the overlapping region.
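A minimal NumPy sketch of brute-force matching with the nearest neighbor ratio test follows. It assumes binary descriptors stored as rows of `uint8` bytes (the layout ORB descriptors commonly use) compared by Hamming distance; the function names and the ratio of 0.75 are illustrative assumptions, not values given in the text.

```python
import numpy as np

def hamming_distances(desc1, desc2):
    """desc1: (M, B) uint8, desc2: (K, B) uint8 -> (M, K) counts of differing bits."""
    xor = np.bitwise_xor(desc1[:, None, :], desc2[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)

def ratio_test_matches(desc1, desc2, ratio=0.75):
    """Return (index1, index2, distance) for matches that pass the ratio test.

    desc2 must contain at least two descriptors.
    """
    dist = hamming_distances(desc1, desc2)       # brute force: all pairs
    matches = []
    for i, row in enumerate(dist):
        order = np.argsort(row)[:2]              # two nearest candidates
        best, second = order[0], order[1]
        if row[best] < ratio * row[second]:      # clearly better than the runner-up
            matches.append((i, int(best), int(row[best])))
    return matches
```

A descriptor that is equally close to two candidates is rejected, which is the behaviour the ratio test is meant to provide for ambiguous, repetitive structures.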
S34: outputting a homography matrix between the first target image and the second real-time image according to the matching point pairs;
Specifically, the homography matrix H between the first target image and the second real-time image is estimated from the matching point pairs with the RANSAC method. RANSAC is an iterative parameter estimation algorithm: it repeatedly selects a random subset of the matching point pairs, calculates a homography matrix from that subset, and finally selects the homography matrix with the largest number of inliers as the estimation result.
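The RANSAC loop described above can be sketched as follows, assuming the matched points are given as two arrays of shape (n, 2). The four-point fit uses the standard direct linear transform; the function names, the 3-pixel inlier threshold and the iteration count are assumptions chosen for illustration.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform from at least 4 point pairs of shape (n, 2)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    return vt[-1].reshape(3, 3)          # null vector, defined up to scale

def project(h, pts):
    """Apply homography h to points of shape (n, 2)."""
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return mapped[:, :2] / mapped[:, 2:3]

def ransac_homography(src, dst, threshold=3.0, iterations=500, seed=0):
    rng = np.random.default_rng(seed)
    best_h, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=4, replace=False)   # random subset
        h = fit_homography(src[sample], dst[sample])
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(h, src) - dst, axis=1)
        inliers = err < threshold
        if inliers.sum() > best_inliers.sum():                 # keep the best model
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

The returned matrix is only defined up to scale, so it is usually divided by its bottom-right element before use. A common refinement, not described in the text, is to refit the homography on all inliers of the winning model.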
S35: and mapping the first target image according to the homography matrix, and taking the intersection area of the mapping image and the second real-time image as the overlapping area.
Specifically, the first target image is mapped into a new image I1_new according to the homography matrix H, with pixels that fall outside the range of the first target image set to black (or another background color). The intersection of the new image I1_new with the second real-time image is then calculated; this intersection is the required overlapping region C. The overlapping region is the area common to the two images and can be used for subsequent image fusion, alignment or other related tasks. Because out-of-range pixels are blanked during mapping, the overlapping region C contains only valid pixels common to both images, which avoids the effects of invalid or noisy pixels.
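One way to obtain both the mapped image and the overlapping region is inverse mapping: every pixel of the output canvas is sent back through the inverse of H and kept only if it lands inside the first image. The sketch below assumes that H maps first-image coordinates to second-image coordinates, that the canvas has the size of the second image, and that nearest-neighbor sampling is acceptable; the function name is hypothetical.

```python
import numpy as np

def warp_and_overlap(img1, h, out_shape):
    """Warp img1 with homography h onto a canvas of out_shape = (rows, cols).

    Returns the warped image (out-of-range pixels are 0, i.e. black) and a
    boolean mask that is True where the warped image has valid pixels.
    """
    rows, cols = out_shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(rows * cols)])
    src = np.linalg.inv(h) @ dst                        # back into img1 coordinates
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    inside = (sx >= 0) & (sx < img1.shape[1]) & (sy >= 0) & (sy < img1.shape[0])
    warped = np.zeros((rows * cols,) + img1.shape[2:], dtype=img1.dtype)
    warped[inside] = img1[sy[inside], sx[inside]]
    shape = (rows, cols) + img1.shape[2:]
    return warped.reshape(shape), inside.reshape(rows, cols)
```

Since the second image fills the whole canvas, the returned mask is exactly the intersection region C under these assumptions.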
S4: acquiring a first depth map corresponding to a first target image and a second depth map corresponding to a second real-time image, carrying out weighted averaging processing on the first depth map and the second depth map, and outputting a first weight map and a second weight map;
In one embodiment, referring to fig. 5, the step S4 includes:
S41: acquiring a first depth map corresponding to a first target image and a second depth map corresponding to a second real-time image by utilizing a parallax estimation algorithm;
Specifically, a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image are obtained by using a parallax estimation algorithm. Parallax estimation estimates the distance of objects in a scene by analyzing the displacement between corresponding pixel points in the two images, so the first depth map and the second depth map encode the distance information of objects in the scene.
In one embodiment, referring to fig. 6, the step S41 includes:
S411: acquiring a first dense depth map corresponding to a first target image and a second dense depth map corresponding to a second real-time image by utilizing a parallax estimation algorithm;
Specifically, the dense depth maps are mainly calculated with a BM (block matching) algorithm, as follows. The first target image and the second real-time image are preprocessed, including graying and denoising; converting to a gray image reduces the amount of calculation, and denoising improves the accuracy of depth estimation. The two images are divided into blocks of equal size, each block being called a search area. For the pixels in each search area, the most similar pixels in the second real-time image are searched and their matching cost is calculated; common matching costs include the pixel difference and the gray-level difference. For each pixel, the matching costs within the surrounding neighborhood are aggregated, for example by cumulative sum or mean. Based on the aggregated matching cost, the disparity with the smallest matching cost is selected as the final disparity value of the pixel. The disparity value represents the horizontal displacement of a pixel in the first target image relative to the corresponding pixel in the second real-time image. The disparity value is then converted into a depth value by combining the internal and external parameters of the camera, for example by triangulation, which yields the first dense depth map corresponding to the first target image and the second dense depth map corresponding to the second real-time image. The first and second dense depth maps give the depth information of each pixel point in the image.
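The matching-cost, aggregation and winner-takes-all steps can be sketched with a basic sum-of-absolute-differences block matcher. This is an illustrative simplification, not the exact BM variant of the embodiment: it assumes a rectified grayscale pair in which a pixel at column x of the first (left) image corresponds to column x - d of the second (right) image, and the names `max_disp`, `radius`, `focal_px` and `baseline_m` are hypothetical parameters.

```python
import numpy as np

def box_sum(cost, radius):
    """Aggregate cost over a (2*radius+1)^2 neighborhood, using edge padding."""
    padded = np.pad(cost, radius, mode="edge")
    out = np.zeros_like(cost)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + cost.shape[0], dx:dx + cost.shape[1]]
    return out

def block_matching_disparity(left, right, max_disp=16, radius=2):
    """Per-pixel disparity with the smallest aggregated absolute difference."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    costs = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        diff = np.abs(left[:, d:] - right[:, :w - d])   # matching cost at disparity d
        costs[d, :, d:] = box_sum(diff, radius)         # neighborhood aggregation
    return costs.argmin(axis=0)                         # smallest cost wins

def disparity_to_depth(disp, focal_px, baseline_m):
    """Triangulation for a rectified pair: depth = focal * baseline / disparity."""
    disp = disp.astype(np.float64)
    depth = np.zeros_like(disp)
    valid = disp > 0                                    # zero disparity: depth left at 0
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth
```

Production block matchers add refinements such as texture and uniqueness checks and sub-pixel interpolation, which are omitted here.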
S412: carrying out normalization processing on the first dense depth map and the second dense depth map, mapping a first parallax value corresponding to the first dense depth map and a second parallax value corresponding to the second dense depth map into a preset parallax range, and outputting the first depth map and the second depth map.
Specifically, after the first dense depth map and the second dense depth map are calculated by using the BM algorithm, the range of depth values may be generally arbitrary, but for convenience of use and representation, the depth values are mapped to a range between 0 and 1, so as to output the normalized first depth map and second depth map. By mapping the depth values into a fixed range, subsequent splicing processing and analysis are facilitated.
S42: respectively calculating pixel gradients in the first depth map and pixel gradients in the second depth map to obtain a first gradient map and a second gradient map;
Specifically, the gradient map may be calculated with a Sobel operator or another edge detection operator. For each pixel position (x, y), a lateral gradient Gx and a longitudinal gradient Gy are calculated separately, and the gradient amplitude G(x, y) is calculated from these two components. Taking a simplified central-difference form of the Sobel operator as an example, the gradient calculation process is as follows. Applying the operator to the depth map D1, the lateral gradient Gx1 and the longitudinal gradient Gy1 are calculated:
Gx1(x,y)=D1(x+1,y)-D1(x-1,y);
Gy1(x,y)=D1(x,y+1)-D1(x,y-1);
Applying the operator to the depth map D2, the lateral gradient Gx2 and the longitudinal gradient Gy2 are calculated:
Gx2(x,y)=D2(x+1,y)-D2(x-1,y);
Gy2(x,y)=D2(x,y+1)-D2(x,y-1);
Calculating the gradient amplitude G(x,y):
G(x,y)=sqrt(Gx(x,y)^2+Gy(x,y)^2);
wherein Gx(x,y)=Gx1(x,y)+Gx2(x,y), Gy(x,y)=Gy1(x,y)+Gy2(x,y);
the gradient map reflects the change rate or edge intensity of each pixel point in the depth map, and can be used for subsequent tasks such as edge detection, feature extraction, image segmentation and the like. By calculating the gradient amplitude, the gradient intensity information of each pixel point can be obtained, and then the structure and the edge information of the image are analyzed.
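The central-difference gradients and the amplitude can be written directly with array slicing. The sketch below computes the amplitude of a single depth map, on the assumption that it is applied to D1 and D2 separately to obtain the gradient maps G1 and G2 used in step S43; border pixels are handled by replicating the edge, which is also an assumption.

```python
import numpy as np

def central_gradients(depth):
    """Gx(x,y) = D(x+1,y) - D(x-1,y) and Gy(x,y) = D(x,y+1) - D(x,y-1).

    Arrays are indexed [y, x]; the border is replicated before differencing.
    """
    d = np.pad(np.asarray(depth, dtype=np.float64), 1, mode="edge")
    gx = d[1:-1, 2:] - d[1:-1, :-2]
    gy = d[2:, 1:-1] - d[:-2, 1:-1]
    return gx, gy

def gradient_magnitude(depth):
    """G(x,y) = sqrt(Gx(x,y)^2 + Gy(x,y)^2) for one depth map."""
    gx, gy = central_gradients(depth)
    return np.sqrt(gx ** 2 + gy ** 2)
```

For a depth map that rises by 3 per column, the interior amplitude is 6, because the central difference spans two pixels.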
S43: normalizing the first gradient map and the second gradient map to obtain a first normalized gradient map and a second normalized gradient map;
Specifically, the minimum and maximum values of the gradient maps G1 and G2 are acquired and denoted min_G1 and max_G1, and min_G2 and max_G2. The gradient map G1 is normalized and its value range is mapped to [0,1]:
G1_norm=(G1-min_G1)/(max_G1-min_G1)
the gradient map G2 is normalized and its value range is mapped to [0,1]:
G2_norm=(G2-min_G2)/(max_G2-min_G2)
normalized gradient maps G1_norm and G2_norm will have a range of values between [0,1] so that the intensity information of the gradients can be more conveniently compared and analyzed.
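The min-max normalization above is a one-liner, with one practical caveat: if a gradient map is constant, max equals min and the formula divides by zero. The sketch below returns zeros in that case, which is an assumption rather than something the text specifies; the function name is hypothetical.

```python
import numpy as np

def min_max_normalize(values, eps=1e-12):
    """Map values linearly to [0, 1]: (v - min) / (max - min)."""
    values = np.asarray(values, dtype=np.float64)
    lo, hi = values.min(), values.max()
    if hi - lo < eps:
        return np.zeros_like(values)    # assumption: a constant map becomes all zeros
    return (values - lo) / (hi - lo)
```

The same helper could serve for the depth normalization of step S412, since both steps map values to the range 0 to 1.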
S44: and carrying out weighted averaging on the first normalized gradient map and the second normalized gradient map to obtain the first weight map and the second weight map.
Specifically, the first normalized gradient map G1_norm and the second normalized gradient map G2_norm are weighted and averaged. Because the weighted averaging takes the gradient information of both images into account, their respective contributions can be controlled by setting the weights. The resulting first weight map and second weight map are used in subsequent image fusion or processing to better preserve important gradient information.
In one embodiment, referring to fig. 7, the step S44 includes:
S441: acquiring a preset first weight parameter and a preset second weight parameter;
S442: weighting the first depth map and the first normalized gradient map according to the first weight parameter, and outputting the first weight map;
S443: weighting the second depth map and the second normalized gradient map according to the second weight parameter, and outputting the second weight map.
Specifically, a first weight parameter α1 and a second weight parameter α2 are defined, where 0 <= α1 <= 1 and 0 <= α2 <= 1, to balance the contributions of the depth information and the gradient information. Then, for each pixel position (x, y), the first weight map W1 and the second weight map W2 are calculated:
W1(x,y)=α1*D1_norm(x,y)+(1-α1)*G1_norm(x,y)
W2(x,y)=α2*D2_norm(x,y)+(1-α2)*G2_norm(x,y)
wherein D1_norm and D2_norm represent the normalized depth maps D1 and D2, respectively, and G1_norm and G2_norm represent the normalized gradient maps.
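A weight map is a per-pixel convex combination of the normalized depth map and the normalized gradient map. A minimal sketch, assuming both inputs already lie in [0, 1] and that each image uses its own weight parameter (α1 for W1, α2 for W2); the function name is hypothetical.

```python
import numpy as np

def weight_map(depth_norm, grad_norm, alpha):
    """W(x,y) = alpha * D_norm(x,y) + (1 - alpha) * G_norm(x,y), with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    depth_norm = np.asarray(depth_norm, dtype=np.float64)
    grad_norm = np.asarray(grad_norm, dtype=np.float64)
    return alpha * depth_norm + (1.0 - alpha) * grad_norm
```

With alpha close to 1 the weight follows the depth map; with alpha close to 0 it follows the gradient (edge) information. Because both inputs lie in [0, 1], so does the result.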
S5: and carrying out weighted fusion processing on the first target image and the second real-time image according to the first weight graph and the second weight graph, and outputting the fused image as a spliced image.
Specifically, the first weight map is applied to the first target image, the second weight map is applied to the second real-time image, and the overlapping areas are subjected to weighted fusion, wherein the weighted fusion refers to the following calculation method:
I_new(x,y)=W1(x,y)*I3(x,y)+W2(x,y)*I4(x,y)
wherein I3(x,y) is a pixel value in the first target image, I4(x,y) is a pixel value in the second real-time image, and I_new(x,y) is the corresponding pixel value in the fused stitched image. Weighted fusion makes it possible to control the fusion result flexibly according to the distribution of the weight maps, retain important information, achieve smooth transitions and adapt to the requirements of different scenes, thereby improving the quality and expressiveness of the output stitched image.
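The fusion formula can be sketched for color images as below. With `normalize=False` the function evaluates the formula exactly as written; the optional `normalize=True` path rescales the weights so that they sum to 1 at every pixel, which keeps the brightness in the original range. That rescaling is an added assumption, not a step stated in the text, and the function name is hypothetical.

```python
import numpy as np

def weighted_fusion(img1, img2, w1, w2, normalize=False, eps=1e-12):
    """I_new = W1 * I3 + W2 * I4 for images of shape (H, W, 3) and weights (H, W)."""
    img1 = np.asarray(img1, dtype=np.float64)
    img2 = np.asarray(img2, dtype=np.float64)
    w1 = np.asarray(w1, dtype=np.float64)[..., None]   # broadcast over channels
    w2 = np.asarray(w2, dtype=np.float64)[..., None]
    if normalize:
        total = np.maximum(w1 + w2, eps)               # assumption: avoid division by 0
        w1, w2 = w1 / total, w2 / total
    return w1 * img1 + w2 * img2
```

In a full pipeline this would typically be applied only inside the overlapping region C, with each image copied unchanged elsewhere.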
Example 2
Referring to fig. 8, embodiment 2 of the present invention further provides a binocular image real-time stitching device based on a weighted fusion policy, where the device includes:
The image acquisition module is used for acquiring a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle under the infant care scene;
The color correction module is used for performing color correction on the first real-time image and outputting a first target image with the same color as the second real-time image;
The matching analysis module is used for carrying out matching analysis on the first target image and the second real-time image and outputting an overlapping area;
The weight map acquisition module is used for acquiring a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image, carrying out weighted average processing on the first depth map and the second depth map, and outputting the first weight map and the second weight map;
The image fusion module is used for carrying out weighted fusion processing on the first target image and the second real-time image according to the first weight map and the second weight map, and outputting the fused image as a spliced image.
The device realizes the real-time monitoring of the infant nursing scene by acquiring the first real-time image of the first visual angle and the second real-time image different from the first visual angle, and a nursing staff can observe the dynamic condition of the infant at any time, discover the abnormal condition in time and take corresponding measures; the first real-time image is subjected to color correction, a first target image with the same color as the second real-time image is output, so that images under two visual angles can have consistent color information, more real and accurate visual information can be provided by the color consistency, and a caretaker can observe and judge infant scenes conveniently; by carrying out matching analysis on the first target image and the second real-time image, an overlapping area between the first target image and the second real-time image can be found, and the overlapping area can provide more comprehensive visual information, so that a caretaker can better know the movement condition of an infant under different visual angles; according to the first weight map and the second weight map, the first target image and the second real-time image are subjected to weighted fusion processing, and the fused spliced image is output, so that the spliced image provides scene information with more panorama and more detail, a caretaker can more comprehensively observe the infant activity condition, and the nursing efficiency and the nursing accuracy are improved. In summary, the scheme provides real-time monitoring, consistent color information, overlapping area visual information, depth perception, panoramic stitching and other benefits for infant care, and the advantages are helpful for improving the reliability, monitoring efficiency and care quality of a care system, so that better guarantee is provided for infant safety and health.
Example 3
In addition, the binocular image real-time stitching method based on the weighted fusion strategy of the embodiment 1 of the present invention described in connection with fig. 1 may be implemented by an electronic device. Fig. 9 shows a schematic hardware structure of an electronic device according to embodiment 3 of the present invention.
The electronic device may include a processor and memory storing computer program instructions.
In particular, the processor may comprise a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present invention.
The memory may include mass storage for data or instructions. By way of example, and not limitation, the memory may comprise a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of the foregoing. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a non-volatile solid state memory. In a particular embodiment, the memory includes read-only memory (ROM). The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate.
The processor reads and executes the computer program instructions stored in the memory to implement any of the binocular image real-time stitching methods based on the weighted fusion strategy in the above embodiments.
In one example, the electronic device may also include a communication interface and a bus. The processor, the memory, and the communication interface are connected by a bus and complete communication with each other, as shown in fig. 9.
The communication interface is mainly used for realizing communication among the modules, the devices, the units and/or the equipment in the embodiment of the invention.
The bus includes hardware, software, or both that couple the components of the device to each other. By way of example, and not limitation, the buses may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus, or a combination of two or more of the above. The bus may include one or more buses, where appropriate. Although embodiments of the invention have been described and illustrated with respect to a particular bus, the invention contemplates any suitable bus or interconnect.
Example 4
In addition, in combination with the method for splicing binocular images based on the weighted fusion policy in the above embodiment 1, embodiment 4 of the present invention may also provide a computer readable storage medium for implementation. The computer readable storage medium has stored thereon computer program instructions; the computer program instructions, when executed by the processor, implement any of the weighted fusion policy-based binocular image real-time stitching methods of the above embodiments.
In summary, the embodiment of the invention provides a binocular image real-time splicing method, device and equipment based on a weighted fusion strategy.
It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
In the foregoing, only the specific embodiments of the present invention are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present invention is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and they should be included in the scope of the present invention.
Claims (10)
1. A method for real-time stitching of binocular images based on a weighted fusion strategy, characterized by comprising the following steps:
S1: acquiring, in an infant care scene, a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle;
S2: performing color correction on the first real-time image, and outputting a first target image whose color is consistent with that of the second real-time image;
S3: performing matching analysis on the first target image and the second real-time image, and outputting an overlapping region;
S4: acquiring a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image, performing weighted averaging on the first depth map and the second depth map, and outputting a first weight map and a second weight map;
S5: performing weighted fusion on the first target image and the second real-time image according to the first weight map and the second weight map, and outputting the fused image as a stitched image.
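As an illustration of the fusion step S5 only, the following minimal sketch blends two already-aligned images with per-pixel weight maps. The function name `weighted_fusion`, the per-pixel normalization of the two weights so that they sum to 1, and the fallback to a plain average where both weights are zero are assumptions made for this sketch; the claim itself does not fix these details.

```python
import numpy as np

def weighted_fusion(img_a, img_b, w_a, w_b, eps=1e-8):
    """Blend two aligned H x W x C images using H x W weight maps (hypothetical helper)."""
    w_a = w_a[..., None].astype(np.float64)  # add a channel axis for broadcasting
    w_b = w_b[..., None].astype(np.float64)
    total = w_a + w_b
    safe = total > eps
    # Assumption: normalize the weights per pixel; where both are ~0, use 0.5 / 0.5.
    w_a = np.where(safe, w_a / np.where(safe, total, 1.0), 0.5)
    w_b = 1.0 - w_a
    return w_a * img_a + w_b * img_b

if __name__ == "__main__":
    a = np.full((2, 2, 3), 10.0)
    b = np.full((2, 2, 3), 20.0)
    fused = weighted_fusion(a, b, np.full((2, 2), 3.0), np.full((2, 2), 1.0))
    print(fused[0, 0])
```

Normalizing per pixel keeps the fused intensities within the range spanned by the two inputs, which is one common way to avoid brightness jumps across the seam.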
2. The method for real-time stitching of binocular images based on a weighted fusion strategy according to claim 1, wherein S2 comprises:
S21: acquiring a training image set related to infant care, wherein the training image set comprises a first training image at the first visual angle and a second training image at the second visual angle;
S22: acquiring the color-corrected first training image as a second target image, and outputting a loss function according to the color difference between the second target image and the second training image;
S23: training a deep learning model according to the loss function, and outputting the trained deep learning model as a color correction model when the loss function is smaller than a preset threshold value;
S24: inputting the first real-time image into the color correction model, and outputting the first target image.
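The threshold-based stopping rule in S23 can be sketched with a deliberately simplified stand-in for the deep learning model. In the sketch below, the "model" is only a vector of per-channel gains fitted by gradient descent; this substitution, the function name `train_color_gains`, the learning rate, and the threshold value are all assumptions for illustration and are not the model described in the claims.

```python
import numpy as np

def train_color_gains(first_imgs, second_imgs, threshold=1e-6, lr=1.0, max_steps=1000):
    """Fit per-channel gains so that first_imgs * gains approximates second_imgs.

    Stand-in for the deep learning model: training stops once the
    mean squared color difference drops below `threshold` (cf. S23).
    """
    gains = np.ones(first_imgs.shape[-1])
    loss = float("inf")
    for _ in range(max_steps):
        pred = first_imgs * gains                 # color-corrected training images
        residual = pred - second_imgs
        loss = float(np.mean(residual ** 2))      # color-difference loss
        if loss < threshold:
            break
        # Gradient of the mean squared error with respect to each channel gain.
        reduce_axes = tuple(range(residual.ndim - 1))
        grad = 2.0 * np.sum(residual * first_imgs, axis=reduce_axes) / residual.size
        gains -= lr * grad
    return gains, loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    first = rng.random((4, 8, 8, 3))
    second = first * np.array([0.9, 1.1, 1.0])
    gains, loss = train_color_gains(first, second)
    print(gains, loss)
```

Applying the fitted gains to a new image (`image * gains`) then plays the role of S24 in this simplified setting.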
3. The method for real-time stitching of binocular images based on a weighted fusion strategy according to claim 2, wherein S22 comprises:
S221: acquiring a channel difference value between the second target image and the second training image;
S222: squaring the channel difference value to obtain the square error of each channel;
S223: accumulating and summing the square errors of all the channels, averaging, and outputting the loss function.
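Steps S221 to S223 amount to a mean squared error over color channels. A minimal sketch follows, assuming that both images have the same shape and that the average is taken over every pixel and channel; the function name `color_loss` is hypothetical.

```python
import numpy as np

def color_loss(second_target, second_training):
    """Mean squared per-channel color difference between two same-shaped images."""
    # S221: channel difference (cast first so uint8 inputs do not wrap around)
    diff = second_target.astype(np.float64) - second_training.astype(np.float64)
    # S222: square error of each channel value
    sq = diff ** 2
    # S223: sum over all channels and pixels, then average
    return float(sq.sum() / sq.size)

if __name__ == "__main__":
    a = np.zeros((2, 2, 3))
    b = np.full((2, 2, 3), 2.0)
    print(color_loss(a, b))
```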
4. The method for real-time stitching of binocular images based on a weighted fusion strategy according to claim 1, wherein S3 comprises:
S31: performing infant feature point detection on the first target image and the second real-time image to obtain a first feature point set in the first target image and a second feature point set in the second real-time image, wherein the infant feature points at least comprise: nose key points and mouth key points related to infant face occlusion judgment;
S32: performing feature description on the first feature point set and the second feature point set, and outputting a first descriptor set and a second descriptor set;
S33: performing feature matching on the first descriptor set and the second descriptor set to obtain matching point pairs;
S34: outputting a homography matrix between the first target image and the second real-time image according to the matching point pairs;
S35: mapping the first target image according to the homography matrix, and taking the intersection area of the mapped image and the second real-time image as the overlapping region.
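Step S35 can be illustrated with a short sketch that, given an already estimated homography, marks which pixels of the second image are also covered by the mapped first image. Feature detection, description, matching, and homography estimation (S31 to S34) are not implemented here; the function name `overlap_mask`, the convention that `H` maps first-image pixel coordinates (x, y) into the second image's frame, and the small tolerance `tol` are assumptions.

```python
import numpy as np

def overlap_mask(H, first_shape, second_shape, tol=1e-9):
    """Boolean mask (second image frame) of pixels covered by the mapped first image.

    H: 3x3 homography assumed to map first-image (x, y) to second-image (x, y).
    first_shape, second_shape: (height, width) tuples.
    """
    h1, w1 = first_shape
    h2, w2 = second_shape
    ys, xs = np.mgrid[0:h2, 0:w2]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # 3 x N homogeneous
    src = np.linalg.inv(H) @ pts        # back-project into the first image
    with np.errstate(divide="ignore", invalid="ignore"):
        sx = src[0] / src[2]
        sy = src[1] / src[2]
    # A second-image pixel is in the overlap if its preimage lies inside the first image.
    inside = (sx >= -tol) & (sx <= w1 - 1 + tol) & (sy >= -tol) & (sy <= h1 - 1 + tol)
    return inside.reshape(h2, w2)

if __name__ == "__main__":
    # Pure horizontal shift of 3 pixels between the two views.
    H = np.array([[1.0, 0.0, 3.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    print(overlap_mask(H, (4, 6), (4, 6)).astype(int))
```

Back-projecting each second-image pixel avoids explicit polygon clipping and yields the overlap directly as a pixel mask, which is the form needed for the later per-pixel fusion.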
5. The method for real-time stitching of binocular images based on a weighted fusion strategy according to claim 1, wherein S4 comprises:
S41: acquiring a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image by utilizing a parallax estimation algorithm;
S42: respectively calculating pixel gradients in the first depth map and pixel gradients in the second depth map to obtain a first gradient map and a second gradient map;
S43: normalizing the first gradient map and the second gradient map to obtain a first normalized gradient map and a second normalized gradient map;
S44: performing weighted averaging on the first normalized gradient map and the second normalized gradient map to obtain the first weight map and the second weight map.
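Steps S42 and S43 can be sketched as follows. The use of the gradient magnitude computed with `numpy.gradient`, min-max normalization to [0, 1], and the function name `normalized_gradient_map` are assumptions chosen for illustration; the claim does not specify the gradient operator or the normalization.

```python
import numpy as np

def normalized_gradient_map(depth, eps=1e-8):
    """Gradient magnitude of a 2-D depth map, min-max normalized to [0, 1]."""
    gy, gx = np.gradient(depth.astype(np.float64))  # S42: derivatives along rows, columns
    mag = np.hypot(gx, gy)
    span = mag.max() - mag.min()
    if span < eps:
        # Uniform gradient everywhere: no edge information, return zeros.
        return np.zeros_like(mag)
    return (mag - mag.min()) / span                  # S43: normalization

if __name__ == "__main__":
    depth = np.zeros((4, 6))
    depth[:, 3:] = 10.0   # a vertical depth edge between columns 2 and 3
    print(normalized_gradient_map(depth)[0])
```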
6. The method for real-time stitching of binocular images based on a weighted fusion strategy according to claim 5, wherein S41 comprises:
S411: acquiring a first dense depth map corresponding to the first target image and a second dense depth map corresponding to the second real-time image by utilizing a parallax estimation algorithm;
S412: performing normalization processing on the first dense depth map and the second dense depth map, mapping a first parallax value corresponding to the first dense depth map and a second parallax value corresponding to the second dense depth map into a preset parallax range, and outputting the first depth map and the second depth map.
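The mapping of parallax values into a preset range in S412 can be sketched as a linear rescaling. The linear (min-max) form, the default range [0, 1], and the function name `map_to_range` are assumptions; the claim only requires that the values end up inside the preset parallax range.

```python
import numpy as np

def map_to_range(disparity, lo=0.0, hi=1.0, eps=1e-8):
    """Linearly rescale a dense disparity map into the preset range [lo, hi]."""
    d = disparity.astype(np.float64)
    span = d.max() - d.min()
    if span < eps:
        # Constant disparity: map everything to the lower bound.
        return np.full_like(d, lo)
    return lo + (d - d.min()) * (hi - lo) / span

if __name__ == "__main__":
    dense = np.array([[0.0, 5.0], [10.0, 20.0]])
    print(map_to_range(dense, 0.0, 255.0))
```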
7. The method for real-time stitching of binocular images based on a weighted fusion strategy according to claim 5, wherein S44 comprises:
S441: acquiring a preset first weight parameter and a preset second weight parameter;
S442: performing weighting processing on the first depth map and the first normalized gradient map according to the first weight parameter, and outputting the first weight map;
S443: performing weighting processing on the second depth map and the second normalized gradient map according to the second weight parameter, and outputting the second weight map.
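One possible reading of S442 and S443 is a convex combination of the depth map and the normalized gradient map, controlled by the preset weight parameter. The sketch below assumes exactly that form, assumes both inputs are already scaled to [0, 1], and uses the hypothetical name `weight_map`; the claim does not state the precise weighting formula.

```python
import numpy as np

def weight_map(depth, norm_grad, alpha):
    """Combine a depth map and a normalized gradient map into one weight map.

    Assumed form: alpha * depth + (1 - alpha) * norm_grad, with 0 <= alpha <= 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * np.asarray(depth, dtype=np.float64) + (1.0 - alpha) * np.asarray(
        norm_grad, dtype=np.float64
    )

if __name__ == "__main__":
    depth = np.full((2, 2), 0.8)
    grad = np.full((2, 2), 0.2)
    print(weight_map(depth, grad, alpha=0.75))
```

Calling the function once per view, with the first and second weight parameters respectively, would yield the two weight maps used in the fusion step.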
8. A binocular image real-time stitching device based on a weighted fusion strategy, the device comprising:
an image acquisition module, used for acquiring, in an infant care scene, a first real-time image of a first visual angle and a second real-time image of a second visual angle different from the first visual angle;
a color correction module, used for performing color correction on the first real-time image and outputting a first target image whose color is consistent with that of the second real-time image;
a matching analysis module, used for performing matching analysis on the first target image and the second real-time image and outputting an overlapping region;
a weight map acquisition module, used for acquiring a first depth map corresponding to the first target image and a second depth map corresponding to the second real-time image, performing weighted averaging on the first depth map and the second depth map, and outputting a first weight map and a second weight map;
an image fusion module, used for performing weighted fusion on the first target image and the second real-time image according to the first weight map and the second weight map, and outputting the fused image as a stitched image.
9. An electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the method of any one of claims 1-7.
10. A storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311280265.7A CN117291804B (en) | 2023-09-28 | 2023-09-28 | Binocular image real-time splicing method, device and equipment based on weighted fusion strategy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311280265.7A CN117291804B (en) | 2023-09-28 | 2023-09-28 | Binocular image real-time splicing method, device and equipment based on weighted fusion strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117291804A (en) | 2023-12-26 |
CN117291804B (en) | 2024-09-13 |
Family
ID=89256851
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311280265.7A Active CN117291804B (en) | 2023-09-28 | 2023-09-28 | Binocular image real-time splicing method, device and equipment based on weighted fusion strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117291804B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117876221A (en) * | 2024-03-12 | 2024-04-12 | 大连理工大学 | Robust image splicing method based on neural network structure search |
CN118279142A (en) * | 2024-06-03 | 2024-07-02 | 四川新视创伟超高清科技有限公司 | Large scene image stitching method and system |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104506828A (en) * | 2015-01-13 | 2015-04-08 | 中南大学 | Halogen-free low-smoke low-toxic flame-resistant epoxy resin system |
CN110660088A (en) * | 2018-06-30 | 2020-01-07 | 华为技术有限公司 | Image processing method and device |
CN111523398A (en) * | 2020-03-30 | 2020-08-11 | 西安交通大学 | Method and device for fusing 2D face detection and 3D face recognition |
CN112634341A (en) * | 2020-12-24 | 2021-04-09 | 湖北工业大学 | Method for constructing depth estimation model of multi-vision task cooperation |
CN113793266A (en) * | 2021-09-16 | 2021-12-14 | 深圳市高川自动化技术有限公司 | Multi-view machine vision image splicing method, system and storage medium |
US20220046218A1 (en) * | 2019-12-17 | 2022-02-10 | Dalian University Of Technology | Disparity image stitching and visualization method based on multiple pairs of binocular cameras |
US20220295035A1 (en) * | 2019-11-29 | 2022-09-15 | Alphacircle Co., Ltd. | Device and method for broadcasting virtual reality images input from plurality of cameras in real time |
US20230043464A1 (en) * | 2020-04-22 | 2023-02-09 | Huawei Technologies Co., Ltd. | Device and method for depth estimation using color images |
CN115830094A (en) * | 2022-12-21 | 2023-03-21 | 沈阳工业大学 | Unsupervised stereo matching method |
CN115937743A (en) * | 2022-12-09 | 2023-04-07 | 武汉星巡智能科技有限公司 | Image fusion-based infant nursing behavior identification method, device and system |
CN115953460A (en) * | 2022-08-09 | 2023-04-11 | 重庆科技学院 | Visual odometer method based on self-supervision deep learning |
CN116612532A (en) * | 2023-05-25 | 2023-08-18 | 武汉星巡智能科技有限公司 | Infant target nursing behavior recognition method, device, equipment and storage medium |
CN116682176A (en) * | 2023-06-01 | 2023-09-01 | 武汉星巡智能科技有限公司 | Method, device, equipment and storage medium for intelligently generating infant video tag |
CN116797640A (en) * | 2023-06-02 | 2023-09-22 | 北京航空航天大学 | Depth and 3D key point estimation method for intelligent companion line inspection device |
Non-Patent Citations (4)
Title |
---|
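ZHANG ZHIHUA et al.: "RGB-D saliency detection based on multiple perspectives fusion", 《COMPUTER ENGINEERING AND SCIENCE》, vol. 40, no. 04, 30 April 2018 (2018-04-30) * |
JU QIN et al.: "Depth acquisition method based on multi-view stereo matching", 《COMPUTER ENGINEERING》, no. 14, 20 July 2010 (2010-07-20) * |
SHOU ZHAOYU et al.: "Real-time video stitching based on SURF and dynamic ROI", 《COMPUTER ENGINEERING AND DESIGN》, no. 03, 16 March 2013 (2013-03-16) * |
HUANG MINQING: "Image stitching algorithm based on color correction and fusion", 《MODERN COMPUTER》, no. 09, 25 March 2020 (2020-03-25) * |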
Also Published As
Publication number | Publication date |
---|---|
CN117291804B (en) | 2024-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117291804B (en) | Binocular image real-time splicing method, device and equipment based on weighted fusion strategy | |
CN110197169B (en) | Non-contact learning state monitoring system and learning state detection method | |
CN110473192B (en) | Digestive tract endoscope image recognition model training and recognition method, device and system | |
US9895131B2 (en) | Method and system of scanner automation for X-ray tube with 3D camera | |
US11335456B2 (en) | Sensing device for medical facilities | |
CN113850865A (en) | Human body posture positioning method and system based on binocular vision and storage medium | |
US11823326B2 (en) | Image processing method | |
WO2021259365A1 (en) | Target temperature measurement method and apparatus, and temperature measurement system | |
WO2022109185A1 (en) | Systems and methods for artificial intelligence based image analysis for placement of surgical appliance | |
US11176661B2 (en) | Image processing apparatus and image processing method | |
CN111275754B (en) | Face acne mark proportion calculation method based on deep learning | |
CN116563391B (en) | Automatic laser structure calibration method based on machine vision | |
CN113221815A (en) | Gait identification method based on automatic detection technology of skeletal key points | |
CN114639168B (en) | Method and system for recognizing running gesture | |
Khan et al. | Joint use of a low thermal resolution thermal camera and an RGB camera for respiration measurement | |
CN116664817A (en) | Power device state change detection method based on image difference | |
CN115187550A (en) | Target registration method, device, equipment, storage medium and program product | |
Chakravarty et al. | Machine Learning and Computer Visualization for Monocular Biomechanical Analysis | |
CN112489745A (en) | Sensing device for medical facility and implementation method | |
CN104318265B (en) | Ignore the left and right visual division line localization method of Computer aided decision system in half side space | |
US12138015B2 (en) | Sensing device for medical facilities | |
CN113409312B (en) | Image processing method and device for biomedical images | |
CN116503387B (en) | Image detection method, device, equipment, system and readable storage medium | |
CN111241870A (en) | Terminal device and face image recognition method and system thereof | |
Hajj-Ali | Depth-based Patient Monitoring in the NICU with Non-Ideal Camera Placement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||