CN117857839A - Video fusion method and system based on eagle eye camera - Google Patents
Video fusion method and system based on eagle eye camera
- Publication number
- CN117857839A (application number CN202410045089.7A)
- Authority
- CN
- China
- Prior art keywords
- video
- spliced
- image
- images
- video data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23424—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/156—Mixing image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/293—Generating mixed stereoscopic images; Generating mixed monoscopic and stereoscopic images, e.g. a stereoscopic image overlay window on a monoscopic image background
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/698—Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/95—Computational photography systems, e.g. light-field imaging systems
- H04N23/951—Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/181—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Marketing (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
The invention provides a video fusion method and system based on an eagle eye camera. The video fusion method comprises the following steps: S1: collecting multiple groups of video data at the viewing angles corresponding to at least two spatial orientations within a preset area; S2: processing the video data with a preset streaming media server, and performing multi-camera video splicing on the processed video data at the viewing angles corresponding to the spatial orientations to obtain a combined spliced video; S3: establishing a three-dimensional scene model of the preset area, fusing the spliced video into the three-dimensional scene model to obtain a fused video, and performing display control on the fused video. The invention splices and fuses real-time monitoring pictures from different viewing angles and matches them to the model in the three-dimensional scene, so as to realize panoramic real-time monitoring of the target scene; monitoring personnel can thus observe the real-time monitoring information in the virtual scene more comprehensively and intuitively, which greatly improves monitoring efficiency and accuracy.
Description
Technical Field
The invention relates to the technical field of video monitoring, in particular to a video fusion method and system based on an eagle eye camera.
Background
In conventional video surveillance systems, monitoring personnel need to pick out useful information from a plurality of surveillance pictures taken at different viewing angles.
When faced with the requirement for global real-time monitoring of a large space such as a factory building, monitoring personnel can only handle emergencies by experience, so delayed handling and even misjudgment easily occur.
Disclosure of Invention
In view of the above drawbacks of the prior art, the present invention is directed to a video fusion method and system based on an eagle eye camera, which solve the problem in the prior art that, when faced with the requirement for global real-time monitoring of a large space such as a factory building, monitoring personnel can only handle emergencies by experience, so that delayed handling and even misjudgment easily occur.
To achieve the above and other related objects, the present invention provides a video fusion method based on an eagle eye camera, comprising the steps of:
S1: collecting multiple groups of video data at the viewing angles corresponding to at least two spatial orientations within a preset area;
S2: processing the video data with a preset streaming media server, and performing multi-camera video splicing on the processed video data at the viewing angles corresponding to the spatial orientations to obtain a combined spliced video;
S3: establishing a three-dimensional scene model of the preset area, fusing the spliced video into the three-dimensional scene model to obtain a fused video, and performing display control on the fused video.
In an embodiment of the present invention, S1 includes:
First video data are collected under multiple groups of viewing angles corresponding to the top-down spatial orientation of the preset area, and second video data are collected under multiple groups of viewing angles corresponding to the spatial orientations on the two sides along the length direction of the preset area, wherein the viewing angles corresponding to the spatial orientations on the same side are kept at a consistent angle and corresponding preset positions are set.
In an embodiment of the present invention, in S2, video data under the processed view angles corresponding to each spatial orientation is subjected to multi-path video camera video stitching to obtain a combined stitched video, which includes:
S21: converting the video data corresponding to the multiple cameras into multiple frames of continuous image data, and performing image graying and image denoising preprocessing on the image data;
S22: performing image registration on the preprocessed image data to obtain a homography matrix that transforms the two images to be spliced onto the same plane, controlling the splicing of the two images to be spliced to obtain spliced images, and synthesizing the spliced images into a spliced video.
In an embodiment of the present invention, in S22, image registration is performed on the preprocessed image data to obtain a homography matrix in which two images to be stitched are transformed to the same plane, including:
extracting characteristic points from the preprocessed image data to obtain characteristic points in a target monitoring image corresponding to rotation invariance;
connecting the same characteristic points in the two images to form a matching pair, and acquiring the coordinates and distance information of the matching points corresponding to the characteristic points;
screening out incorrect matching points in the coordinates and distance information of the matching points, and eliminating the incorrect matching pairs corresponding to the matching points to obtain the matching pairs after incorrect matching is eliminated;
based on the matching pair after the mismatching is eliminated, a homography matrix between the two images is estimated.
In an embodiment of the present invention, in S22, feature point extraction is performed on the preprocessed image data by an ORB (Oriented FAST and Rotated BRIEF) feature extraction algorithm, the matching point coordinates and distance information corresponding to the feature points are obtained by a fast approximate nearest neighbor (FLANN) matching method, the mismatched pairs among the matching points are eliminated by a random sample consensus (RANSAC) algorithm to obtain the matching pairs with mismatches removed, and a homography matrix between the two images is estimated.
In an embodiment of the present invention, in S22, controlling to stitch two images to be stitched includes:
Based on the obtained homography matrix, one image is transformed to the viewing angle corresponding to the spatial orientation of the other image, the two are fused through weighted image superposition to obtain a spliced image, and the data source information of the two images, the homography matrix, and the weight parameters used in the image transformation are stored in a table.
In an embodiment of the present invention, in S3, performing display control on the fused video includes:
and controlling the amplification of the visual angle corresponding to the corresponding spatial orientation in the fusion video and the adjustment of the visual angle corresponding to the spatial orientation, and returning to a preset position when the control is finished.
The invention also provides a video fusion system based on an eagle eye camera, which comprises: a data acquisition unit, which collects multiple groups of video data at the viewing angles corresponding to at least two spatial orientations within a preset area; a video splicing unit, which processes the video data with a preset streaming media server and performs multi-camera video splicing on the processed video data at the viewing angles corresponding to the spatial orientations to obtain a combined spliced video; and a fusion display unit, which establishes a three-dimensional scene model of the preset area, fuses the spliced video into the three-dimensional scene model to obtain a fused video, and performs display control on the fused video.
In one embodiment of the present invention, video data includes: first video data collected by a plurality of panoramic cameras under a plurality of groups of view angles corresponding to the top-down spatial directions of a preset area; and second video data acquired by the spherical cameras under a plurality of groups of visual angles corresponding to the space orientations at two sides corresponding to the length direction of the preset area, wherein the angles of the visual angles corresponding to the space orientations of the spherical cameras at the same side are kept consistent, and corresponding preset positions are set.
In one embodiment of the present invention, a fusion display unit includes: the three-dimensional scene module controls the storage, the processing and the updating of the three-dimensional scene model; the fusion module is used for controlling the video stream output by the video splicing unit to register and fuse the scenes in the three-dimensional scene model; the user control module provides an interaction interface between a user and the system and controls adjustment of a visual angle corresponding to a preset space azimuth; and the display module is used for controlling the fusion video to be displayed based on the control adjustment of the user control module.
The invention has the following beneficial effects: real-time monitoring pictures shot at different positions and from different viewing angles are spliced, fused, and integrated into the three-dimensional scene model in real time, so that monitoring video information is comprehensively managed and displayed in the three-dimensional scene, real-time global monitoring of the whole large scene within the monitoring area is realized, no switching between individual camera video screens is required, various incidents can be commanded and handled in time, and the practical efficiency of video monitoring is greatly improved. In addition, adding the eagle eye panoramic camera at the top of the factory building allows the movement tracks of all personnel in the factory building to be tracked with the adjustment of a single camera. Splicing and fusing the cameras on the two sides with the top camera effectively solves the problem that, with only a top camera, only the movement track of personnel can be seen while their behavior remains unclear, thereby further improving the degree of personnel safety supervision in the factory building and reducing the workload of supervisory personnel.
Drawings
Fig. 1 is a flow chart of the video fusion method of the present invention.
Fig. 2 is a flow chart of the video stitching process of the present invention.
Fig. 3 is a flow chart of the video stitching method of the present invention.
Fig. 4 is a block diagram of a video fusion system according to the present invention.
Fig. 5 is a schematic diagram of a preferred embodiment of the field camera spot placement of the video fusion system of the present invention.
FIG. 6 is a block diagram of a video fusion method according to a preferred embodiment of the present invention.
Detailed Description
Other advantages and effects of the present invention will readily become apparent to those skilled in the art from the disclosure of this specification, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied through other, different embodiments, and the details in this specification may be modified or varied on the basis of different viewpoints and applications without departing from the spirit of the present invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other provided there is no conflict.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
In the following description, numerous details are set forth in order to provide a more thorough explanation of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail, in order to avoid obscuring the embodiments of the present invention.
Referring to fig. 1, the invention provides a video fusion method based on an eagle eye camera, comprising the following steps:
S1: collecting multiple groups of video data at the viewing angles corresponding to at least two spatial orientations within a preset area;
S2: processing the video data with a preset streaming media server, and performing multi-camera video splicing on the processed video data at the viewing angles corresponding to the spatial orientations to obtain a combined spliced video;
S3: establishing a three-dimensional scene model of the preset area, fusing the spliced video into the three-dimensional scene model to obtain a fused video, and performing display control on the fused video.
It can be seen from the above that, when global real-time monitoring is performed on a preset area with a large spatial range, such as a factory, monitoring is carried out at the viewing angles corresponding to different spatial orientations so as to acquire video data at those viewing angles. Each spatial orientation corresponds to one or more viewing angles, that is, the viewing angles corresponding to the spatial orientations are video acquisition viewing angles under different spatial orientations; for example, monitoring is performed at the viewing angles corresponding to the spatial orientations along the two sides of the length direction, the two sides of the width direction, the height direction, and the like of the preset area. In addition, multiple groups of viewing angles are arranged for the spatial orientations along the two sides of the length direction, the two sides of the width direction, and the height direction of the preset area, so as to obtain wider shooting coverage. After the video data at the viewing angles corresponding to the spatial orientations are obtained, processing work including analysis, caching, and forwarding of the video data is carried out by the preset streaming media server. On the one hand, this ensures the stability of the video data acquired by the cameras at the viewing angles corresponding to the spatial orientations; on the other hand, when the client configuration requirements are high and ordinary computer equipment cannot meet them, traditional multi-view monitoring can still be realized through the streaming media server, which effectively improves the utilization of the cameras and further improves the stability of the system. In addition, functions such as AI personnel behavior analysis can be applied to the collected video data, further improving personnel safety standards within the preset area. After the video data are processed by the preset streaming media server, multi-camera video splicing is performed on the processed video data at the viewing angles corresponding to the spatial orientations, thereby combining and splicing the video data from the different viewing angles. Finally, three-dimensional scene modeling is performed according to the specific conditions of the preset area to achieve a three-dimensional simulation restoration of the factory, and the combined spliced video is fused into the three-dimensional scene model to obtain the final display result, namely the fused video. Monitoring personnel can then adjust the viewing angles corresponding to the relevant spatial orientations, so as to adjust the display effect of the video after the viewing angle adjustment.
In step S1, it includes:
First video data are collected under multiple groups of viewing angles corresponding to the top-down spatial orientation of the preset area, and second video data are collected under multiple groups of viewing angles corresponding to the spatial orientations on the two sides along the length direction of the preset area, wherein the viewing angles corresponding to the spatial orientations on the same side are kept at a consistent angle and corresponding preset positions are set.
In an embodiment of the present invention, when capturing video data, the spatial orientations may include a top-down video acquisition viewing angle over the preset area, video acquisition viewing angles from one side to the other along the length direction of the preset area, and the like; a bottom-up video acquisition viewing angle over the preset area is of course not excluded. In order to better perform global real-time monitoring of the factory building, first video data are acquired under multiple groups of viewing angles in the top-down spatial orientation, and second video data are acquired under multiple groups of viewing angles from the left side to the right side and from the right side to the left side along the length direction of the factory building. Splicing and fusion are then performed using the first video data, the second video data, and the combination of the two. Specifically, an eagle eye camera is added at the top of the factory building and dome cameras are added on the two sides of the factory building, so that monitoring personnel can observe real-time monitoring information in the virtual scene more comprehensively and intuitively, which greatly improves monitoring efficiency and accuracy. Specifically, when the viewing angles are adjusted, for example when dome cameras are used, the dome cameras installed on the same side at the same height are required to be arranged on the same horizontal line with their shooting directions parallel to each other. Setting the preset positions means that, for the dome cameras on the same side, the centers of the pictures they shoot are required to lie on one horizontal line and their shooting directions are placed in the positions corresponding to parallel shooting. Using the cameras in other ways easily causes deviations in the shooting angle and affects the effect of the final video splicing; the preset positions are therefore set to facilitate later use. Of course, if operations such as switching the viewing angle of a camera are not required, the preset positions may be left unset.
In step S2, video data under the processed view angle corresponding to the corresponding spatial orientation is subjected to multi-path video splicing by a video camera, so as to obtain a spliced video after combination, which includes:
S21: converting the video data corresponding to the multiple cameras into multiple frames of continuous image data, and performing image graying and image denoising preprocessing on the image data;
S22: performing image registration on the preprocessed image data to obtain a homography matrix that transforms the two images to be spliced onto the same plane, controlling the splicing of the two images to be spliced to obtain spliced images, and synthesizing the spliced images into a spliced video.
In one embodiment of the present invention, when video data are spliced, the video data captured by the multiple cameras are converted into multiple frames of continuous image data; the multiple cameras here are the cameras corresponding to the several groups of viewing angles of the at least two spatial orientations. After the image data are obtained, graying is applied to them, which simplifies the amount of data for subsequent processing and effectively increases the speed of feature extraction. For example, lighting in a factory building changes between daytime and night, and graying removes the interference caused by such lighting changes. In a real environment, due to limitations of environmental factors, equipment performance, and the like, the video data acquired by the cameras often introduce noise into the images, which negatively affects the quality of the video splicing; denoising therefore reduces the influence of noise on the splicing effect and improves the quality of the video. After the image data are preprocessed, image registration is performed so that the two images to be spliced are transformed onto the same plane, the images are spliced, and the spliced images form the spliced video. It is worth noting that, in a scenario where the camera positions are fixed and unchanged, the homography matrix can be computed once by the image registration algorithm and stored as a fixed parameter; subsequent image splicing can read this parameter directly, which reduces the time consumed by later video splicing.
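As a concrete illustration of the frame extraction and preprocessing described above, the following is a minimal sketch using Python and OpenCV (the patent does not prescribe a particular library, so this is an assumption); the file path and the choice of a Gaussian filter for denoising are likewise illustrative.

```python
import cv2

def video_to_preprocessed_frames(video_path):
    """Decode a video into a list of grayscale, denoised frames."""
    frames = []
    cap = cv2.VideoCapture(video_path)  # hypothetical local file or stream address
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Image graying: reduces the data volume and suppresses day/night lighting differences
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Image denoising: a Gaussian blur is one simple choice; the patent does not fix the filter
        denoised = cv2.GaussianBlur(gray, (5, 5), 0)
        frames.append(denoised)
    cap.release()
    return frames
```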
In step S22, image registration is performed on the preprocessed image data to obtain a homography matrix in which two images to be spliced are transformed to the same plane, including:
extracting characteristic points from the preprocessed image data to obtain characteristic points in a target monitoring image corresponding to rotation invariance;
connecting the same characteristic points in the two images to form a matching pair, and acquiring the coordinates and distance information of the matching points corresponding to the characteristic points;
screening out incorrect matching points in the coordinates and distance information of the matching points, and eliminating the incorrect matching pairs corresponding to the matching points to obtain the matching pairs after incorrect matching is eliminated;
based on the matching pair after the mismatching is eliminated, a homography matrix between the two images is estimated.
In an embodiment of the present invention, in the process of registering the image data, feature points are first extracted and detected. Each feature point has a unique identity and characteristics (namely rotation invariance), and extracting feature points helps simplify the image data and extract key information. The feature points are then matched: the same feature points in the two images are connected in a certain form to form matching pairs, which are used to find the information association between the two images and facilitate the later splicing operation. Mismatches are then filtered out, that is, during feature point matching, wrong matching points are screened out of the matching results by the configured algorithm and rules, which improves the accuracy and reliability of feature point matching as well as the matching efficiency. The spliced image is realized by image transformation, that is, one of the images is transformed into the view of the second image according to the obtained homography matrix and the two are then fused; this process involves a number of weight parameters. Specifically, homography estimation uses the feature point matching pairs remaining after mismatches have been filtered out to estimate the homography matrix between the two images. Through the obtained homography matrix, all points of one image are transformed into the viewing angle of the other image, and fusion is then achieved by simple weighted image superposition, so as to obtain the spliced panoramic image.
Preferably, in step S22, feature point extraction is performed on the preprocessed image data by an ORB (Oriented FAST and Rotated BRIEF) feature extraction algorithm, the matching point coordinates and distance information corresponding to the feature points are obtained by a fast approximate nearest neighbor (FLANN, Fast Library for Approximate Nearest Neighbors) matching method, the mismatched pairs among the matching points are eliminated by a random sample consensus (RANSAC) algorithm to obtain the matching pairs with mismatches removed, and the homography matrix between the two images is estimated. The ORB algorithm converts the grayed and denoised images into feature points with rotation invariance. The FLANN matching method obtains the matching point coordinates and distance information, which facilitates the later splicing operation. RANSAC is used to eliminate the mismatched pairs and improve the matching efficiency.
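A minimal sketch of the registration chain named above (ORB feature extraction, FLANN matching with a ratio test, RANSAC homography estimation), written against OpenCV as an assumed implementation; parameter values such as the feature count and the 0.7 ratio threshold are illustrative, not taken from the patent.

```python
import cv2
import numpy as np

def estimate_homography(img_a, img_b):
    """Estimate the homography that maps img_a onto the plane of img_b."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # FLANN matching; LSH index parameters because ORB descriptors are binary
    index_params = dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1)
    flann = cv2.FlannBasedMatcher(index_params, dict(checks=50))
    matches = flann.knnMatch(des_a, des_b, k=2)

    # Ratio test screens out obviously wrong matches (0.7 is an assumed value)
    good = [m for m, n in (p for p in matches if len(p) == 2)
            if m.distance < 0.7 * n.distance]

    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects the remaining mismatched pairs while estimating H
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```

The returned matrix H is the homography that, per the preceding paragraphs, can be computed once while the cameras remain fixed and stored for later frames.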
In step S22, the step of controlling to splice two images to be spliced includes:
Based on the obtained homography matrix, one image is transformed to the viewing angle corresponding to the spatial orientation of the other image, the two are fused through weighted image superposition to obtain a spliced image, and the data source information of the two images, the homography matrix, and the weight parameters used in the image transformation are stored in a table.
In an embodiment of the invention, when the splicing between image data is performed, the homography matrix that transforms the two images to be spliced onto the same plane is solved, the two images are thereby transformed onto the same plane, one image is transformed into the viewing angle corresponding to the spatial orientation of the other image, the images are fused by weighted image superposition to obtain spliced images, and the spliced images are then recombined into the spliced video. During this image transformation, the data source information of the two images, the homography matrix, and the weight parameters used in the transformation are stored as a table, and the data source information of the two images is then used as the query condition to look up the contents of the table. As long as the video pictures do not change greatly afterwards, image splicing can be realized directly from the stored homography matrix and the changed weight parameters, which greatly reduces the time spent by the algorithm and improves video fluency.
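The warping, weighted overlay, and parameter caching described above might look like the sketch below (OpenCV assumed); the dictionary used as the lookup table, the source_key format, the canvas size, and the 0.5 blending weight are illustrative assumptions.

```python
import cv2
import numpy as np

# "Lookup table" keyed by the data sources of the two images, holding the fixed
# homography and blend weight so that later frames can skip registration
stitch_params = {}

def stitch_pair(img_a, img_b, source_key, H=None, alpha=0.5):
    """Warp img_a into img_b's view and blend the overlap with fixed weights."""
    if H is None:
        H, alpha = stitch_params[source_key]       # reuse cached parameters
    else:
        stitch_params[source_key] = (H, alpha)     # first round: store them

    h, w = img_b.shape[:2]
    canvas_w = w * 2                               # assumed canvas wide enough for both views
    warped_a = cv2.warpPerspective(img_a, H, (canvas_w, h))

    canvas = np.zeros_like(warped_a)
    canvas[:, :w] = img_b
    overlap = (canvas > 0) & (warped_a > 0)
    # Weighted superposition in the overlap region, plain copy elsewhere
    blended = (alpha * canvas + (1 - alpha) * warped_a).astype(canvas.dtype)
    return np.where(overlap, blended, np.maximum(canvas, warped_a))
```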
Specifically, in step S3, performing display control on the fused video includes:
and controlling the amplification of the visual angle corresponding to the corresponding spatial orientation in the fusion video and the adjustment of the visual angle corresponding to the spatial orientation, and returning to a preset position when the control is finished.
In an embodiment of the invention, the whole picture after the multi-channel video splicing, namely the spliced video, is displayed in a three-dimensional scene model such as a three-dimensional map, so that supervisory personnel have whole-process control of positions and can track personnel positions throughout the whole process. For a specific position, a single dome camera can be called on to perform optical zoom-in, or the camera can be rotated to a desired viewing angle through its pan-tilt head and returned to the preset position afterwards.
In a preferred embodiment shown in fig. 2, when the video fusion process is performed, video data at each corresponding viewing angle are obtained from the multiple viewing angles at different spatial orientations and processed into continuous image frames. Graying and denoising are applied to the image frames to obtain preprocessed images. The preprocessed images then go through the four steps of image registration: feature point extraction, feature point matching, filtering out of mismatched pairs, and image transformation. The homography matrix between the two images is estimated from the feature point matching pairs remaining after mismatches have been filtered out. All points of one image are transformed into the viewing angle of the other image through the homography matrix, and fusion is then achieved by simple weighted image superposition, so as to obtain the spliced panoramic image and complete the splicing of the image frames. The splicing of the video is completed once the image frames have been spliced. Specifically, during image registration, the grayed images are converted into feature points with rotation invariance by the ORB (Oriented FAST and Rotated BRIEF) algorithm; the matching point coordinates and distance information are obtained by the fast approximate nearest neighbor (FLANN, Fast Library for Approximate Nearest Neighbors) matching method, which facilitates the later splicing operation; and the mismatched pairs are eliminated by the RANSAC (Random Sample Consensus) algorithm, which improves the matching efficiency. The homography matrix solved from the matching pairs is obtained through the RANSAC algorithm; a homography matrix is generally used to describe the transformation relation of points lying in the same plane between two images. Finally, the homography matrix that transforms the two images to be spliced onto the same plane is solved, so that the two images can be transformed onto the same plane and the splicing completed. In addition, in a scenario where the camera positions are fixed and unchanged, the homography matrix can be computed once by the image registration algorithm and stored as a fixed parameter; subsequent image splicing can read this parameter directly, which reduces the time consumed by later video splicing.
In a preferred embodiment shown in fig. 3, during video splicing, each of input video stream 1, input video stream 2, input video stream 3, and so on corresponding to the video data is turned into the corresponding video frame 1, video frame 2, video frame 3, and so on, giving the corresponding image data. Whether initialization has been completed is then judged, that is, whether this is the first round of video splicing. If one round of video splicing has been completed and the related parameters have been stored, initialization is confirmed as completed. When initialization has not been completed, the image registration steps of feature point detection, matching, filtering out of mismatched pairs, homography estimation, and image transformation are performed: the feature points in the target monitoring images are detected, and since each feature point has a unique identity and characteristics, this helps simplify the image data and extract the key information. The same feature points in the two images are then connected in a certain form through feature point matching to form matching pairs, so as to find the information association between the two images. During feature point matching, wrong matching points are screened out of the matching results by a certain algorithm and rules so as to filter out the mismatches, which improves the accuracy and reliability of feature point matching. The feature point matching pairs remaining after mismatches have been filtered out are then used to estimate the homography matrix between the two images; through this matrix, the points of one image can be completely transformed into the viewing angle of the other image, and fusion is then achieved by simple weighted image superposition, so as to obtain the spliced panoramic image. Image transformation means transforming one image into the viewing angle of the second image through the obtained homography matrix and then fusing them to realize the spliced image; this process also involves a number of weight parameters. The data source information of the two pictures, the homography matrix, and the weight parameters used in the image transformation are stored as a lookup table, and the data source information of the two pictures is used as the query condition to look up the contents of the table. If the video pictures do not change greatly afterwards, image splicing is realized directly from the homography matrix and the changed weight parameters, which greatly reduces the time spent by the algorithm and improves video fluency. Finally, the spliced video frames are processed into a video stream and output. When initialization has already been completed, it is judged whether an update is needed, that is, whether the scene shot by the cameras has changed on a large scale; when there is a large-scale change, the parameters in the table are updated so that the video splicing effect is maintained.
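The initialization/reuse control flow of fig. 3, as described above, could be orchestrated roughly as follows. The helper names estimate_homography and stitch_pair refer to the sketches given earlier, scene_changed to the similarity check sketched after the next paragraph, and the one-minute interval, the stitch_frame function name, and the source_key string are assumptions.

```python
import time

CHECK_INTERVAL_S = 60      # scene-change check interval; the text suggests about one minute

last_check = 0.0
initialized = False

def stitch_frame(frame_a, frame_b, reference_gray, source_key="cam_left|cam_right"):
    """Register on the first round, then reuse cached parameters until the scene changes."""
    global last_check, initialized
    if not initialized:
        H = estimate_homography(frame_a, frame_b)        # full registration (sketched above)
        out = stitch_pair(frame_a, frame_b, source_key, H=H)
        initialized = True
    else:
        out = stitch_pair(frame_a, frame_b, source_key)  # reuse cached homography and weights
        now = time.time()
        if now - last_check > CHECK_INTERVAL_S:
            last_check = now
            if scene_changed(reference_gray, frame_a):   # similarity test (sketched below)
                initialized = False                      # force re-registration on the next frame
    return out
```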
Specifically, when the update judgment is performed, it is necessary to determine whether the scene has changed; this judgment is made at the software level. For example, the lighting changes between daytime and night in a factory building do not require an update, because the interference they produce has already been eliminated by the graying step. Another kind of change, mainly the movement of large equipment, produces a large change in the picture, and an update is considered necessary in that case. An image comparison algorithm is generally used to compute a similarity, and a threshold is set according to the requirements for the judgment. In practical applications, not every frame is judged; the time interval can be set to one minute, which effectively reduces the demand on computer performance. Specifically, the picture may be examined once a minute to judge whether a large-scale scene change has occurred, and if the degree of change is relatively large, the scene is updated and the splicing algorithm flow is run again.
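One possible form of the update judgment described above is sketched below: the current frame is compared against a stored reference, and re-registration is triggered when similarity falls below a threshold. The normalized cross-correlation metric and the 0.80 threshold are assumptions; the patent only states that an image comparison algorithm with a configurable threshold and roughly a one-minute check interval is used.

```python
import cv2

SIMILARITY_THRESHOLD = 0.80    # assumed threshold; tuned per site in practice

def scene_changed(reference_gray, current_gray):
    """Return True when the scene differs enough that the stitch parameters should be updated."""
    # Normalized cross-correlation between two same-size grayscale frames yields one score
    score = cv2.matchTemplate(reference_gray, current_gray, cv2.TM_CCOEFF_NORMED)[0][0]
    return score < SIMILARITY_THRESHOLD
```

When scene_changed returns True, the registration step is re-run and the cached homography and weight entries in the lookup table are overwritten.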
As shown in fig. 4, a video fusion system based on an eagle eye camera includes: a data acquisition unit, which collects multiple groups of video data at the viewing angles corresponding to at least two spatial orientations within a preset area; a video splicing unit, which processes the video data with a preset streaming media server and performs multi-camera video splicing on the processed video data at the viewing angles corresponding to the spatial orientations to obtain a combined spliced video; and a fusion display unit, which establishes a three-dimensional scene model of the preset area, fuses the spliced video into the three-dimensional scene model to obtain a fused video, and performs display control on the fused video.
In an embodiment of the present invention, the data acquisition unit is configured to implement video data under several sets of viewing angles corresponding to different spatial orientations. And video data corresponding to the real-time monitoring pictures with different visual angles are spliced and fused through the video splicing unit, so that a spliced video after combination is obtained. And then the combined spliced video is subjected to fusion display unit to realize model matching in the three-dimensional scene model, so that panoramic real-time monitoring of the target scene is realized. And by utilizing the multi-group visual angle monitoring under the multi-space azimuth, monitoring personnel can more comprehensively and intuitively observe real-time monitoring information in a virtual scene, and further, the monitoring efficiency and accuracy are greatly improved.
Further, the video data includes: first video data collected by a plurality of panoramic cameras under a plurality of groups of view angles corresponding to the top-down spatial directions of a preset area; and second video data acquired by the spherical cameras under a plurality of groups of visual angles corresponding to the space orientations at two sides corresponding to the length direction of the preset area, wherein the angles of the visual angles corresponding to the space orientations of the spherical cameras at the same side are kept consistent, and corresponding preset positions are set.
In an embodiment of the invention, adding the eagle eye panoramic camera at the top of a preset area such as a factory building solves the problem that the movement track of personnel is difficult to obtain accurately with a single camera, and thereby enables the movement tracks of all personnel in the factory building to be tracked. Splicing and fusing the dome cameras on the two sides of the factory building with the top camera effectively solves the problem that, with only a top camera, only the movement track of personnel can be seen while their behavior remains unclear, thereby further improving the degree of personnel safety supervision in the factory building and reducing the workload of supervisory personnel.
Referring to fig. 5, in a preferred embodiment of the camera arrangement, two 16-megapixel panoramic hawk-eye cameras with a 180-degree viewing angle may be installed in a cuboid factory building measuring 100×50×40, namely in the region within about 20 m of the edge at the top of the two ends of the factory building. Ten 4-megapixel dome cameras are arranged at intervals of 6-10 meters along the two long sides of the factory building, at a height of about 30-36 meters, and the camera heights are kept as consistent as possible. Hikvision cameras are used and are connected to the server in a wired manner. Before installation, each camera is activated through Hikvision's dedicated configuration platform, a specific IP address, account, and password are set for it according to the existing IP allocation practice, and the camera stream is pulled by configuring an RTSP (Real Time Streaming Protocol) address on the streaming media server. By adjusting the camera angles, the dome cameras on one side are kept at a consistent angle and corresponding preset positions are set, which facilitates later video splicing and adjustment. During camera activation, a unified NTP time server address also needs to be set, which ensures the accuracy and consistency of the camera clocks to a certain extent.
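Pulling a camera stream over the RTSP address configured above could be done as in the sketch below; the URL, credentials, and stream path are placeholders rather than values from the patent or any specific camera product.

```python
import cv2

# Placeholder RTSP address; real deployments substitute the camera's configured IP and credentials
RTSP_URL = "rtsp://user:password@192.168.1.64:554/stream1"

cap = cv2.VideoCapture(RTSP_URL)
if not cap.isOpened():
    raise RuntimeError("failed to open RTSP stream")

ok, frame = cap.read()   # frames are then handed to the preprocessing and splicing stages
cap.release()
```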
Still further, as shown in fig. 6, the fusion display unit includes: the three-dimensional scene module controls the storage, the processing and the updating of the three-dimensional scene model; the fusion module is used for controlling the video stream output by the video splicing unit to register and fuse the scenes in the three-dimensional scene model; the user control module provides an interaction interface between a user and the system and controls adjustment of a visual angle corresponding to a preset space azimuth; and the display module is used for controlling the fusion video to be displayed based on the control adjustment of the user control module.
In an embodiment of the present invention, when fusion display of the video data and the three-dimensional scene model is performed, the factory will in practice undergo certain changes, such as the movement of machines; the corresponding three-dimensional models then need to be rearranged in the three-dimensional scene, and the three-dimensional scene module therefore handles the storage, processing, and updating of the three-dimensional scene model. Specifically, when updating, it is necessary to judge whether the scene shot by the cameras has changed on a large scale; when there is a large-scale change, the parameters in the table need to be updated so that the video splicing effect is maintained. After the three-dimensional scene model is obtained, the fusion module fuses the spliced video into the three-dimensional scene model. The user control module then gives supervisory personnel whole-process control of positions, so that personnel positions can be tracked throughout the whole process. For a specific position, a single dome camera can be called on to perform optical zoom-in, or the camera can be rotated through its pan-tilt head and returned to the preset position afterwards.
Specifically, during video fusion, the construction of the three-dimensional model of the factory building can be based on either forward modeling or reverse modeling, according to the existing conditions; a three-dimensional simulation restoration of the factory building can be produced from its existing CAD drawings. The spliced video is treated as if it were shot by a virtual camera, and the simulated spliced video is formed by projecting it from that camera along its view frustum. The fused effect of the video and the three-dimensional scene is constructed by manually registering the view frustum, and the result is finally displayed.
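The video-to-scene fusion described above is essentially projective texture mapping: the spliced video is treated as the image of a virtual camera and projected along that camera's view frustum onto the scene geometry. The following is a minimal sketch of the core computation (mapping a scene point to a sampling coordinate in the spliced video); the function name, matrix conventions, and the [0, 1] depth range are assumptions rather than details given in the patent.

```python
import numpy as np

def project_point_to_video_uv(point_world, view_matrix, proj_matrix):
    """Map a 3D scene point to (u, v) texture coordinates in the spliced-video frame."""
    p = np.append(point_world, 1.0)              # homogeneous coordinates
    clip = proj_matrix @ view_matrix @ p         # into the virtual camera's clip space
    ndc = clip[:3] / clip[3]                     # perspective divide
    u = (ndc[0] + 1.0) / 2.0
    v = (ndc[1] + 1.0) / 2.0
    # Assuming a [0, 1] normalized depth range for "inside the frustum"
    inside = 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0 and 0.0 < ndc[2] < 1.0
    return (u, v) if inside else None            # None: point lies outside the view frustum
```

Points for which the function returns None lie outside the virtual camera's frustum and keep the base texture of the three-dimensional model.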
In a preferred embodiment shown in fig. 6, when video data are acquired, data acquisition is performed by the panoramic camera installed at the top of the factory building and the dome cameras on its two sides, and the video data from the cameras are processed on the server side by the streaming media server, including operations such as analysis, buffering, and forwarding. On the one hand, this ensures the stability of the video data acquired by the cameras; on the other hand, because the scheme places high requirements on client configuration, traditional multi-view monitoring can still be realized through the streaming media server when an ordinary computer does not meet those requirements, which effectively improves the utilization of the cameras and further improves the stability of the system. In addition, functions such as AI personnel behavior analysis can be applied to the collected video data, further improving personnel safety standards in the factory building. The video module converts the acquired video data into video frames and splices the image data, and video splicing is performed once the spliced images are obtained. Meanwhile, the three-dimensional scene module performs a three-dimensional simulation restoration of the factory building according to its existing CAD drawings to obtain the three-dimensional scene model, and the fusion module then fuses the spliced video into the three-dimensional scene model. Through the user control module, the user can move and rotate the viewing angle with the mouse and zoom regions in and out so as to meet the monitoring requirements, flexibly adjusting the observed viewing angle and browsing and searching regions of interest in the three-dimensional scene. The display module controls the display of the fused video on a computer display (for man-machine interaction and program display), for example by screen casting or large-screen display.
In summary, the invention acquires video data from multiple viewing angles in multiple spatial orientations and splices the video data on the basis of those viewing angles to obtain the spliced video. The spliced video is fused into the three-dimensional scene model, which is continuously stored, processed, and updated, so that the spliced video is fused and displayed under control within the three-dimensional scene model. In this way, the monitoring video information can be comprehensively managed and displayed in the three-dimensional scene, and the monitoring videos shot at different positions and from different viewing angles are fused into the three-dimensional scene model in real time. Real-time global monitoring of the whole large scene within the monitoring area is realized, no switching between individual camera video screens is required, various incidents can be commanded and handled in time, and the practical efficiency of video monitoring is greatly improved. In particular, multiple groups of eagle eye panoramic cameras and dome cameras are installed in the spatial orientations at the top of the preset area and on the two sides of its length direction, respectively. The eagle eye panoramic camera at the top enables the movement tracks of all personnel in a preset area such as a factory building to be tracked, and splicing and fusing the cameras on the two sides with the top camera effectively solves the problem that, with only a top camera, only the movement track of personnel can be seen while their behavior remains unclear, thereby further improving the degree of personnel safety supervision in the factory building and reducing the workload of supervisory personnel. The invention therefore effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles of the present invention and its effects, and are not intended to limit the invention. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations completed by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall be covered by the claims of the present invention.
Claims (10)
1. The video fusion method based on the eagle eye camera is characterized by comprising the following steps of:
S1: collecting video data of a plurality of groups of at least two spatial orientations in a preset area under the corresponding viewing angles;
S2: processing video data by a preset streaming media server, and performing multi-camera video splicing on the processed video data under the viewing angles corresponding to the spatial orientations to obtain a combined spliced video;
S3: establishing a three-dimensional scene model of the preset area, fusing the spliced video into the three-dimensional scene model to obtain a fused video, and performing display control on the fused video.
2. The eagle eye camera based video fusion method according to claim 1, characterized in that: in the step S1, the method includes:
and acquiring first video data under the view angles of a plurality of groups corresponding to the space azimuth from top to bottom in the preset area, and acquiring second video data under the view angles of a plurality of groups corresponding to the space azimuth on two sides corresponding to the length direction of the preset area, wherein the angles of the view angles corresponding to the space azimuth on the same side are kept consistent, and corresponding preset positions are set.
3. The eagle eye camera based video fusion method according to claim 1, characterized in that: in the step S2, video data under the processed view angles corresponding to the spatial orientations are subjected to video stitching of multiple paths of cameras, so as to obtain a combined stitched video, which comprises the following steps:
S21: converting the video data corresponding to the multiple cameras into multi-frame continuous image data, and performing image graying and image denoising preprocessing on the image data;
S22: performing image registration on the preprocessed image data to obtain a homography matrix that transforms the two images to be spliced onto the same plane, controlling the splicing of the two images to be spliced to obtain spliced images, and synthesizing the spliced images into the spliced video.
4. The eagle eye camera based video fusion method according to claim 3, characterized in that in S22, performing image registration on the preprocessed image data to obtain the homography matrix that transforms the two images to be spliced onto the same plane comprises:
extracting feature points from the preprocessed image data to obtain rotation-invariant feature points in the target monitoring images;
connecting identical feature points in the two images to form matching pairs, and acquiring matching point coordinates and distance information corresponding to the feature points;
screening out incorrect matching points from the matching point coordinates and distance information, and eliminating the incorrect matching pairs corresponding to those points to obtain the matching pairs remaining after mismatch elimination;
estimating the homography matrix between the two images based on the matching pairs remaining after mismatch elimination.
5. The eagle eye camera based video fusion method of claim 4, wherein in S22, feature point extraction is performed on the preprocessed image data by the ORB (Oriented FAST and Rotated BRIEF) feature extraction algorithm, the matching point coordinates and distance information corresponding to the feature points are obtained by the fast library for approximate nearest neighbors (FLANN) matching method, the incorrect matching pairs are eliminated by the random sample consensus (RANSAC) algorithm to obtain the matching pairs remaining after mismatch elimination, and the homography matrix between the two images is estimated.
6. The eagle eye camera based video fusion method of claim 4, wherein in S22, controlling the splicing of the two images to be spliced comprises:
transforming one image, based on the obtained homography matrix, into the visual angle corresponding to the spatial orientation of the other image, fusing the overlapping region by weighted image blending to obtain a spliced image, and storing in a table the data source information of the two images, the homography matrix, and the weight parameters used in the image transformation.
7. The eagle eye camera based video fusion method of claim 6, wherein in step S3, performing display control on the fused video comprises:
controlling, within the fused video, magnification of the visual angle corresponding to a spatial orientation and adjustment of that visual angle, and returning to the preset position when the control is finished.
8. An eagle eye camera based video fusion system, characterized by comprising:
a data acquisition unit that acquires video data from a plurality of groups at the visual angles corresponding to at least two spatial orientations in a preset area;
a video splicing unit that processes the video data through a preset streaming media server and performs multi-camera video splicing on the processed video data at the visual angles corresponding to the spatial orientations to obtain a combined spliced video; and
a fusion display unit that establishes a three-dimensional scene model of the preset area, fuses the spliced video into the three-dimensional scene model to obtain a fused video, and performs display control on the fused video.
9. The eagle eye camera based video fusion system of claim 8, wherein the video data comprises: first video data collected by a plurality of panoramic cameras from a plurality of groups at the visual angle corresponding to the top-down spatial orientation in the preset area; and second video data collected by a plurality of dome cameras from a plurality of groups at the visual angles corresponding to the spatial orientations on the two sides along the length direction of the preset area, wherein the visual angles of the dome cameras corresponding to the spatial orientations on the same side are kept consistent and corresponding preset positions are set.
10. The eagle eye camera based video fusion system of claim 8, wherein the fusion display unit comprises:
a three-dimensional scene module that controls the storage, processing and updating of the three-dimensional scene model;
a fusion module that controls the registration and fusion of the video stream output by the video splicing unit with the scene in the three-dimensional scene model;
a user control module that provides an interactive interface between the user and the system and controls adjustment of the visual angle corresponding to a preset spatial orientation; and
a display module that controls display of the fused video based on the control adjustment of the user control module.
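
To relate the registration pipeline recited in claims 3 to 5 (steps S21 and S22) to a concrete implementation, the following is a minimal, non-limiting sketch using OpenCV's Python bindings. It assumes the camera streams have already been decoded into per-frame images; the function name estimate_homography, the ORB/FLANN/RANSAC parameter values, and the LSH index settings are illustrative choices rather than part of the claimed method.

```python
import cv2
import numpy as np

def estimate_homography(img_a, img_b, max_features=2000, ratio=0.75):
    """Estimate the homography that maps img_b onto img_a's image plane (S22)."""
    # S21: image graying and light denoising before feature extraction.
    gray_a = cv2.GaussianBlur(cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY), (3, 3), 0)
    gray_b = cv2.GaussianBlur(cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY), (3, 3), 0)

    # Rotation-invariant feature points via ORB (Oriented FAST and Rotated BRIEF).
    orb = cv2.ORB_create(nfeatures=max_features)
    kp_a, des_a = orb.detectAndCompute(gray_a, None)
    kp_b, des_b = orb.detectAndCompute(gray_b, None)

    # FLANN matcher configured with an LSH index, suitable for binary ORB descriptors.
    flann = cv2.FlannBasedMatcher(
        dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1),  # LSH index
        dict(checks=50),
    )
    matches = flann.knnMatch(des_b, des_a, k=2)

    # Lowe's ratio test screens out obviously incorrect matches by descriptor distance.
    good = [m for m, n in (p for p in matches if len(p) == 2)
            if m.distance < ratio * n.distance]
    if len(good) < 4:
        raise ValueError("not enough reliable matches to estimate a homography")

    src = np.float32([kp_b[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects the remaining mismatched pairs while estimating the matrix.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H, inlier_mask
```

The three stages of the function correspond to claim 5's ORB feature extraction, FLANN matching, and RANSAC mismatch rejection; the returned matrix is the homography that claim 4 recites as transforming the two images to be spliced onto the same plane.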
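
Claim 6 recites transforming one image into the other's visual angle using the homography matrix, fusing the overlap with weight images, and caching the data source information, homography and weight parameters in a table. A minimal sketch of that splicing control is shown below, again assuming OpenCV; the names warp_and_blend and stitch_table and the camera identifiers are hypothetical, and the constant 50/50 overlap weight is a simplification of whatever weighting the claimed method actually uses.

```python
import cv2
import numpy as np

def warp_and_blend(img_a, img_b, H):
    """Warp img_b into img_a's visual angle with H, then blend the overlap."""
    h, w = img_a.shape[:2]
    canvas_w = 2 * w  # simple fixed canvas; a full stitcher would compute warped corners

    warped_b = cv2.warpPerspective(img_b, H, (canvas_w, h))
    canvas_a = np.zeros_like(warped_b)
    canvas_a[:, :w] = img_a

    # Weight images: 1 where only one source has pixels, 0.5/0.5 in the overlap.
    mask_a = (canvas_a.sum(axis=2) > 0).astype(np.float32)
    mask_b = (warped_b.sum(axis=2) > 0).astype(np.float32)
    overlap = (mask_a > 0) & (mask_b > 0)
    w_a = np.where(overlap, 0.5, mask_a)
    w_b = np.where(overlap, 0.5, mask_b)

    blended = canvas_a * w_a[..., None] + warped_b * w_b[..., None]
    return blended.astype(np.uint8)

# Hypothetical usage, including the parameter table recited in claim 6:
# H, _ = estimate_homography(frame_top_01, frame_top_02)
# panorama = warp_and_blend(frame_top_01, frame_top_02, H)
# stitch_table = {("cam_top_01", "cam_top_02"): {"H": H.tolist(), "overlap_weight": 0.5}}
```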
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410045089.7A CN117857839A (en) | 2024-01-11 | 2024-01-11 | Video fusion method and system based on eagle eye camera |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410045089.7A CN117857839A (en) | 2024-01-11 | 2024-01-11 | Video fusion method and system based on eagle eye camera |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117857839A true CN117857839A (en) | 2024-04-09 |
Family
ID=90537998
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410045089.7A Pending CN117857839A (en) | 2024-01-11 | 2024-01-11 | Video fusion method and system based on eagle eye camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117857839A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118555462A (en) * | 2024-07-26 | 2024-08-27 | 中国科学院青藏高原研究所 | Bionic eagle eye monitoring equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||