CN116996742A - Video fusion method and system based on three-dimensional scene - Google Patents
Video fusion method and system based on three-dimensional scene
- Publication number
- CN116996742A CN116996742A CN202310884577.2A CN202310884577A CN116996742A CN 116996742 A CN116996742 A CN 116996742A CN 202310884577 A CN202310884577 A CN 202310884577A CN 116996742 A CN116996742 A CN 116996742A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44012—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/816—Monomedia components thereof involving special video data, e.g 3D video
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The application discloses a video fusion method based on a three-dimensional scene. Compared with fusing by calibrating a camera or manually adjusting virtual camera parameters, it is more efficient and requires no carrier, greatly improving the efficiency of video fusion. The application is a brand-new video fusion technology. Many existing video fusion technologies on the market suffer from complex operation, applicability only to ideal scenes, severe limitations, and poor fusion effects. The present technology can automatically, quickly, and accurately fuse a video by merely selecting four pairs of standard point coordinates (the pixel coordinates on the video image and the corresponding world coordinates on the three-dimensional real-scene model); it achieves a good fusion effect and greatly reduces the cost of fusing video onto the three-dimensional real-scene model.
Description
Technical Field
The application relates to the technical field of image processing, in particular to a video fusion method and system based on a three-dimensional scene.
Background
Video fusion technology plays an important role in the digital twin field for smart cities. It meets the need, in smart city business scenarios, to project real-time surveillance video onto three-dimensional real-scene model data, achieving a fusion of the virtual and the real, and it is widely used in fields such as security and unmanned inspection. Automatically or semi-automatically projecting video onto the three-dimensional real-scene model data is the first and most critical step in achieving the video fusion effect. A number of video fusion technologies already exist on the market. For example, Chinese patent 202211528984.1 discloses a video fusion method, device, electronic apparatus and storage medium, in which a three-dimensional model is loaded in a GIS system to construct a virtual scene resembling reality; the real-time surveillance video is projected into the GIS system; and the real-time surveillance video is irregularly clipped and fused into the constructed virtual scene.
However, the above method is constrained by the shape of the three-dimensional model during fusion, and problems such as the video penetrating the model or being duplicated easily arise, resulting in a poor user experience.
Disclosure of Invention
In the video fusion technology of the application, standard points are sampled from a video key frame; the position and pose of the video's virtual camera in the real-scene three-dimensional scene are then calculated; and the video stream is projected into the real-scene three-dimensional scene according to that position and pose, thereby achieving the video fusion effect.
The present application aims to solve at least one of the technical problems existing in the prior art. Therefore, the application discloses a video fusion method based on a three-dimensional scene, which comprises the following steps:
step 1, acquiring a preset video image sequence, capturing a single-frame video image at a preset position in the sequence, initializing image coordinates on the single-frame video image, and selecting a plurality of standard points at preset coordinate positions on the video image;
step 2, acquiring the three-dimensional real-scene model to be fused with the video image sequence, establishing a coordinate mapping relation between the three-dimensional real-scene model and the video image to be fused, and determining from the three-dimensional real-scene model the world coordinates corresponding to the standard points;
step 3, drawing, in the three-dimensional real-scene model, a line from the world coordinate corresponding to a first standard point of the video image to the world coordinate corresponding to a second standard point; generating a preset number of vertically spaced interpolation points on the extension of this line at intervals of a first preset length; placing the video-fusion virtual camera at each interpolation point, oriented towards the world coordinate of the first standard point; then performing a rendering operation and storing the rendered data in a frame buffer;
step 4, obtaining from the frame buffer the screen coordinates corresponding to the standard points other than the first standard point; comparing, by the Euclidean distance method, the similarity between these screen coordinates and the preset coordinate positions of the standard points on the video image; and taking the interpolation point with the highest similarity as the first temporary position of the virtual camera;
step 5, drawing the same line from the world coordinate of the first standard point to that of the second standard point; centered on the first temporary position, generating a preset number of vertically spaced interpolation points again along the line direction at horizontal intervals of a second preset length; placing the virtual camera at each interpolation point, oriented towards the world coordinate of the first standard point; re-executing the rendering operation; and repeating step 4 to determine the second temporary position of the virtual camera;
step 6, continuing to reduce the interpolation interval and repeating step 5 until the virtual camera position with the minimum Euclidean distance, i.e. the best fusion effect, is obtained, and projecting the video stream into the three-dimensional real-scene model from the obtained optimal position and orientation of the virtual camera.
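Steps 3 to 6 amount to a coarse-to-fine one-dimensional search for the camera position along the line through the two anchor points. The following is a minimal, hypothetical Python sketch of that search loop, simplified so that every pass is centred on the current best estimate; the `score` callback (which in the patent would render the scene and measure the Euclidean distance of the standard points, as in step 4) and all names are illustrative assumptions, not the patent's implementation:

```python
def search_camera_position(start, direction, score,
                           spacings=(10.0, 1.0, 0.5, 0.3, 0.1, 0.01), n=100):
    """Coarse-to-fine search for the virtual camera position.

    At each spacing, n candidate positions are generated along the unit
    vector `direction`, centred on the current best estimate; the candidate
    with the smallest `score` (e.g. the Euclidean distance between the
    rendered standard points and their preset image coordinates) is kept.
    """
    best = tuple(start)
    for step in spacings:
        candidates = [
            tuple(b + direction[i] * step * (k - n // 2) for i, b in enumerate(best))
            for k in range(n)
        ]
        best = min(candidates, key=score)
    return best

# Illustrative use with a stand-in score: the "ideal" camera x is 37.0,
# so the search should converge to x = 37.0 within the final spacing.
best = search_camera_position((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                              lambda c: abs(c[0] - 37.0))
```

The spacing schedule (10 m down to 0.01 m) follows the values given in the embodiment below; any monotonically decreasing schedule would serve.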
Still further, the plurality of standard points are 4 standard points at determined positions: the center point, the bottom-left corner, the bottom midpoint, and the bottom-right corner of the single-frame video image.
Still further, drawing the line in the three-dimensional real-scene model from the world coordinate of the first standard point to that of the second standard point further includes: the first standard point is the center point of the image, and the second standard point is the midpoint of the bottom of the image.
Further, the first preset length and the second preset length are length values input by the user; the first preset length is initially set to 10 meters and the second preset length to 1 meter.
Further, obtaining by the Euclidean distance method the similarity between the screen coordinates corresponding to the remaining standard points and the preset coordinate positions of the standard points on the video image further includes: the similarity is obtained by the Euclidean distance calculation formula, expressed as:
√[(p1 − q1)² + (p2 − q2)² + (p3 − q3)²]
where p1, p2, p3 are the screen coordinate values of a standard point in the frame buffer, and q1, q2, q3 are the corresponding preset coordinate values on the video image.
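As a sketch, the similarity measure above is the ordinary Euclidean distance between coordinate tuples; a smaller distance means a higher similarity. The numeric values below are illustrative only, not taken from the patent:

```python
import math

def euclidean_similarity(p, q):
    """Euclidean distance between a screen-coordinate tuple p (read from
    the frame buffer) and the preset coordinate tuple q on the video
    image. A smaller distance means a higher similarity."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Illustrative values: a classic 3-4-5 triangle gives distance 5.0.
d = euclidean_similarity((3, 4, 0), (0, 0, 0))
```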
The application also discloses a video fusion system based on a three-dimensional scene, comprising the following modules:
a coordinate point selection module, which acquires a preset video image sequence, captures a single-frame video image at a preset position in the sequence, initializes image coordinates on the single-frame video image, and selects a plurality of standard points at preset coordinate positions on the video image;
a coordinate mapping module, which acquires the three-dimensional real-scene model to be fused with the video image sequence, establishes a coordinate mapping relation between the three-dimensional real-scene model and the video image to be fused, and determines from the three-dimensional real-scene model the world coordinates corresponding to the standard points;
a virtual camera initial rendering module, which draws, in the three-dimensional real-scene model, a line from the world coordinate corresponding to a first standard point of the video image to the world coordinate corresponding to a second standard point; generates a preset number of vertically spaced interpolation points on the extension of this line at intervals of a first preset length; places the video-fusion virtual camera at each interpolation point, oriented towards the world coordinate of the first standard point; then performs a rendering operation and stores the rendered data in a frame buffer;
a virtual camera positioning module, which obtains from the frame buffer the screen coordinates corresponding to the standard points other than the first standard point; compares, by the Euclidean distance method, the similarity between these screen coordinates and the preset coordinate positions of the standard points on the video image; and takes the interpolation point with the highest similarity as the first temporary position of the virtual camera;
a positioning updating module, which draws the same line from the world coordinate of the first standard point to that of the second standard point; centered on the first temporary position, generates a preset number of vertically spaced interpolation points again along the line direction at horizontal intervals of a second preset length; places the virtual camera at each interpolation point, oriented towards the world coordinate of the first standard point; re-executes the rendering operation; and repeats the operations of the virtual camera positioning module to determine the second temporary position of the virtual camera;
and a fusion module, which continues to reduce the interpolation interval and repeats the functions of the positioning updating module until the virtual camera position with the minimum Euclidean distance, i.e. the best fusion effect, is obtained, and projects the video stream into the three-dimensional real-scene model from the obtained optimal position and orientation of the virtual camera.
Still further, the plurality of standard points are 4 standard points at determined positions: the center point, the bottom-left corner, the bottom midpoint, and the bottom-right corner of the single-frame video image.
Still further, drawing the line in the three-dimensional real-scene model from the world coordinate of the first standard point to that of the second standard point further includes: the first standard point is the center point of the image, and the second standard point is the midpoint of the bottom of the image.
Further, the first preset length and the second preset length are length values input by the user; the first preset length is initially set to 10 meters and the second preset length to 1 meter.
Further, obtaining by the Euclidean distance method the similarity between the screen coordinates corresponding to the remaining standard points and the preset coordinate positions of the standard points on the video image further includes: the similarity is obtained by the Euclidean distance calculation formula, expressed as:
√[(p1 − q1)² + (p2 − q2)² + (p3 − q3)²]
where p1, p2, p3 are the screen coordinate values of a standard point in the frame buffer, and q1, q2, q3 are the corresponding preset coordinate values on the video image.
Compared with the prior art, the application has the following beneficial effects: compared with fusing by calibrating a camera or manually adjusting virtual camera parameters, the video fusion technology of the application is more efficient, requires no carrier, and greatly improves video fusion efficiency. It is a brand-new video fusion technology. Many existing video fusion technologies on the market suffer from complex operation, applicability only to ideal scenes, severe limitations, and poor fusion effects. The present technology can automatically, quickly, and accurately fuse a video by merely selecting four pairs of standard point coordinates (the pixel coordinates on the video image and the corresponding world coordinates on the three-dimensional real-scene model); it achieves a good fusion effect and greatly reduces the cost of fusing video onto the three-dimensional real-scene model.
Drawings
The application will be further understood from the following description taken in conjunction with the accompanying drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments. In the figures, like reference numerals designate corresponding parts throughout the different views.
Fig. 1 is a standard point selection diagram of a video image in an embodiment of the application.
FIG. 2 is a flow chart of implementing three-dimensional scene-based video fusion in an embodiment of the application.
FIG. 3 is a schematic diagram of placing the video-fusion virtual camera at the interpolation points in an embodiment of the application.
FIG. 4 is a flow chart of another implementation of three-dimensional scene-based video fusion in an embodiment of the application.
Detailed Description
The technical scheme of the application will be described in more detail below with reference to the accompanying drawings and examples.
A mobile terminal implementing various embodiments of the present application will now be described with reference to the accompanying drawings. In the following description, suffixes such as "module", "component", or "unit" for representing elements are used only for facilitating the description of the present application, and are not of specific significance per se. Thus, "module" and "component" may be used in combination.
Mobile terminals may be implemented in a variety of forms. For example, the terminals described in the present application may include mobile terminals such as mobile phones, smart phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), navigation devices, and the like, and fixed terminals such as digital TVs, desktop computers, and the like. In the following, it is assumed that the terminal is a mobile terminal. However, it will be understood by those skilled in the art that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for a moving purpose.
As shown in Figs. 1-4, the video fusion method and system based on a three-dimensional scene proceed according to steps 1 to 6 and the corresponding modules set forth above.
In this embodiment, the implementation includes the following steps:
a frame of video image is cut from the video stream, four standard points are taken on the image, namely a1 (lower left corner point of the image), a2 (middle point of the bottom of the image), a3 (lower right corner point of the image) and a4 (center point of the image). Reference is made to fig. 1. And then, world coordinates B1 (world coordinates corresponding to a 1), B2 (world coordinates corresponding to a 2), B3 (world coordinates corresponding to a 3) and B4 (world coordinates corresponding to a 4) corresponding to the standard points are taken from the three-dimensional live-action model.
A B4-B2 line is drawn in the three-dimensional real-scene model, and 100 vertically spaced interpolation points are generated along the extension of the B4-B2 line at horizontal intervals of 10 meters. The video-fusion virtual camera is placed at each interpolation point, looking at the B4 coordinate point, and then rendered into a frame buffer; see Fig. 3.
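This interpolation-point layout can be sketched as follows. The vector arithmetic and the symmetric vertical-offset scheme are illustrative assumptions, since the patent specifies only the 10-meter horizontal interval and that the points are spaced vertically:

```python
import math

def interpolation_points(b4, b2, horizontal_step=10.0, count=100,
                         vertical_step=10.0, levels=1):
    """Candidate camera positions on the extension of the B4->B2 line:
    `count` points spaced `horizontal_step` metres apart beyond B2, each
    repeated at symmetric vertical offsets of `vertical_step` metres."""
    d = [b2[i] - b4[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]                      # unit direction B4 -> B2
    points = []
    for k in range(1, count + 1):
        base = [b2[i] + d[i] * horizontal_step * k for i in range(3)]
        for lv in range(-levels, levels + 1):      # vertical spread
            points.append((base[0], base[1], base[2] + vertical_step * lv))
    return points

# With B4 at the origin and B2 at (1, 0, 0), the first horizontal step
# lands 10 m beyond B2 along the x axis.
pts = interpolation_points((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```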
The screen coordinates c1, c2, c3 corresponding to B1, B2, B3 are read from the frame buffer. The Euclidean distance algorithm is used to measure the similarity between c1, c2, c3 and a1, a2, a3, and the interpolation point with the minimum Euclidean distance, i.e. the highest similarity, is tentatively set as the virtual camera position P1.
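Reading back the screen coordinates c1-c3 for a candidate camera is, in effect, a standard world-to-screen projection. Below is a minimal pinhole-camera sketch of what the rendering pipeline computes; it is a generic illustration, not the patent's code, and the field of view, resolution, and +z up axis are assumed values:

```python
import math

def world_to_screen(point, cam_pos, look_at, fov_deg=60.0,
                    width=1920, height=1080):
    """Project a world point to pixel coordinates for a camera at
    cam_pos looking at look_at (up = +z, camera not looking straight
    up or down), using a simple pinhole model."""
    # Camera basis: forward f, right r, up u.
    f = [look_at[i] - cam_pos[i] for i in range(3)]
    fn = math.sqrt(sum(c * c for c in f)); f = [c / fn for c in f]
    up = (0.0, 0.0, 1.0)
    r = [f[1]*up[2] - f[2]*up[1], f[2]*up[0] - f[0]*up[2], f[0]*up[1] - f[1]*up[0]]
    rn = math.sqrt(sum(c * c for c in r)); r = [c / rn for c in r]
    u = [r[1]*f[2] - r[2]*f[1], r[2]*f[0] - r[0]*f[2], r[0]*f[1] - r[1]*f[0]]
    # Express the point in camera coordinates.
    v = [point[i] - cam_pos[i] for i in range(3)]
    x = sum(v[i] * r[i] for i in range(3))
    y = sum(v[i] * u[i] for i in range(3))
    z = sum(v[i] * f[i] for i in range(3))  # depth along the view axis
    focal = (height / 2) / math.tan(math.radians(fov_deg) / 2)
    return (width / 2 + focal * x / z, height / 2 - focal * y / z)

# A camera 10 m south of the origin looking at the origin projects the
# origin to the screen centre.
cx, cy = world_to_screen((0.0, 0.0, 0.0), (0.0, -10.0, 0.0), (0.0, 0.0, 0.0))
```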
The Euclidean distance is calculated as √[(p1-q1)² + (p2-q2)² + … + (pn-qn)²].
With P1 known, 100 interpolation points are generated on the extension of the B4-B2 line, centered on P1, at 1-meter horizontal and vertical intervals along the B4-B2 direction. The virtual camera is placed at each of these interpolation points, oriented toward the B4 coordinate point, and then rendered into the frame buffer. The previous step is repeated to obtain a second temporary position P2 of the virtual camera.
The interpolation interval is then successively reduced to 0.5, 0.3, 0.1, and 0.01 meters, and the previous step is repeated until the virtual camera position Pn with the minimum Euclidean distance, i.e. the best fusion effect, is obtained.
Finally, the video stream is projected into the three-dimensional live-action model from the position and orientation of the virtual camera obtained above (note: the virtual camera looks at B4, where B4 is the world coordinate point of the three-dimensional live-action model corresponding to the center point of the video image), completing the effect of fusing the video into the three-dimensional live-action scene.
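The per-candidate evaluation described above (render from each interpolation point, read back the screen coordinates of B1, B2, B3, and compare them with a1, a2, a3) can be sketched as follows. This is an illustrative outline rather than the patented implementation: `project` stands in for the engine-specific step of rendering toward B4 and reading the screen coordinates out of the frame buffer, and all names are ours.

```python
import math

def total_distance(screen_pts, image_pts):
    """Sum of Euclidean distances between matched point pairs
    (c1..c3 from the frame buffer vs. a1..a3 on the video image)."""
    return sum(math.dist(c, a) for c, a in zip(screen_pts, image_pts))

def best_candidate(candidates, project, image_pts):
    """Return the interpolation point whose rendered standard points
    land closest to their preset positions on the video image.

    candidates : iterable of 3D virtual-camera positions
    project    : callable(position) -> [(x, y), ...] screen coordinates
                 of B1, B2, B3 after rendering toward B4 (engine-specific,
                 abstracted here)
    image_pts  : preset coordinates a1, a2, a3 on the video image
    """
    return min(candidates, key=lambda pos: total_distance(project(pos), image_pts))

# Toy usage: the "projection" simply offsets screen points by the camera's
# x-coordinate, so the best candidate is the one with x closest to 0.
targets = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
toy_project = lambda pos: [(x + pos[0], y) for x, y in targets]
assert best_candidate([(-20.0, 0, 0), (-5.0, 0, 0), (15.0, 0, 0)],
                      toy_project, targets) == (-5.0, 0, 0)
```

In the patent's flow, `best_candidate` corresponds to picking the tentative position P1 among the 100 interpolation points.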
This embodiment also discloses the following implementation steps:
Step 1: A single frame of video image is captured from the video stream, and the pixel coordinates of 4 standard points are taken: a1 (lower left), a2 (bottom middle), a3 (lower right), and a4 (center point) of the image. The corresponding world coordinates B1 (lower left), B2 (bottom middle), B3 (lower right), and B4 (center point) are taken in the three-dimensional live-action model.
Step 2: A B4-B2 connecting line is drawn in the three-dimensional live-action model, and 100 interpolation points are generated on the extension of the B4-B2 line at 10-meter horizontal and vertical intervals. A virtual camera is placed at each interpolation point, oriented toward B4, and rendered into the frame buffer.
Step 3: The screen coordinates c1, c2, c3 corresponding to B1, B2, B3 are found in the frame buffer. The Euclidean distance method is used to measure the similarity between c1, c2, c3 and a1, a2, a3, and the interpolation point with the highest similarity is tentatively taken as the position P1 of the virtual camera.
Step 4: With P1 on the B4-B2 line as the center, 100 interpolation points are generated along the B4-B2 direction at 1-meter horizontal and vertical intervals. A virtual camera is placed at each interpolation point, oriented toward B4, and rendered into the frame buffer. Step 3 is repeated to obtain the position P2 of the virtual camera.
Step 5: The interpolation interval is successively set to 0.5, 0.3, 0.1, and 0.01 meters, and step 4 is repeated until the optimal video-fusion virtual camera position is found.
Step 6: The video stream is projected onto the three-dimensional live-action model from the position and orientation of the virtual camera obtained in step 5, completing the effect of fusing the video into the three-dimensional live-action model.
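The coarse-to-fine schedule of steps 2-5 amounts to a one-dimensional search along the B4-B2 line that repeatedly re-centers on the best interpolation point and shrinks the spacing. A minimal sketch under our own naming, with the render-and-compare evaluation of step 3 abstracted as a `cost` function (e.g. the summed Euclidean distance between rendered and preset standard points):

```python
def refine_camera_position(cost, start, direction,
                           spacings=(10.0, 1.0, 0.5, 0.3, 0.1, 0.01), n=100):
    """Coarse-to-fine line search for the virtual camera position.

    cost      : callable(position) -> fusion error; lower is better
    start     : initial 3D position on the B4-B2 extension line
    direction : unit vector of the B4-B2 line
    spacings  : interpolation intervals, in meters, from coarse to fine
    n         : number of interpolation steps generated per pass
    """
    best = start
    for step in spacings:
        # Generate n+1 candidates spaced `step` apart, centered on `best`.
        candidates = [
            tuple(b + direction[i] * step * k for i, b in enumerate(best))
            for k in range(-n // 2, n // 2 + 1)
        ]
        best = min(candidates, key=cost)  # step 3: keep the closest match
    return best

# Toy usage: the true optimum sits at x = 3.37 on the search line.
found = refine_camera_position(lambda p: abs(p[0] - 3.37),
                               start=(0.0, 0.0, 0.0),
                               direction=(1.0, 0.0, 0.0))
assert abs(found[0] - 3.37) < 1e-6
```

Each pass narrows the search window around the previous best candidate, so the final interval of 0.01 meters bounds the positioning error of the returned camera position.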
Frame buffer
A frame buffer (also called a frame cache) is a region of memory in computer graphics that holds rendered pixel data and can be used to accelerate the rendering process. At rendering time, the graphics data are stored in the frame buffer awaiting output to the display.
Video fusion
The video is projected into the solid three-dimensional model, and the video playback effect is viewed within the three-dimensional scene.
Euclidean distance algorithm
The Euclidean distance, also known as the Euclidean metric, is a distance measure commonly used in the field of machine learning. Given two points p and q with coordinates (p1, p2, …, pn) and (q1, q2, …, qn) in n-dimensional space, the Euclidean distance between p and q is defined as:
√[(p1-q1)² + (p2-q2)² + … + (pn-qn)²]
This distance represents the separation of two points in n-dimensional space; it is the length of the straight line segment between the points in Euclidean space. In machine-learning classification, clustering, and similar algorithms, the Euclidean distance is often used to calculate the similarity or distance between samples. In general, the smaller the Euclidean distance between two points, the higher their similarity; the larger the distance, the lower their similarity.
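As a concrete illustration of the definition above, the distance can be computed directly (a minimal sketch; the function name is ours, not from the patent):

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two n-dimensional points."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# The classic 3-4-5 right triangle: distance from (0, 0) to (3, 4) is 5.
assert euclidean_distance((0.0, 0.0), (3.0, 4.0)) == 5.0

# Smaller distance means higher similarity between matched points.
assert euclidean_distance((1.0, 1.0), (1.1, 1.2)) < euclidean_distance((1.0, 1.0), (5.0, 9.0))
```

This is exactly the d = √[(p1-q1)² + (p2-q2)² + (p3-q3)²] comparison used in the claims, with n = 3.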
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
While the application has been described above with reference to various embodiments, it should be understood that many changes and modifications can be made without departing from the scope of the application. It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and scope of this application. The above examples should be understood as illustrative only and not limiting the scope of the application. Various changes and modifications to the present application may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the application as defined in the appended claims.
Claims (10)
1. The video fusion method based on the three-dimensional scene is characterized by comprising the following steps of:
step 1, acquiring a preset video image sequence, intercepting a single-frame video image at a preset position from the video image sequence, initializing image coordinates on the single-frame video image, and selecting a plurality of standard points at the preset coordinate positions on the video image;
step 2, a three-dimensional real-scene model to be fused with the preset video image sequence is obtained, a coordinate mapping relation between the three-dimensional real-scene model and the video image to be fused is established, and world coordinates corresponding to the standard points are determined from the three-dimensional real-scene model;
step 3, making a connection line from world coordinates corresponding to a first standard point of the video image to world coordinates corresponding to a second standard point of the video image in the three-dimensional live-action model, generating a preset number of interpolation points at vertical intervals on an extension line of the connection line according to a first preset length interval, placing the positions of the video fused virtual cameras on the interpolation points and facing the positions of the world coordinates corresponding to the first standard point, then executing rendering operation, and storing rendered data in a frame buffer;
step 4, obtaining screen coordinates corresponding to the standard points other than the first standard point in the frame buffer, obtaining, by the Euclidean distance method, the similarity between the screen coordinates corresponding to the other standard points and the preset coordinate positions of the standard points on the video image, and taking the interpolation point with the highest similarity as the first temporary position of the virtual camera;
step 5, making a connection line from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point of the video image in the three-dimensional live-action model, generating a preset number of interpolation points at vertical intervals again along the connection line direction, centered on the first temporary position, at horizontal intervals of a second preset length, placing the position of the video-fused virtual camera on the interpolation points, orienting it toward the position of the world coordinates corresponding to the first standard point, executing the rendering operation again, and repeating step 4 to determine the second temporary position of the virtual camera;
and 6, continuing to adjust the interpolation interval and repeating the step 5 until the position with the minimum Euclidean distance is obtained, and projecting the video stream into the three-dimensional real scene model from the obtained position and orientation of the optimal virtual camera.
2. The method of claim 1, wherein the plurality of standard points are 4 standard points of determined positions, which are located at a center point position, a bottom left-most corner, a bottom middle point, and a bottom right-most corner of the single frame video image.
3. The method of claim 2, wherein making a connection from world coordinates corresponding to a first standard point of a video image to world coordinates corresponding to a second standard point of the video image in the three-dimensional live-action model further comprises: the first standard point is the center point coordinate of the image, and the second standard point is the center point coordinate of the bottom of the image.
4. The video fusion method according to claim 1, wherein the first preset length and the second preset length are length values input by a user, the first preset length is initially set to 10 meters, and the second preset length is initially set to 1 meter.
5. The method of claim 1, wherein in the step 4, the similarity obtained by euclidean distance method is expressed as d:
d = √[(p1-q1)² + (p2-q2)² + (p3-q3)²]
wherein p1, p2, p3 denote the screen coordinate values of the standard points in the frame buffer, q1, q2, q3 denote the corresponding preset coordinate values on the video image, and the symbol √ denotes the square root.
6. A video fusion system based on three-dimensional scenes, the video fusion system comprising the following modules:
the coordinate point selection module acquires a preset video image sequence, intercepts a single-frame video image at a preset position from the video image sequence, initializes image coordinates on the single-frame video image, and selects a plurality of standard points at the preset coordinate positions on the video image;
the coordinate mapping module is used for acquiring a three-dimensional real-scene model to be fused with the video image sequence, establishing a coordinate mapping relation between the three-dimensional real-scene model and the video image to be fused, and determining world coordinates corresponding to the standard points from the three-dimensional real-scene model;
the virtual camera initial rendering module is used for making a connection line from world coordinates corresponding to a first standard point of a video image to world coordinates corresponding to a second standard point of the video image in the three-dimensional live-action model, generating a preset number of interpolation points at vertical intervals on an extension line of the connection line according to a first preset length interval, placing the positions of the virtual camera fused with the video on the interpolation points and facing the positions of the world coordinates corresponding to the first standard point, then executing rendering operation, and storing rendered data in a frame buffer;
the virtual camera positioning module is used for obtaining screen coordinates corresponding to other standard points except the first standard point in the frame buffer, obtaining the screen coordinates corresponding to the other standard points through Euclidean distance method, comparing the similarity between the screen coordinates corresponding to the other standard points and preset coordinate positions of the standard points on the video image, and taking the interpolation point with the highest similarity degree as the first temporary position of the virtual camera;
a positioning updating module, which is used for making a connection line from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point of the video image in the three-dimensional live-action model, generating a preset number of interpolation points at vertical intervals again along the connection line direction, centered on the first temporary position, at horizontal intervals of a second preset length, placing the position of the video-fused virtual camera on the interpolation points and orienting it toward the position of the world coordinates corresponding to the first standard point, re-executing the rendering operation, and repeating the operation of the virtual camera positioning module to determine the second temporary position of the virtual camera;
and the fusion module is used for continuously adjusting the interpolation interval and repeating the functions executed by the positioning updating module until the position of the virtual camera with the minimum Euclidean distance, namely the best fusion effect, is obtained, and the video stream is projected into the three-dimensional real scene model from the obtained position and orientation of the optimal virtual camera.
7. The three-dimensional scene-based video fusion system of claim 6, wherein the plurality of standard points are 4 location-determining standard points located at a center point location, a bottom left-most corner, a bottom middle point, and a bottom right-most corner of the single frame video image.
8. The three-dimensional scene-based video fusion system of claim 7, wherein the making of a connection from world coordinates corresponding to a first standard point of a video image to world coordinates corresponding to a second standard point of the video image in the three-dimensional live-action model further comprises: the first standard point is the center point coordinate of the image, and the second standard point is the center point coordinate of the bottom of the image.
9. The three-dimensional scene-based video fusion system of claim 6, wherein the first preset length and the second preset length are length values entered by a user, the first preset length being initially set to 10 meters and the second preset length being initially set to 1 meter.
10. The video fusion system based on three-dimensional scene as defined in claim 6, wherein the obtaining, by the Euclidean distance method, of the similarity between the screen coordinates corresponding to the remaining standard points and the preset coordinate positions of the plurality of standard points on the video image further comprises: the similarity obtained by the Euclidean distance method is expressed as d:
d = √[(p1-q1)² + (p2-q2)² + (p3-q3)²]
wherein p1, p2, p3 denote the screen coordinate values of the standard points in the frame buffer, q1, q2, q3 denote the corresponding preset coordinate values on the video image, and the symbol √ denotes the square root.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310884577.2A CN116996742B (en) | 2023-07-18 | 2023-07-18 | Video fusion method and system based on three-dimensional scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310884577.2A CN116996742B (en) | 2023-07-18 | 2023-07-18 | Video fusion method and system based on three-dimensional scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116996742A true CN116996742A (en) | 2023-11-03 |
CN116996742B CN116996742B (en) | 2024-08-13 |
Family
ID=88525844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310884577.2A Active CN116996742B (en) | 2023-07-18 | 2023-07-18 | Video fusion method and system based on three-dimensional scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116996742B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20110088995A (en) * | 2010-01-29 | 2011-08-04 | 주식회사 울프슨랩 | Method and system to visualize surveillance camera videos within 3d models, and program recording medium |
US20180262789A1 (en) * | 2016-03-16 | 2018-09-13 | Adcor Magnet Systems, Llc | System for georeferenced, geo-oriented realtime video streams |
CN112053446A (en) * | 2020-07-11 | 2020-12-08 | 南京国图信息产业有限公司 | Real-time monitoring video and three-dimensional scene fusion method based on three-dimensional GIS |
CN112184922A (en) * | 2020-10-15 | 2021-01-05 | 洛阳众智软件科技股份有限公司 | Fusion method, device and equipment of two-dimensional video and three-dimensional scene and storage medium |
CN112437276A (en) * | 2020-11-20 | 2021-03-02 | 埃洛克航空科技(北京)有限公司 | WebGL-based three-dimensional video fusion method and system |
CN113810626A (en) * | 2020-06-15 | 2021-12-17 | 浙江宇视科技有限公司 | Video fusion method, device and equipment based on three-dimensional map and storage medium |
CN113870163A (en) * | 2021-09-24 | 2021-12-31 | 埃洛克航空科技(北京)有限公司 | Video fusion method and device based on three-dimensional scene, storage medium and electronic device |
CN115546377A (en) * | 2022-12-01 | 2022-12-30 | 杭州靖安科技有限公司 | Video fusion method and device, electronic equipment and storage medium |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20110088995A (en) * | 2010-01-29 | 2011-08-04 | 주식회사 울프슨랩 | Method and system to visualize surveillance camera videos within 3d models, and program recording medium |
US20180262789A1 (en) * | 2016-03-16 | 2018-09-13 | Adcor Magnet Systems, Llc | System for georeferenced, geo-oriented realtime video streams |
CN113810626A (en) * | 2020-06-15 | 2021-12-17 | 浙江宇视科技有限公司 | Video fusion method, device and equipment based on three-dimensional map and storage medium |
CN112053446A (en) * | 2020-07-11 | 2020-12-08 | 南京国图信息产业有限公司 | Real-time monitoring video and three-dimensional scene fusion method based on three-dimensional GIS |
CN112184922A (en) * | 2020-10-15 | 2021-01-05 | 洛阳众智软件科技股份有限公司 | Fusion method, device and equipment of two-dimensional video and three-dimensional scene and storage medium |
CN112437276A (en) * | 2020-11-20 | 2021-03-02 | 埃洛克航空科技(北京)有限公司 | WebGL-based three-dimensional video fusion method and system |
CN113870163A (en) * | 2021-09-24 | 2021-12-31 | 埃洛克航空科技(北京)有限公司 | Video fusion method and device based on three-dimensional scene, storage medium and electronic device |
CN115546377A (en) * | 2022-12-01 | 2022-12-30 | 杭州靖安科技有限公司 | Video fusion method and device, electronic equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
宁泽西 (Ning Zexi), 秦绪佳 (Qin Xujia), 陈佳舟 (Chen Jiazhou): "Video fusion method based on three-dimensional scene", 《计算机科学》 (Computer Science), no. 2, pages 281 - 285 *
Also Published As
Publication number | Publication date |
---|---|
CN116996742B (en) | 2024-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109618222B (en) | A kind of splicing video generation method, device, terminal device and storage medium | |
US10521468B2 (en) | Animated seek preview for panoramic videos | |
US20180324415A1 (en) | Real-time automatic vehicle camera calibration | |
WO2021213067A1 (en) | Object display method and apparatus, device and storage medium | |
CN112954450B (en) | Video processing method and device, electronic equipment and storage medium | |
CN107766349B (en) | Method, device, equipment and client for generating text | |
CN110084797B (en) | Plane detection method, plane detection device, electronic equipment and storage medium | |
CN104394422A (en) | Video segmentation point acquisition method and device | |
CN105844256A (en) | Panorama video frame image processing method and device | |
CN111475676B (en) | Video data processing method, system, device, equipment and readable storage medium | |
CN103914876A (en) | Method and apparatus for displaying video on 3D map | |
CN114531553B (en) | Method, device, electronic equipment and storage medium for generating special effect video | |
CN112150603B (en) | Initial visual angle control and presentation method and system based on three-dimensional point cloud | |
CN111491187A (en) | Video recommendation method, device, equipment and storage medium | |
CN110781823A (en) | Screen recording detection method and device, readable medium and electronic equipment | |
JP2019529992A (en) | Display device and control method thereof | |
CN110796664A (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN109743566A (en) | A kind of method and apparatus of the video format of VR for identification | |
Xiong et al. | Snap angle prediction for 360 panoramas | |
CN114863071A (en) | Target object labeling method and device, storage medium and electronic equipment | |
CN112906553B (en) | Image processing method, apparatus, device and medium | |
US9058684B1 (en) | Level of detail blurring and 3D model data selection | |
CN112927163A (en) | Image data enhancement method and device, electronic equipment and storage medium | |
CN116996742B (en) | Video fusion method and system based on three-dimensional scene | |
EP3177005B1 (en) | Display control system, display control device, display control method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||