CN116996742A - Video fusion method and system based on three-dimensional scene - Google Patents

Video fusion method and system based on three-dimensional scene

Info

Publication number
CN116996742A
CN116996742A (application CN202310884577.2A)
Authority
CN
China
Prior art keywords
video image
standard
video
coordinates corresponding
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310884577.2A
Other languages
Chinese (zh)
Other versions
CN116996742B (en)
Inventor
石立阳
曹琪
黄星淮
祝昌宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Technology Guangzhou Co ltd
Original Assignee
Digital Technology Guangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital Technology Guangzhou Co ltd filed Critical Digital Technology Guangzhou Co ltd
Priority to CN202310884577.2A priority Critical patent/CN116996742B/en
Publication of CN116996742A publication Critical patent/CN116996742A/en
Application granted granted Critical
Publication of CN116996742B publication Critical patent/CN116996742B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g. 3D video
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a video fusion method based on a three-dimensional scene. It is more efficient than approaches that calibrate the physical camera or manually adjust virtual camera parameters to achieve fusion, requires no carrier, and greatly improves the efficiency of video fusion. The application is a new video fusion technology. Many existing video fusion technologies on the market suffer from complex operation, applicability only to idealized scenes, strong limitations, and poor fusion quality. In contrast, the technology of the application can fuse a video automatically, quickly, and accurately using only four pairs of standard point coordinates (the pixel coordinates on the video image and the corresponding world coordinates on the three-dimensional live-action model), achieves a good fusion effect, and greatly reduces the cost of fusing video onto a three-dimensional live-action model.

Description

Video fusion method and system based on three-dimensional scene
Technical Field
The application relates to the technical field of image processing, in particular to a video fusion method and system based on a three-dimensional scene.
Background
Video fusion technology plays an important role in the digital twin field for smart cities. It meets the need, in smart-city business scenarios, to project real-time surveillance video onto three-dimensional real-scene model data, achieving a virtual-real fusion effect, and is widely used in fields such as security and unmanned inspection. Automatically or semi-automatically projecting the video onto the three-dimensional live-action model data is the first and most critical step in achieving the video fusion effect. Several video fusion technologies already exist on the market. For example, Chinese patent 202211528984.1 discloses a video fusion method, device, electronic apparatus and storage medium, in which a three-dimensional model is loaded into a GIS system to construct a virtual scene resembling reality, the real-time surveillance video is projected into the GIS system, the video is irregularly clipped, and the clipped video is fused into the constructed virtual scene.
However, during fusion the above method is constrained by the shape of the three-dimensional model, and problems such as the video penetrating the model or being duplicated easily occur, resulting in a poor user experience.
Disclosure of Invention
In the video fusion technology of the application, standard points are sampled from a video key frame, the position and pose of the video's virtual camera in the live-action three-dimensional scene are then calculated, and the video stream is projected into the live-action three-dimensional scene according to that position and pose, thereby realizing the video fusion effect.
The present application aims to solve at least one of the technical problems existing in the prior art. Therefore, the application discloses a video fusion method based on a three-dimensional scene, which comprises the following steps:
step 1, acquiring a preset video image sequence, capturing a single-frame video image at a preset position from the video image sequence, initializing image coordinates on the single-frame video image, and selecting a plurality of standard points at preset coordinate positions on the video image;
step 2, acquiring a three-dimensional real-scene model to be fused with the video image sequence, establishing a coordinate mapping relation between the three-dimensional real-scene model and the video image to be fused, and determining from the three-dimensional real-scene model the world coordinates corresponding to the standard points;
step 3, drawing, in the three-dimensional live-action model, a line from the world coordinates corresponding to a first standard point of the video image to the world coordinates corresponding to a second standard point of the video image; generating a preset number of interpolation points on the extension of this line, spaced horizontally at a first preset length and also spaced vertically; placing the virtual camera used for video fusion at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; then executing a rendering operation and storing the rendered data in a frame buffer;
step 4, obtaining from the frame buffer the screen coordinates corresponding to the standard points other than the first standard point, comparing these screen coordinates with the preset coordinate positions of the standard points on the video image by the Euclidean distance method, and taking the interpolation point with the highest similarity as the first temporary position of the virtual camera;
step 5, drawing, in the three-dimensional live-action model, the line from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point; centered on the first temporary position, again generating a preset number of interpolation points along the direction of this line, spaced horizontally at a second preset length and also spaced vertically; placing the virtual camera at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; re-executing the rendering operation; and repeating step 4 to determine the second temporary position of the virtual camera;
and step 6, continuing to reduce the interpolation interval and repeating step 5 until the virtual camera position with the minimum Euclidean distance, i.e. the best fusion effect, is obtained, and projecting the video stream into the three-dimensional live-action model from the resulting optimal virtual camera position and orientation.
Still further, the plurality of standard points are four standard points at determined positions: the center point, the bottom-left corner, the bottom middle point, and the bottom-right corner of the single-frame video image.
Still further, in drawing the line in the three-dimensional live-action model from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point, the first standard point is the center point of the image and the second standard point is the bottom center point of the image.
Further, the first preset length and the second preset length are length values input by a user, the first preset length is initially set to 10 meters, and the second preset length is initially set to 1 meter.
Further, obtaining the similarity between the screen coordinates corresponding to the remaining standard points and the preset coordinate positions of the standard points on the video image by the Euclidean distance method further includes: the similarity is given by the Euclidean distance formula
√[(p1-q1)² + (p2-q2)² + (p3-q3)²]
where p1, p2, p3 are the screen coordinate values of the standard points in the frame buffer, and q1, q2, q3 are the corresponding preset coordinate values on the video image.
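As an illustration of this similarity measure, the following minimal Python sketch scores one candidate interpolation point by comparing the projected screen coordinates of the remaining standard points against their preset pixel coordinates. The function name, the flattening of each 2-D coordinate into the distance, and the numeric values in the example are assumptions made for clarity, not part of the disclosed method.

```python
import math
from typing import Sequence, Tuple

Point2D = Tuple[float, float]

def similarity_distance(screen_pts: Sequence[Point2D],
                        preset_pts: Sequence[Point2D]) -> float:
    """Euclidean distance between the screen coordinates of the remaining
    standard points (read back for one candidate camera position) and their
    preset pixel coordinates on the video image.

    A smaller value means a higher similarity, matching the formula
    sqrt((p1-q1)^2 + (p2-q2)^2 + (p3-q3)^2), here applied to every
    coordinate component flattened into one vector (an interpretation,
    since the formula lists a single value per point).
    """
    p = [c for pt in screen_pts for c in pt]   # flatten to (x1, y1, x2, y2, ...)
    q = [c for pt in preset_pts for c in pt]
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Example (invented values for a 1920x1080 frame): screen coordinates of the
# remaining standard points rendered from one candidate position, compared
# with their preset pixel coordinates.
if __name__ == "__main__":
    screen = [(962.0, 1075.0), (14.0, 1068.0), (1915.0, 1080.0)]
    preset = [(960.0, 1080.0), (0.0, 1080.0), (1920.0, 1080.0)]
    print(similarity_distance(screen, preset))  # lower is better
```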
The application also discloses a video fusion system based on the three-dimensional scene, which comprises the following modules:
The coordinate point selection module is used for acquiring a preset video image sequence, capturing a single-frame video image at a preset position from the video image sequence, initializing image coordinates on the single-frame video image, and selecting a plurality of standard points at preset coordinate positions on the video image;
The coordinate mapping module is used for acquiring a three-dimensional real-scene model to be fused with the video image sequence, establishing a coordinate mapping relation between the three-dimensional real-scene model and the video image to be fused, and determining from the three-dimensional real-scene model the world coordinates corresponding to the standard points;
The virtual camera initial rendering module is used for drawing, in the three-dimensional live-action model, a line from the world coordinates corresponding to a first standard point of the video image to the world coordinates corresponding to a second standard point of the video image; generating a preset number of interpolation points on the extension of this line, spaced horizontally at a first preset length and also spaced vertically; placing the virtual camera used for video fusion at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; then executing a rendering operation and storing the rendered data in a frame buffer;
The virtual camera positioning module is used for obtaining from the frame buffer the screen coordinates corresponding to the standard points other than the first standard point, comparing these screen coordinates with the preset coordinate positions of the standard points on the video image by the Euclidean distance method, and taking the interpolation point with the highest similarity as the first temporary position of the virtual camera;
The positioning updating module is used for drawing, in the three-dimensional live-action model, the line from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point; centered on the first temporary position, again generating a preset number of interpolation points along the direction of this line, spaced horizontally at a second preset length and also spaced vertically; placing the virtual camera at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; re-executing the rendering operation; and repeating the operation of the virtual camera positioning module to determine the second temporary position of the virtual camera;
The fusion module is used for continuing to adjust the interpolation interval and repeating the function executed by the positioning updating module until the virtual camera position with the minimum Euclidean distance, i.e. the best fusion effect, is obtained, and projecting the video stream into the three-dimensional real-scene model from the resulting optimal virtual camera position and orientation.
Still further, the plurality of standard points are four standard points at determined positions: the center point, the bottom-left corner, the bottom middle point, and the bottom-right corner of the single-frame video image.
Still further, in drawing the line in the three-dimensional live-action model from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point, the first standard point is the center point of the image and the second standard point is the bottom center point of the image.
Further, the first preset length and the second preset length are length values input by a user, the first preset length is initially set to 10 meters, and the second preset length is initially set to 1 meter.
Further, obtaining the similarity between the screen coordinates corresponding to the remaining standard points and the preset coordinate positions of the standard points on the video image by the Euclidean distance method further includes: the similarity is given by the Euclidean distance formula
√[(p1-q1)² + (p2-q2)² + (p3-q3)²]
where p1, p2, p3 are the screen coordinate values of the standard points in the frame buffer, and q1, q2, q3 are the corresponding preset coordinate values on the video image.
Compared with the prior art, the application has the following beneficial effects: compared with approaches that calibrate the physical camera or manually adjust virtual camera parameters to achieve fusion, the video fusion technology provided by the application is more efficient, requires no carrier, and greatly improves video fusion efficiency. The application is a new video fusion technology. Many existing video fusion technologies on the market suffer from complex operation, applicability only to idealized scenes, strong limitations, and poor fusion quality. The video fusion technology of the application can fuse a video automatically, quickly, and accurately using only four pairs of standard point coordinates (the pixel coordinates on the video image and the corresponding world coordinates on the three-dimensional live-action model), achieves a good fusion effect, and greatly reduces the cost of fusing video onto a three-dimensional live-action model.
Drawings
The application will be further understood from the following description taken in conjunction with the accompanying drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments. In the figures, like reference numerals designate corresponding parts throughout the different views.
Fig. 1 is a standard point selection diagram of a video image in an embodiment of the application.
FIG. 2 is a flow chart of implementing three-dimensional scene-based video fusion in an embodiment of the application.
FIG. 3 is a schematic diagram of the placement of the positions of virtual cameras fusing video onto these interpolation points in an embodiment of the application.
FIG. 4 is a flow chart of another implementation of three-dimensional scene-based video fusion in an embodiment of the application.
Detailed Description
The technical scheme of the application will be described in more detail below with reference to the accompanying drawings and examples.
A mobile terminal implementing various embodiments of the present application will now be described with reference to the accompanying drawings. In the following description, suffixes such as "module", "component", or "unit" used to denote elements are adopted only to facilitate the description of the application and have no specific meaning in themselves; thus, "module" and "component" may be used interchangeably.
Mobile terminals may be implemented in a variety of forms. For example, the terminals described in the present application may include mobile terminals such as mobile phones, smart phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), navigation devices, and the like, as well as fixed terminals such as digital TVs and desktop computers. In the following, it is assumed that the terminal is a mobile terminal; however, those skilled in the art will understand that the configuration according to the embodiments of the present application can also be applied to fixed terminals, except for elements specifically intended for mobile use.
A video fusion method based on a three-dimensional scene, as shown in figs. 1-4, comprises the following steps:
step 1, acquiring a preset video image sequence, capturing a single-frame video image at a preset position from the video image sequence, initializing image coordinates on the single-frame video image, and selecting a plurality of standard points at preset coordinate positions on the video image;
step 2, acquiring a three-dimensional real-scene model to be fused with the video image sequence, establishing a coordinate mapping relation between the three-dimensional real-scene model and the video image to be fused, and determining from the three-dimensional real-scene model the world coordinates corresponding to the standard points;
step 3, drawing, in the three-dimensional live-action model, a line from the world coordinates corresponding to a first standard point of the video image to the world coordinates corresponding to a second standard point of the video image; generating a preset number of interpolation points on the extension of this line, spaced horizontally at a first preset length and also spaced vertically; placing the virtual camera used for video fusion at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; then executing a rendering operation and storing the rendered data in a frame buffer;
step 4, obtaining from the frame buffer the screen coordinates corresponding to the standard points other than the first standard point, comparing these screen coordinates with the preset coordinate positions of the standard points on the video image by the Euclidean distance method, and taking the interpolation point with the highest similarity as the first temporary position of the virtual camera;
step 5, drawing, in the three-dimensional live-action model, the line from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point; centered on the first temporary position, again generating a preset number of interpolation points along the direction of this line, spaced horizontally at a second preset length and also spaced vertically; placing the virtual camera at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; re-executing the rendering operation; and repeating step 4 to determine the second temporary position of the virtual camera;
and step 6, continuing to reduce the interpolation interval and repeating step 5 until the virtual camera position with the minimum Euclidean distance, i.e. the best fusion effect, is obtained, and projecting the video stream into the three-dimensional live-action model from the resulting optimal virtual camera position and orientation.
Still further, the plurality of standard points are four standard points at determined positions: the center point, the bottom-left corner, the bottom middle point, and the bottom-right corner of the single-frame video image.
Still further, in drawing the line in the three-dimensional live-action model from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point, the first standard point is the center point of the image and the second standard point is the bottom center point of the image.
Further, the first preset length and the second preset length are length values input by a user, the first preset length is initially set to 10 meters, and the second preset length is initially set to 1 meter.
Further, obtaining the similarity between the screen coordinates corresponding to the remaining standard points and the preset coordinate positions of the standard points on the video image by the Euclidean distance method further includes: the similarity is given by the Euclidean distance formula
√[(p1-q1)² + (p2-q2)² + (p3-q3)²]
where p1, p2, p3 are the screen coordinate values of the standard points in the frame buffer, and q1, q2, q3 are the corresponding preset coordinate values on the video image.
The application also discloses a video fusion system based on the three-dimensional scene, which comprises the following modules:
The coordinate point selection module is used for acquiring a preset video image sequence, capturing a single-frame video image at a preset position from the video image sequence, initializing image coordinates on the single-frame video image, and selecting a plurality of standard points at preset coordinate positions on the video image;
The coordinate mapping module is used for acquiring a three-dimensional real-scene model to be fused with the video image sequence, establishing a coordinate mapping relation between the three-dimensional real-scene model and the video image to be fused, and determining from the three-dimensional real-scene model the world coordinates corresponding to the standard points;
The virtual camera initial rendering module is used for drawing, in the three-dimensional live-action model, a line from the world coordinates corresponding to a first standard point of the video image to the world coordinates corresponding to a second standard point of the video image; generating a preset number of interpolation points on the extension of this line, spaced horizontally at a first preset length and also spaced vertically; placing the virtual camera used for video fusion at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; then executing a rendering operation and storing the rendered data in a frame buffer;
The virtual camera positioning module is used for obtaining from the frame buffer the screen coordinates corresponding to the standard points other than the first standard point, comparing these screen coordinates with the preset coordinate positions of the standard points on the video image by the Euclidean distance method, and taking the interpolation point with the highest similarity as the first temporary position of the virtual camera;
The positioning updating module is used for drawing, in the three-dimensional live-action model, the line from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point; centered on the first temporary position, again generating a preset number of interpolation points along the direction of this line, spaced horizontally at a second preset length and also spaced vertically; placing the virtual camera at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; re-executing the rendering operation; and repeating the operation of the virtual camera positioning module to determine the second temporary position of the virtual camera;
The fusion module is used for continuing to adjust the interpolation interval and repeating the function executed by the positioning updating module until the virtual camera position with the minimum Euclidean distance, i.e. the best fusion effect, is obtained, and projecting the video stream into the three-dimensional real-scene model from the resulting optimal virtual camera position and orientation.
Still further, the plurality of standard points are four standard points at determined positions: the center point, the bottom-left corner, the bottom middle point, and the bottom-right corner of the single-frame video image.
Still further, in drawing the line in the three-dimensional live-action model from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point, the first standard point is the center point of the image and the second standard point is the bottom center point of the image.
Further, the first preset length and the second preset length are length values input by a user, the first preset length is initially set to 10 meters, and the second preset length is initially set to 1 meter.
Further, obtaining the similarity between the screen coordinates corresponding to the remaining standard points and the preset coordinate positions of the standard points on the video image by the Euclidean distance method further includes: the similarity is given by the Euclidean distance formula
√[(p1-q1)² + (p2-q2)² + (p3-q3)²]
where p1, p2, p3 are the screen coordinate values of the standard points in the frame buffer, and q1, q2, q3 are the corresponding preset coordinate values on the video image.
In this embodiment, the implementation includes the following steps:
A frame of video image is captured from the video stream, and four standard points are taken on the image: a1 (bottom-left corner of the image), a2 (bottom middle point of the image), a3 (bottom-right corner of the image) and a4 (center point of the image); see fig. 1. The corresponding world coordinates B1 (world coordinates of a1), B2 (world coordinates of a2), B3 (world coordinates of a3) and B4 (world coordinates of a4) are then taken from the three-dimensional live-action model.
A B4-B2 line is drawn in the three-dimensional live-action model, and 100 interpolation points are generated along the extension of the B4-B2 line, spaced horizontally at 10-meter intervals and also spaced vertically. The virtual camera used for video fusion is placed at each interpolation point, looking at the B4 coordinate point, and the scene is then rendered into a frame buffer; see fig. 3.
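One possible way to generate these 100 candidate camera positions is sketched below in Python. The 10 × 10 grid layout, the choice of the world +Z axis as the vertical direction, starting the extension at B2, and the numpy-based helper are assumptions made for illustration; the patent only specifies that 100 interpolation points are generated on the extension of the B4-B2 line at a 10-meter horizontal interval with vertical spacing.

```python
import numpy as np

def generate_candidates(B4: np.ndarray, B2: np.ndarray,
                        step: float = 10.0,
                        n_horizontal: int = 10,
                        n_vertical: int = 10) -> np.ndarray:
    """Candidate virtual-camera positions on the extension of the B4-B2 line.

    Candidates step outward from B2 along the unit vector B4 -> B2 (i.e. the
    extension of the line beyond B2) and are also offset vertically, so the
    camera can look down over B2 toward B4 with B2 near the bottom of the view.
    """
    direction = (B2 - B4).astype(float)
    direction /= np.linalg.norm(direction)          # unit vector B4 -> B2
    up = np.array([0.0, 0.0, 1.0])                  # assumed world up axis
    candidates = []
    for i in range(1, n_horizontal + 1):            # horizontal steps outward
        for j in range(n_vertical):                 # vertical offsets upward
            candidates.append(B2 + i * step * direction + j * step * up)
    return np.stack(candidates)                     # shape (n_horizontal * n_vertical, 3)

# Example with invented world coordinates (metres):
B4 = np.array([100.0, 50.0, 2.0])   # world point for the image centre
B2 = np.array([100.0, 40.0, 0.0])   # world point for the bottom-centre pixel
print(generate_candidates(B4, B2).shape)  # (100, 3)
```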
Screen coordinates c1, c2, c3 corresponding to B1, B2, B3 are found in the frame buffer. The Euclidean distance algorithm is used to find the degree of similarity between c1, c2, c3 and a1, a2, a3, and the interpolation point with the minimum Euclidean distance, i.e. the highest similarity, is tentatively set as the position P1 of the virtual camera.
The Euclidean distance is calculated as √[(p1-q1)² + (p2-q2)² + … + (pn-qn)²].
With P1 as the center, 100 interpolation points are generated on the extension of the B4-B2 line, along the left-right direction of the B4-B2 straight line, at horizontal intervals of 1 meter and also spaced vertically. The virtual camera is placed at each of these interpolation points, looking at the B4 coordinate point, and the scene is rendered into the frame buffer. The previous step is repeated to obtain a temporary position P2 of the virtual camera.
The interpolation interval is then successively reduced to 0.5, 0.3, 0.1 and 0.01 meters, and the previous step is repeated until the virtual camera position Pn with the minimum Euclidean distance, i.e. the best fusion effect, is obtained.
Finally, the video stream is projected into the three-dimensional real-scene model from the position and orientation of the virtual camera obtained above (note: the virtual camera looks at B4, the world coordinate point of the three-dimensional real-scene model corresponding to the center point of the video image), completing the fusion of the video into the three-dimensional real scene.
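The coarse-to-fine search described above can be summarised in a short loop. The sketch below is an illustration only; `generate_candidates_around`, `render_and_project` and `similarity_distance` are hypothetical helpers standing in for the candidate generation, rendering/read-back and Euclidean-distance scoring described in this document, and the interval schedule is the one given in the embodiment (10 m, 1 m, 0.5 m, 0.3 m, 0.1 m, 0.01 m).

```python
import numpy as np

def refine_camera_position(B4, B2, preset_pts,
                           generate_candidates_around, render_and_project,
                           similarity_distance,
                           intervals=(10.0, 1.0, 0.5, 0.3, 0.1, 0.01)):
    """Coarse-to-fine search for the virtual camera position (P1, P2, ..., Pn).

    At each interval, candidate positions are generated around the current best
    position (initially along the B4-B2 extension), each candidate is rendered
    looking at B4, the screen coordinates of the remaining standard points are
    read back, and the candidate with the smallest Euclidean distance to their
    preset pixel coordinates becomes the centre of the next, finer search.
    """
    best_pos = None
    for step in intervals:
        best_score = np.inf
        for cam_pos in generate_candidates_around(B4, B2, center=best_pos, step=step):
            screen_pts = render_and_project(cam_pos, look_at=B4)   # e.g. c1, c2, c3
            score = similarity_distance(screen_pts, preset_pts)
            if score < best_score:
                best_score, best_pos = score, cam_pos
    return best_pos   # optimal camera position; the orientation is toward B4
```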
The embodiment also discloses the following implementation steps:
Step 1: a frame of video image is captured from the video stream, and the pixel coordinates of 4 standard points are taken: a1 (bottom left), a2 (bottom middle), a3 (bottom right) and a4 (center point) of the image. The corresponding world coordinates B1 (bottom left), B2 (bottom middle), B3 (bottom right) and B4 (center point) are taken in the three-dimensional live-action model.
Step 2: a B4-B2 line is drawn in the three-dimensional live-action model, and 100 interpolation points are generated on the extension of the B4-B2 line at a horizontal interval of 10 meters and a vertical interval. A virtual camera is placed at each of these interpolation points, facing B4, and the scene is rendered to the frame buffer.
Step 3: the screen coordinates c1, c2, c3 corresponding to B1, B2, B3 are found in the frame buffer. The Euclidean distance method is used to find the degree of similarity between c1, c2, c3 and a1, a2, a3. The interpolation point with the highest similarity is tentatively set as the position P1 of the virtual camera.
Step 4: with P1 (which lies on the B4-B2 line) as the center, 100 interpolation points are generated along the left-right direction of the B4-B2 straight line at a horizontal interval of 1 meter and a vertical interval. A virtual camera is placed at each of these interpolation points, facing B4, and the scene is rendered to the frame buffer; step 3 is repeated to obtain the position P2 of the virtual camera.
Step 5: the interpolation interval is successively set to 0.5, 0.3, 0.1 and 0.01 meters, and step 4 is repeated until the optimal virtual camera position for video fusion is found.
Step 6: the video stream is projected onto the three-dimensional live-action model from the position and orientation of the virtual camera obtained in step 5, completing the fusion of the video onto the three-dimensional live-action model.
Frame buffer
A frame buffer (also called a frame cache) is a technique in computer graphics that can be used to accelerate the rendering process. During rendering, graphics data is stored in the frame buffer awaiting output to the display.
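For context, the screen coordinates of the world points B1-B3 for a given candidate camera can be obtained by the usual view/projection transform before (or instead of) reading them back from the frame buffer. The sketch below uses a generic look-at view matrix and perspective projection; the field of view, aspect ratio, image resolution and the example coordinates are placeholder assumptions, not values taken from the patent.

```python
import numpy as np

def look_at(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """Right-handed look-at view matrix (camera at `eye`, looking at `target`)."""
    f = (target - eye).astype(float); f /= np.linalg.norm(f)   # forward
    s = np.cross(f, up); s /= np.linalg.norm(s)                # right
    u = np.cross(s, f)                                         # true up
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye
    return view

def perspective(fov_y_deg=60.0, aspect=16 / 9, near=0.1, far=5000.0):
    """OpenGL-style perspective projection matrix."""
    t = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    proj = np.zeros((4, 4))
    proj[0, 0], proj[1, 1] = t / aspect, t
    proj[2, 2] = (far + near) / (near - far)
    proj[2, 3] = 2 * far * near / (near - far)
    proj[3, 2] = -1.0
    return proj

def world_to_screen(point, eye, target, width=1920, height=1080):
    """Project a world point to pixel coordinates for a camera at `eye` looking at `target`."""
    clip = perspective() @ look_at(eye, target) @ np.append(point, 1.0)
    ndc = clip[:3] / clip[3]                      # normalised device coordinates
    x = (ndc[0] * 0.5 + 0.5) * width
    y = (1.0 - (ndc[1] * 0.5 + 0.5)) * height     # flip so y grows downward, like image pixels
    return x, y

# Example with invented coordinates: a candidate camera on the extension of the
# B4-B2 line beyond B2, raised above the ground and looking at B4.
B2 = np.array([100.0, 40.0, 0.0])
B4 = np.array([100.0, 50.0, 2.0])
cam = np.array([100.0, 20.0, 10.0])
print(world_to_screen(B2, cam, B4))  # B2 lands at the horizontal centre, below the vertical centre
```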
Video fusion
Projecting the video onto the solid three-dimensional model so that the video playback effect can be viewed within the three-dimensional scene.
Euclidean distance algorithm
The Euclidean distance algorithm, also known as the Euclidean metric, is a distance measurement method commonly used in the field of machine learning. If two points p and q have coordinates (p1, p2, …, pn) and (q1, q2, …, qn) in n-dimensional space, the Euclidean distance between p and q is defined as:
√[(p1-q1)² + (p2-q2)² + … + (pn-qn)²]
This distance represents the separation of two points in n-dimensional space, i.e. the length of the straight-line segment between the points in Euclidean space. In fields such as machine-learning classification and clustering, the Euclidean distance is often used to compute the similarity or distance between samples. In general, the smaller the Euclidean distance between two points, the higher their similarity; the larger the distance, the lower their similarity.
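A minimal Python illustration of this definition and of selecting the most similar sample by smallest distance follows (the numeric values are invented):

```python
def euclidean_distance(p, q):
    """Euclidean distance between two points in n-dimensional space."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5

# The sample with the smallest distance to the query is the most similar one.
query = (1.0, 2.0, 3.0)
samples = {"a": (1.5, 2.0, 2.5), "b": (4.0, 0.0, 3.0)}
most_similar = min(samples, key=lambda k: euclidean_distance(query, samples[k]))
print(most_similar)  # "a"
```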
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
While the application has been described above with reference to various embodiments, it should be understood that many changes and modifications can be made without departing from the scope of the application. It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and scope of this application. The above examples should be understood as illustrative only and not limiting the scope of the application. Various changes and modifications to the present application may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the application as defined in the appended claims.

Claims (10)

1. A video fusion method based on a three-dimensional scene, characterized by comprising the following steps:
step 1, acquiring a preset video image sequence, capturing a single-frame video image at a preset position from the video image sequence, initializing image coordinates on the single-frame video image, and selecting a plurality of standard points at preset coordinate positions on the video image;
step 2, acquiring a three-dimensional real-scene model to be fused with the preset video image sequence, establishing a coordinate mapping relation between the three-dimensional real-scene model and the video image to be fused, and determining from the three-dimensional real-scene model the world coordinates corresponding to the standard points;
step 3, drawing, in the three-dimensional live-action model, a line from the world coordinates corresponding to a first standard point of the video image to the world coordinates corresponding to a second standard point of the video image; generating a preset number of interpolation points on the extension of this line, spaced horizontally at a first preset length and also spaced vertically; placing the virtual camera used for video fusion at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; then executing a rendering operation and storing the rendered data in a frame buffer;
step 4, obtaining from the frame buffer the screen coordinates corresponding to the standard points other than the first standard point, comparing these screen coordinates with the preset coordinate positions of the standard points on the video image by the Euclidean distance method, and taking the interpolation point with the highest similarity as the first temporary position of the virtual camera;
step 5, drawing, in the three-dimensional live-action model, the line from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point; centered on the first temporary position, again generating a preset number of interpolation points along the direction of this line, spaced horizontally at a second preset length and also spaced vertically; placing the virtual camera at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; re-executing the rendering operation; and repeating step 4 to determine the second temporary position of the virtual camera;
and step 6, continuing to adjust the interpolation interval and repeating step 5 until the position with the minimum Euclidean distance is obtained, and projecting the video stream into the three-dimensional real-scene model from the resulting optimal virtual camera position and orientation.
2. The method of claim 1, wherein the plurality of standard points are four standard points at determined positions, located at the center point, the bottom-left corner, the bottom middle point, and the bottom-right corner of the single-frame video image.
3. The method of claim 2, wherein drawing the line in the three-dimensional live-action model from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point further comprises: the first standard point is the center point of the image, and the second standard point is the bottom center point of the image.
4. The video fusion method according to claim 1, wherein the first preset length and the second preset length are length values input by a user, the first preset length is initially set to 10 meters, and the second preset length is initially set to 1 meter.
5. The method of claim 1, wherein in step 4 the similarity obtained by the Euclidean distance method is expressed as d:
d = √[(p1-q1)² + (p2-q2)² + (p3-q3)²]
where p1, p2, p3 are the screen coordinate values of the standard points in the frame buffer, q1, q2, q3 are the corresponding preset coordinate values on the video image, and the symbol √ denotes the square root.
6. A video fusion system based on a three-dimensional scene, the video fusion system comprising the following modules:
the coordinate point selection module, used for acquiring a preset video image sequence, capturing a single-frame video image at a preset position from the video image sequence, initializing image coordinates on the single-frame video image, and selecting a plurality of standard points at preset coordinate positions on the video image;
the coordinate mapping module, used for acquiring a three-dimensional real-scene model to be fused with the video image sequence, establishing a coordinate mapping relation between the three-dimensional real-scene model and the video image to be fused, and determining from the three-dimensional real-scene model the world coordinates corresponding to the standard points;
the virtual camera initial rendering module, used for drawing, in the three-dimensional live-action model, a line from the world coordinates corresponding to a first standard point of the video image to the world coordinates corresponding to a second standard point of the video image; generating a preset number of interpolation points on the extension of this line, spaced horizontally at a first preset length and also spaced vertically; placing the virtual camera used for video fusion at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; then executing a rendering operation and storing the rendered data in a frame buffer;
the virtual camera positioning module, used for obtaining from the frame buffer the screen coordinates corresponding to the standard points other than the first standard point, comparing these screen coordinates with the preset coordinate positions of the standard points on the video image by the Euclidean distance method, and taking the interpolation point with the highest similarity as the first temporary position of the virtual camera;
the positioning updating module, used for drawing, in the three-dimensional live-action model, the line from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point; centered on the first temporary position, again generating a preset number of interpolation points along the direction of this line, spaced horizontally at a second preset length and also spaced vertically; placing the virtual camera at each interpolation point, oriented toward the world coordinates corresponding to the first standard point; re-executing the rendering operation; and repeating the operation of the virtual camera positioning module to determine the second temporary position of the virtual camera;
and the fusion module, used for continuing to adjust the interpolation interval and repeating the function executed by the positioning updating module until the virtual camera position with the minimum Euclidean distance, i.e. the best fusion effect, is obtained, and projecting the video stream into the three-dimensional real-scene model from the resulting optimal virtual camera position and orientation.
7. The three-dimensional scene-based video fusion system of claim 6, wherein the plurality of standard points are four standard points at determined positions, located at the center point, the bottom-left corner, the bottom middle point, and the bottom-right corner of the single-frame video image.
8. The three-dimensional scene-based video fusion system of claim 7, wherein drawing the line in the three-dimensional live-action model from the world coordinates corresponding to the first standard point of the video image to the world coordinates corresponding to the second standard point further comprises: the first standard point is the center point of the image, and the second standard point is the bottom center point of the image.
9. The three-dimensional scene-based video fusion system of claim 6, wherein the first preset length and the second preset length are length values entered by a user, the first preset length being initially set to 10 meters and the second preset length being initially set to 1 meter.
10. The video fusion system based on the three-dimensional scene as defined in claim 6, wherein obtaining the screen coordinates corresponding to the remaining standard points by the Euclidean distance method and comparing them with the preset coordinate positions of the standard points on the video image further comprises: the similarity obtained by the Euclidean distance method is expressed as d:
d = √[(p1-q1)² + (p2-q2)² + (p3-q3)²]
where p1, p2, p3 are the screen coordinate values of the standard points in the frame buffer, q1, q2, q3 are the corresponding preset coordinate values on the video image, and the symbol √ denotes the square root.
CN202310884577.2A 2023-07-18 2023-07-18 Video fusion method and system based on three-dimensional scene Active CN116996742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310884577.2A CN116996742B (en) 2023-07-18 2023-07-18 Video fusion method and system based on three-dimensional scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310884577.2A CN116996742B (en) 2023-07-18 2023-07-18 Video fusion method and system based on three-dimensional scene

Publications (2)

Publication Number Publication Date
CN116996742A true CN116996742A (en) 2023-11-03
CN116996742B CN116996742B (en) 2024-08-13

Family

ID=88525844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310884577.2A Active CN116996742B (en) 2023-07-18 2023-07-18 Video fusion method and system based on three-dimensional scene

Country Status (1)

Country Link
CN (1) CN116996742B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110088995A (en) * 2010-01-29 2011-08-04 주식회사 울프슨랩 Method and system to visualize surveillance camera videos within 3d models, and program recording medium
US20180262789A1 (en) * 2016-03-16 2018-09-13 Adcor Magnet Systems, Llc System for georeferenced, geo-oriented realtime video streams
CN113810626A (en) * 2020-06-15 2021-12-17 浙江宇视科技有限公司 Video fusion method, device and equipment based on three-dimensional map and storage medium
CN112053446A (en) * 2020-07-11 2020-12-08 南京国图信息产业有限公司 Real-time monitoring video and three-dimensional scene fusion method based on three-dimensional GIS
CN112184922A (en) * 2020-10-15 2021-01-05 洛阳众智软件科技股份有限公司 Fusion method, device and equipment of two-dimensional video and three-dimensional scene and storage medium
CN112437276A (en) * 2020-11-20 2021-03-02 埃洛克航空科技(北京)有限公司 WebGL-based three-dimensional video fusion method and system
CN113870163A (en) * 2021-09-24 2021-12-31 埃洛克航空科技(北京)有限公司 Video fusion method and device based on three-dimensional scene, storage medium and electronic device
CN115546377A (en) * 2022-12-01 2022-12-30 杭州靖安科技有限公司 Video fusion method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
宁泽西, 秦绪佳, 陈佳舟: "Video fusion method based on three-dimensional scenes" (基于三维场景的视频融合方法), Computer Science (《计算机科学》), no. 2, pages 281-285 *

Also Published As

Publication number Publication date
CN116996742B (en) 2024-08-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant