WO2023207452A1 - Virtual reality-based video generation method, apparatus, device and medium - Google Patents

Virtual reality-based video generation method, apparatus, device and medium Download PDF

Info

Publication number
WO2023207452A1
WO2023207452A1 PCT/CN2023/083335 CN2023083335W
Authority
WO
WIPO (PCT)
Prior art keywords
depth
real
camera
depth map
virtual
Prior art date
Application number
PCT/CN2023/083335
Other languages
English (en)
French (fr)
Inventor
周鑫
李锐
李想
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2023207452A1 publication Critical patent/WO2023207452A1/zh

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/30: Image reproducers
    • H04N 13/363: Image reproducers using image projection screens
    • H04N 13/10: Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106: Processing image signals
    • H04N 13/122: Improving the 3D impression of stereoscopic images by modifying image signal contents, e.g. by filtering or adding monoscopic depth cues
    • H04N 13/20: Image signal generators
    • H04N 13/271: Image signal generators wherein the generated image signals comprise depth maps or disparity maps

Definitions

  • This application relates to the field of virtual reality, and in particular to a video generation method, device, equipment and medium based on virtual reality.
  • Virtual production refers to computer-aided, visualization-driven film production methods. Virtual production includes a variety of approaches, such as visualization, performance capture, hybrid virtual production, and live LED wall in-camera shooting.
  • LED: Light-Emitting Diode.
  • In LED wall in-camera shooting, the real camera simultaneously captures the actors, the props, and the content displayed on the LED wall, and feeds the captured footage into the computer.
  • the computer then outputs the footage captured by the real camera in real time.
  • the embodiments of this application provide a video generation method, device, equipment and medium based on virtual reality.
  • the technical solution is as follows:
  • a virtual reality-based video generation method is provided.
  • the method is executed by a computer system.
  • the method includes:
  • the video frame sequence is obtained by collecting the target scene by the real camera.
  • the target scene includes the real foreground and the virtual background.
  • the virtual background is displayed on the physical screen in the real environment;
  • the real foreground depth map includes the depth information from the real foreground to the real camera
  • the virtual background depth map includes the depth information from the virtual background to the real camera after being mapped to the real environment
  • the fused depth map includes depth information from each reference point in the target scene in the real environment to the real camera;
  • a target video with a depth of field effect is generated.
  • a virtual reality-based video generation device is provided, and the device includes:
  • the acquisition module is used to obtain the target video frame from the video frame sequence.
  • the video frame sequence is obtained by collecting the target scene by the real camera.
  • the target scene includes the real foreground and the virtual background.
  • the virtual background is displayed on the physical screen in the real environment;
  • the acquisition module is also used to obtain the real foreground depth map and the virtual background depth map of the target video frame.
  • the real foreground depth map includes the depth information from the real foreground to the real camera, and the virtual background depth map includes the depth information from the virtual background, after being mapped to the real environment, to the real camera;
  • the fusion module is used to fuse the real foreground depth map and the virtual background depth map to obtain a fused depth map.
  • the fused depth map includes depth information from each reference point in the target scene in the real environment to the real camera;
  • the update module is used to adjust the display parameters of the target video frame according to the fusion depth map and generate the depth of field effect map of the target video frame;
  • the update module is also used to generate a target video with a depth of field effect based on the depth of field effect map of the target video frame.
  • a computer device is provided, including a processor and a memory; at least one instruction, at least one program, a code set or an instruction set is stored in the memory and is loaded and executed by the processor to implement the virtual reality-based video generation method of the above aspect.
  • a computer storage medium stores at least one program code.
  • the program code is loaded and executed by a processor to implement the above aspect of the virtual reality-based video generation method.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the virtual reality-based video generation method as described above.
  • a chip is provided, which includes programmable logic circuits and/or program instructions and is used to implement the virtual reality-based video generation method described above when the electronic device in which the chip is installed runs.
  • a computer system is provided, which includes a computer device, a reality camera, and a depth camera; the reality camera is used to capture the target video frame, the depth camera is used to obtain the real foreground depth map of the target video frame, and the computer device is used to obtain the virtual background depth map and generate a target video with a depth of field effect.
  • the real-life camera captures the target scene and generates a sequence of video frames. Then, the target video frame is obtained according to the video frame sequence, and the depth information of the target video frame is updated, so that the depth information included in the target video frame is more accurate, and a target video with a depth of field effect is generated based on the target video frame. Since the depth information of the virtual background is more accurate, the video composed of the virtual background and the real foreground is more natural and the display effect is better.
  • Figure 1 is a schematic diagram of a computer system provided by an exemplary embodiment of the present application.
  • Figure 2 is a schematic diagram of a method for generating a depth of field effect map based on virtual reality provided by an exemplary embodiment of the present application;
  • Figure 3 is a schematic flowchart of a virtual reality-based video generation method provided by an exemplary embodiment of the present application
  • Figure 4 is a schematic interface diagram of a virtual reality-based video generation method provided by an exemplary embodiment of the present application
  • Figure 5 is a schematic interface diagram of a virtual reality-based video generation method provided by an exemplary embodiment of the present application.
  • Figure 6 is a schematic interface diagram of a virtual reality-based video generation method provided by an exemplary embodiment of the present application.
  • Figure 7 is a schematic interface diagram of a virtual reality-based video generation method provided by an exemplary embodiment of the present application.
  • Figure 8 is a schematic flowchart of a virtual reality-based video generation method provided by an exemplary embodiment of the present application.
  • Figure 9 is a schematic diagram of calculating depth information provided by an exemplary embodiment of the present application.
  • Figure 10 is a schematic diagram of a method for generating a depth of field effect map based on virtual reality provided by an exemplary embodiment of the present application
  • Figure 11 is a schematic diagram of video generation and production based on virtual reality provided by an exemplary embodiment of the present application.
  • Figure 12 is a schematic interface diagram of a method for generating a depth of field effect map based on virtual reality provided by an exemplary embodiment of the present application;
  • Figure 13 is a schematic diagram of a computer device provided by an exemplary embodiment of the present application.
  • Depth of field refers to the range in front of and behind the camera's focus point within which imaging is relatively sharp. In optics, and especially in video or photography, it describes the range of distances in space that can be imaged clearly. A camera lens can only focus light at a certain fixed distance, and the image gradually blurs away from that point; within a certain range of distances, however, the degree of blur is invisible to the naked eye, and this range of distances is called the depth of field.
  • Real foreground: physical objects in the real environment, generally including actors and the surrounding real set pieces. Objects placed close to the camera serve as the camera's foreground.
  • Virtual background: a pre-designed virtual environment. Virtual scenery generally includes scenes that are difficult to build in reality as well as fantastical scenes. After being computed by the rendering engine, it is output to the LED wall and serves as the camera's background behind the real scenery.
  • YUV: a color encoding method used in video processing. "Y" represents luminance (Luma), i.e. the grayscale value, while "U" and "V" represent chrominance (Chroma), which describes the hue and saturation and specifies the color of each pixel.
  • Intrinsic parameters are parameters related to the characteristics of the camera itself.
  • intrinsic parameters include the focal length, pixel size, etc. of the camera.
  • the external parameters are the parameters of the camera in the world coordinate system.
  • the external parameters include the camera's position, rotation direction, etc. Points in the world coordinate system can be mapped to pixels captured by the camera through internal and external parameters.
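  • As a rough illustration of this mapping (not taken from the application), a world point can be projected to a pixel with an intrinsic matrix K and an extrinsic rotation/translation R, t; the numbers below are arbitrary examples.

```python
import numpy as np

def project_point(K, R, t, X_world):
    """Project a 3D world point to pixel coordinates using camera
    intrinsics K (3x3) and extrinsics R (3x3 rotation), t (3-vector)."""
    X_cam = R @ X_world + t          # world -> camera coordinates
    u, v, w = K @ X_cam              # camera -> image plane (homogeneous)
    return np.array([u / w, v / w])  # perspective division -> pixel (x, y)

# example: focal length 1200 px, principal point (960, 540), identity pose
K = np.array([[1200, 0, 960],
              [0, 1200, 540],
              [0,    0,   1]], dtype=float)
R, t = np.eye(3), np.zeros(3)
print(project_point(K, R, t, np.array([0.5, 0.2, 4.0])))  # -> [1110. 600.]
```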
  • The information (including but not limited to user equipment information and user personal information), data (including but not limited to data used for analysis, stored data, and displayed data), and signals involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data comply with the relevant laws, regulations and standards of the relevant countries and regions.
  • In the related art, the real camera can only obtain the depth information from the real camera to the LED wall, but cannot obtain the depth information of the virtual background itself, which results in a lack of realism in the virtual background and a stiff combination of the real foreground and the virtual background.
  • Figure 1 shows a schematic diagram of a computer system provided by an exemplary embodiment of the present application.
  • Computer system 100 includes computer device 110, physical screen 120, reality camera 130, and depth camera 140.
  • At least a first application program for video production is installed on the computer device 110 .
  • the first application program may be a small program in an app (application), a specialized application program, or a web client.
  • a second application program for generating a virtual world is also installed on the computer device 110 .
  • the second application may be a small program in an app, a specialized application, or a web client.
  • first application program and the second application program may be the same application program or may be different application programs.
  • the application program can realize the functions of video production and virtual world generation at the same time.
  • data exchange can be achieved between the first application program and the second application program.
  • the second application program sends the depth information of each virtual object in the virtual world to the virtual camera to the first application program, and the first application program updates the image of the video frame based on the aforementioned depth information.
  • Physical screen 120 is used to display the virtual background.
  • the computer device 110 transmits data of the virtual world to the physical screen 120, which displays the virtual world as a virtual background.
  • the real camera 130 is used to capture the real foreground 150 and the virtual background displayed on the physical screen 120 .
  • the reality camera 130 will transmit the captured video to the computer device 110 .
  • the reality camera 130 can transmit the captured video to the computer device 110 in real time, and the reality camera 130 can also transmit the captured video to the computer device 110 every preset time period.
  • the depth camera 140 is used to obtain a depth map of the real foreground 150 .
  • the depth camera 140 and the reality camera 130 are installed in different locations.
  • the depth camera 140 will transmit the captured depth map of the real foreground 150 to the computer device 110 , and the computer device 110 will determine the depth information of the real foreground 150 based on the depth map.
  • Figure 2 shows a schematic diagram of a method for generating a depth-of-field effect map based on virtual reality provided by an exemplary embodiment of the present application. The method may be performed by the computer system 100 shown in FIG. 1 .
  • the real-life camera 210 collects the target scene to obtain the target video frame 220.
  • the depth camera 250 will also collect the target scene to obtain a depth map.
  • the pixels in the target video frame 220 and the depth map are matched, and the depth information in the depth map is provided to the target video frame 220 to obtain a realistic foreground depth map 260.
  • the virtual camera 230 obtains the depth information of the rendering target corresponding to the virtual background, and obtains the virtual background depth map 240.
  • the depth information of the real foreground depth map 260 and the virtual background depth map 240 is fused to obtain a fused depth map 270. A depth-of-field effect is then added to the target video frame 220 according to the fused depth map 270 to obtain the depth-of-field effect map 280.
  • Figure 3 shows a schematic flowchart of a virtual reality-based video generation method provided by an exemplary embodiment of the present application.
  • the method may be executed by the computer system 100 shown in FIG. 1 .
  • computer system 100 includes computer device 110, physical screen 120, reality camera 130, and depth camera 140.
  • the computer device 110 is used to implement the functions of video production and virtual world generation;
  • the physical screen 120 is used to display the virtual background
  • the reality camera 130 is used to capture the real foreground 150 and the virtual background displayed on the physical screen 120
  • the depth camera 140 is used to obtain the real foreground 150 depth map.
  • the virtual reality-based video generation method includes the following steps:
  • Step 302 Obtain the target video frame from the video frame sequence.
  • the video frame sequence is obtained by collecting the target scene by the real camera.
  • the target scene includes the real foreground and the virtual background.
  • the virtual background is displayed on the physical screen in the real environment.
  • the reality camera transmits the sequence of captured video frames to the computer device.
  • the target video frame is an image of any frame in the video frame sequence.
  • the video frame sequence includes 120 frames of images, and the 45th image frame is randomly selected as the target video frame.
  • the real foreground includes at least one of real objects, real creatures, and real people.
  • the realistic foreground 401 is an actor at the video shooting scene.
  • the virtual background refers to the virtual content displayed on the display screen.
  • the virtual content includes at least one of a virtual environment, a virtual character, a virtual object, a virtual prop, and a virtual image.
  • the embodiments of this application do not specifically limit the display content of the virtual background.
  • the virtual background 402 is displayed on the display screen 403 .
  • the virtual background 402 is a virtual image of the city.
  • the real foreground is captured by a real camera and the virtual background is displayed by a physical screen.
  • the physical screen can be an LED wall
  • the virtual background can be projected onto the LED wall.
  • Step 304 Obtain the real foreground depth map and the virtual background depth map of the target video frame.
  • the real foreground depth map includes the depth information from the real foreground to the real camera
  • the virtual background depth map includes the depth information from the virtual background, after being mapped to the real environment, to the real camera.
  • In some embodiments, the depth information of each pixel in the target video frame is obtained through the depth map provided by the depth camera; the depth information represents the distance from the real reference point corresponding to each pixel in the target video frame to the real camera. When the depth value of a first pixel is greater than the first depth threshold, the first pixel is determined to belong to the virtual background; when the depth value of a second pixel is less than the second depth threshold, the second pixel is determined to belong to the real foreground.
  • the first depth threshold is not less than the second depth threshold, and the first depth threshold and the second depth threshold can be set by technicians themselves.
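  • A minimal sketch of this thresholding, assuming the depth map is a NumPy array of distances in meters and the two thresholds (8 m and 3 m here, arbitrary values) are chosen by the technician:

```python
import numpy as np

def classify_pixels(depth_map, background_threshold=8.0, foreground_threshold=3.0):
    """Label each pixel as virtual background (depth beyond the first threshold)
    or real foreground (depth within the second threshold); the thresholds are
    illustrative values in meters and must satisfy background >= foreground."""
    background_mask = depth_map > background_threshold
    foreground_mask = depth_map < foreground_threshold
    return foreground_mask, background_mask
```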
  • Figure 5 shows a schematic diagram of a realistic foreground depth map provided by an exemplary embodiment of the present application.
  • the real foreground depth map includes the depth information from the real foreground to the real camera, and the real foreground depth map does not include the depth information from the virtual background to the real camera after being mapped to the real environment.
  • a virtual background depth map is generated through a virtual camera, which is used to capture a rendering target corresponding to the virtual background in the virtual environment.
  • the computer device can obtain the distance from the rendering target to the virtual camera, convert the distance into a real distance, and obtain a virtual background depth map.
  • Figure 6 shows a schematic diagram of a virtual background depth map provided by an exemplary embodiment of the present application.
  • the virtual background depth map includes the depth information from the virtual background to the real camera after being mapped to the real environment, and the virtual background depth map does not include the depth information from the real foreground to the real camera.
  • Step 306 Fusion of the real foreground depth map and the virtual background depth map to obtain a fused depth map.
  • the fused depth map includes depth information from each reference point in the target scene in the real environment to the real camera.
  • the first depth information of each pixel in the real foreground depth map is updated to obtain a fused depth map.
  • the real foreground depth map includes a foreground area corresponding to the real foreground and a background area corresponding to the virtual background.
  • Embodiments of the present application can update the first depth information of pixels in the background area, and can also update the first depth information of pixels in the foreground area.
  • the first depth information of the first pixel belonging to the background area in the real foreground depth map is updated to obtain the fused depth map.
  • the third depth information of the third pixel belonging to the foreground area in the real foreground depth map is updated, and the third pixel is the pixel corresponding to the target object.
  • the fused depth map includes depth information of the real foreground 701 and the virtual background 702. Comparing Figure 5 and Figure 7, it can be seen that compared with the real foreground depth map, the fused depth map also provides depth information of the virtual background 702.
  • Step 308 Adjust the display parameters of the target video frame according to the fusion depth map to generate a depth effect map of the target video frame.
  • the display parameters include at least one of sharpness, brightness, grayscale, contrast, and saturation.
  • In some embodiments, a distance interval is determined; the distance interval represents the range of distances, from the real camera, of the reference points whose corresponding pixels have a clarity greater than the clarity threshold. According to the fused depth map and the distance interval, the clarity of each pixel in the target video frame is adjusted to generate the depth-of-field effect map of the target scene.
  • In other embodiments, the clarity of each pixel in the target video frame is adjusted according to preset conditions.
  • the preset conditions are determined by technicians based on actual needs. For example, the clarity of pixels in a preset area in the target video frame is adjusted.
  • the preset area can be set by technicians themselves.
  • Step 310 Generate a target video with a depth of field effect based on the depth of field effect map of the target video frame.
  • the target video frame includes at least two video frames, and the depth-of-field effect maps of the target video frames are arranged in chronological order to obtain a target video with a depth-of-field effect.
  • steps 306, 308, and 310 are implemented by the computer device in the computer system.
  • the computer device 110 establishes wireless or wired communication with the depth camera 140, which can realize the information transmission of the real foreground depth map, and then fuse it with the virtual background depth map to obtain a target video with a depth of field effect.
  • the real camera captures the target scene and generates a video frame sequence. Then, the target video frame is obtained according to the video frame sequence, and the depth information of the target video frame is updated, so that the depth information included in the target video frame is more accurate, and a target video with a depth of field effect is generated based on the target video frame. Since the depth information of the virtual background is more accurate, the video composed of the virtual background and the real foreground is more natural and the display effect is better.
  • the depth-of-field rendering is generated through the focusing distance, focal length, aperture and other parameters of the real camera. Therefore, the virtual background in the depth-of-field rendering can simulate the shooting effect of the real camera.
  • the display of the virtual background is more natural, and the virtual background is closer to real objects.
  • the depth information of the background area in the real foreground depth map is updated to make the depth information of the background area more accurate.
  • Taking two optional implementations of obtaining the real foreground depth map as an example, the real foreground depth map can be obtained either by deploying a depth camera or by deploying an auxiliary camera.
  • the clarity of the pixels of the virtual background can be updated through the parameters of the real camera, for example, the clarity of the pixels can be updated through the focus distance, aperture, focal length, etc.
  • Figure 8 shows a schematic flowchart of a virtual reality-based video generation method provided by an exemplary embodiment of the present application. The method can be executed by the computer system 100 shown in Figure 1.
  • For a description of the computer system 100, reference can be made to the foregoing content, which will not be repeated here.
  • the virtual reality-based video generation method includes the following steps:
  • Step 801 Obtain the target video frame from the video frame sequence.
  • the reality camera transmits the sequence of captured video frames to the computer device.
  • the target video frame is an image of any frame in the video frame sequence.
  • the video frame sequence includes 120 frames of images, and the 45th image frame is randomly selected as the target video frame.
  • the real foreground includes at least one of real objects, real creatures, and real people.
  • the virtual background refers to the virtual content displayed on the display screen.
  • the virtual content includes at least one of a virtual environment, a virtual character, a virtual object, a virtual prop, and a virtual image.
  • the embodiments of this application do not specifically limit the display content of the virtual background.
  • Step 802 Obtain the realistic foreground depth map of the target video frame.
  • the depth information of the real foreground is obtained through a depth camera.
  • the method may include the following steps:
  • the internal and external parameters of the depth camera include internal parameters and external parameters.
  • Intrinsic parameters are parameters related to the characteristics of the depth camera itself. Intrinsic parameters include focal length, pixel size, etc.
  • the external parameters are the parameters of the depth camera in the world coordinate system. The external parameters include the camera's position, rotation direction, etc.
  • the internal and external parameters of the real camera include internal parameters and external parameters.
  • Intrinsic parameters are parameters related to the characteristics of the real camera itself. Intrinsic parameters include focal length, pixel size, etc.
  • the external parameters are the parameters of the real camera in the world coordinate system. The external parameters include the camera's position, rotation direction, etc.
  • the spatial offset information refers to the mapping relationship between the camera coordinate system of the depth camera and the camera coordinate system of the real camera. For example, based on the internal and external parameters of the depth camera, the depth mapping relationship of the depth camera is determined, where the depth mapping relationship refers to the mapping between the camera coordinate system of the depth camera and the real-world coordinate system; based on the internal and external parameters of the real camera, the reality mapping relationship of the real camera is determined, where the reality mapping relationship refers to the mapping between the camera coordinate system of the real camera and the real-world coordinate system; through the real-world coordinate system, the mapping relationship between the camera coordinate system of the depth camera and the camera coordinate system of the real camera is determined, yielding the spatial offset information between the depth camera and the real camera.
  • the depth camera and the reality camera capture the target scene from different angles.
  • the depth camera and the reality camera are set in different locations.
  • the depth camera transmits the collected depth map to the computer device through a wired connection or a wireless connection.
  • Since the spatial offset information describes the mapping relationship between the camera coordinate system of the depth camera and the camera coordinate system of the real camera, the correspondence between pixels on the depth map and pixels on the target video frame can be obtained; based on this correspondence, the depth information of each pixel on the depth map is mapped to the corresponding pixel on the target video frame to obtain the real foreground depth map.
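  • The following is a simplified sketch of such a remapping, assuming calibrated intrinsics K_d and K_r, an extrinsic offset R, t from the depth camera to the real camera, and no occlusion handling; it illustrates the general technique rather than the application's exact procedure.

```python
import numpy as np

def remap_depth_to_real_camera(depth_map, K_d, K_r, R, t, out_shape):
    """Back-project every depth-camera pixel to 3D, move it into the real
    camera's coordinate frame, and re-project it, giving a depth image aligned
    with the real camera; the closest surface wins where pixels collide."""
    h, w = depth_map.shape
    out = np.full(out_shape, np.inf)                      # unmapped pixels stay at infinity
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    z = depth_map.reshape(-1)
    pts_d = np.linalg.inv(K_d) @ pix * z                  # 3D points in the depth camera frame
    pts_r = R @ pts_d + t[:, None]                        # 3D points in the real camera frame
    proj = K_r @ pts_r
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    ok = (u >= 0) & (u < out_shape[1]) & (v >= 0) & (v < out_shape[0]) & (pts_r[2] > 0)
    for ui, vi, zi in zip(u[ok], v[ok], pts_r[2][ok]):
        out[vi, ui] = min(out[vi, ui], zi)                # keep the nearest depth
    return out
```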
  • the depth information of the real foreground is obtained through another reference camera.
  • the method may include the following steps:
  • the first mapping relationship is used to represent the mapping relationship between the camera coordinate system of the reference camera and the real coordinate system.
  • the reference camera is used to capture the target scene from the second angle.
  • the second angle is different from the first angle, which is the shooting angle of the real camera.
  • the internal and external parameters of the reference camera include internal parameters and external parameters.
  • Intrinsic parameters are parameters related to the characteristics of the reference camera itself. Intrinsic parameters include focal length, pixel size, etc.
  • the external parameters are the parameters of the reference camera in the world coordinate system. The external parameters include the camera's position, rotation direction, etc.
  • the first mapping relationship is also used to represent the positional correspondence between the pixel points on the reference image captured by the reference camera and the real point.
  • For example, the coordinates of pixel A on the reference image are (x1, x2), and the first mapping relationship satisfies a functional relationship f.
  • the functional relationship f is generated based on the internal and external parameters of the reference camera, and the real point corresponding to pixel A in the real environment is f(x1, x2).
  • the second mapping relationship is used to represent the mapping relationship between the camera coordinate system of the real camera and the real coordinate system.
  • the second mapping relationship is also used to represent the positional correspondence between the pixel points on the target video frame captured by the real camera and the real point.
  • For example, the coordinates of pixel B on the target video frame are (x3, x4), and the second mapping relationship satisfies another functional relationship.
  • this functional relationship is generated based on the internal and external parameters of the real camera, and the real point corresponding to pixel B in the real environment is obtained by applying it to (x3, x4).
  • the reconstructed reference image includes the position of the reference point corresponding to each pixel point in the reference image in the real environment.
  • the reconstructed target scene image includes the position of the reference point corresponding to each pixel in the target video frame in the real environment.
  • the reconstructed reference image and the reconstructed target scene image are mapped to the same plane, and the two pixels corresponding to the same real point in the reconstructed reference image and the reconstructed target scene image are determined; according to the disparity between these two pixels, the depth information of each pixel in the target video frame is determined, and the real foreground depth map is obtained.
  • For example, the reconstructed reference image and the reconstructed target scene image are mapped in advance onto the plane in which the X-axis lies, so that the projections of the reference point 903 in the two images have the same distance to the X-axis.
  • the reference point 903 forms a pixel point 901 on the target video frame through the center point 904 of the real camera, and the reference point 903 forms a pixel point 902 on the reference image through the center point 905 of the reference camera.
  • f is the focal length of the real camera and the reference camera, and the focal lengths of the real camera and the reference camera are the same.
  • z is the distance from the reference point 903 to the real camera, that is, the depth information from the reference point 903 to the real camera.
  • x is the distance from the reference point 903 to the Z axis.
  • x1 is the position of pixel point 901 on the target video frame.
  • xr is the position of pixel 902 on the reference image.
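  • The geometry of Figure 9 follows the standard rectified-stereo relation: with equal focal length f and a baseline b between the two camera centers (the baseline is not named explicitly above), the depth is z = f * b / (x1 - xr). A small numeric sketch with made-up values:

```python
def depth_from_disparity(f_px, baseline_m, x_left_px, x_right_px):
    """Depth of a point from its horizontal pixel positions in two rectified
    views: z = f * b / disparity (equal focal lengths, expressed in pixels)."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return f_px * baseline_m / disparity

# illustrative numbers: 1400 px focal length, 0.2 m baseline, 35 px disparity
print(depth_from_disparity(1400, 0.2, 640, 605))  # -> 8.0 (meters)
```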
  • the embodiment of the present application does not specifically limit the method of obtaining the realistic foreground depth map of the target video frame.
  • technicians can choose other methods to obtain the realistic foreground depth map of the target video frame according to actual needs, which will not be described again here.
  • Step 803 Obtain the virtual background depth map of the target video frame.
  • the virtual background depth map is obtained through a virtual camera.
  • the method may include the following steps:
  • the rendering target depth map includes virtual depth information.
  • the virtual depth information is used to represent the distance from the rendering target to the virtual camera in the virtual environment.
  • the computer device stores various data of the virtual environment, where the data includes distances from the virtual camera to various points in the virtual environment. After determining the rendering target, the distance from the rendering target to the virtual camera can be directly determined to obtain the rendering target depth map.
  • For example, the virtual camera is located at position A in the virtual environment, and the real camera corresponds to position B in the virtual environment.
  • the distance from point 1 in the virtual environment to the virtual camera is x.
  • y is the distance from point 1 to the real camera.
  • y = f(x), where f represents the functional relationship determined by the two camera positions A and B.
  • the rendering target depth map is an image obtained from the virtual camera's perspective
  • the virtual background depth map is an image obtained from the real camera's perspective.
  • coordinate transformation needs to be performed on the pixel points in the rendering target depth map.
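  • A toy sketch of the distance relationship described above (y = f(x)), assuming the positions of the virtual camera (A) and of the real camera as mapped into the virtual environment (B) are known, and assuming one engine unit equals one centimeter; re-projecting the result into the real camera's image then uses the same intrinsic/extrinsic mapping as the earlier sketches.

```python
import numpy as np

def real_camera_distance(point_virtual, virt_cam_pos, real_cam_pos, units_to_meters=0.01):
    """Given a rendered point's position in the virtual environment and the
    positions of the virtual camera (A) and the mapped real camera (B),
    return its distance to the virtual camera (x) and to the real camera (y).
    units_to_meters is an assumed engine-unit scale."""
    p = np.asarray(point_virtual, dtype=float) * units_to_meters
    a = np.asarray(virt_cam_pos, dtype=float) * units_to_meters
    b = np.asarray(real_cam_pos, dtype=float) * units_to_meters
    x = np.linalg.norm(p - a)   # distance used in the render-target depth map
    y = np.linalg.norm(p - b)   # distance recorded in the virtual background depth map
    return x, y
```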
  • step 802 and step 803 can be executed at the same time, or step 802 can be executed first and then step 803, or step 803 can be executed first and then step 802.
  • Step 804 In the virtual background depth map, determine the j-th second pixel corresponding to the i-th first pixel belonging to the background area in the real foreground depth map.
  • i and j are positive integers, and the initial value of i can be any integer.
  • step 804 can be implemented as follows: obtain the internal and external parameters of the real camera and the virtual camera respectively; align the internal and external parameters of the real camera and the virtual camera so that the pixels of the virtual background depth map and the foreground depth map correspond; Subsequently, based on the correspondence between the virtual background depth map and the foreground depth map, the j-th second pixel corresponding to the i-th first pixel belonging to the background area is determined.
  • the intrinsic parameters of the camera are parameters related to the characteristics of the camera itself, such as the focal length, pixel size, etc. of the camera; while the extrinsic parameters of the camera are the parameters of the camera in the world coordinate system, such as the camera's position, rotation direction, etc.
  • points in the world coordinate system can be mapped to pixels captured by the camera through internal and external parameters.
  • the position coordinates of the j-th second pixel point are determined in the virtual background depth map.
  • the position coordinate of the i-th first pixel in the real foreground depth map is (4, 6)
  • the position coordinate of the j-th second pixel in the virtual background depth map is also (4, 6).
  • Step 805 Use the second depth information of the j-th second pixel to replace the first depth information of the i-th first pixel in the real foreground depth map.
  • the depth value in the second depth information of the j-th second pixel is used to replace the depth value in the first depth information of the i-th first pixel in the real foreground depth map.
  • the depth value of the i-th first pixel is 20, and the depth value of the j-th second pixel corresponding to the i-th first pixel is 80, then the i-th The depth value of the first pixel is modified to 80.
  • In some embodiments, before the second depth information of the j-th second pixel is used to replace the first depth information of the i-th first pixel, the second depth information of the j-th second pixel may also be modified.
  • the first target depth value, the second target depth value and the third target depth value can all be set by technicians themselves. For example, suppose there are three second pixels, and the depth values of the three second pixels are 20, 43, and 36 respectively. Set the depth values of these three second pixels to 40 uniformly.
  • Step 806 Update i to i+1, and repeat the above two steps until every first pixel belonging to the background area in the real foreground depth map has been traversed and its first depth information updated, thereby obtaining the fused depth map.
  • That is, each first pixel belonging to the background area in the real foreground depth map is traversed until the first depth information of every such first pixel has been replaced by the second depth information of its corresponding second pixel.
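  • A compact sketch of steps 804 to 806, assuming the two depth maps are already pixel-aligned NumPy arrays of the same size and that a boolean mask marks the background area; the per-pixel loop of the text collapses into one vectorized replacement.

```python
import numpy as np

def fuse_depth_maps(real_fg_depth, virtual_bg_depth, background_mask):
    """For every pixel flagged as background, replace the depth taken from the
    real foreground depth map with the depth from the virtual background depth
    map; foreground pixels keep their original depth."""
    fused = real_fg_depth.copy()
    fused[background_mask] = virtual_bg_depth[background_mask]
    return fused
```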
  • Step 807 Adjust the display parameters of the target video frame according to the fusion depth map to generate a depth effect map of the target video frame.
  • the display parameters include at least one of sharpness, brightness, grayscale, contrast, and saturation.
  • In some embodiments, a distance interval is determined; the distance interval represents the range of distances, from the real camera, of the reference points whose corresponding pixels have a clarity greater than the clarity threshold. According to the fused depth map and the distance interval, the clarity of each pixel in the target video frame is adjusted to generate the depth-of-field effect map of the target video frame. For example, if the distance interval is [0, 20], the clarity of pixels whose depth lies within the interval is set to 100%, and the clarity of pixels outside the interval is set to 40%.
  • In other embodiments, the clarity of each pixel in the target video frame is adjusted according to preset conditions.
  • the preset conditions are determined by technicians based on actual needs. For example, the clarity of pixels in a preset area in the target video frame is adjusted.
  • the preset area can be set by technicians themselves.
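  • A rough sketch of this adjustment under simplifying assumptions: pixels whose fused depth falls inside the distance interval stay sharp, and the rest are blended toward a Gaussian-blurred copy of the frame (OpenCV's GaussianBlur stands in for whatever blur the production pipeline actually uses).

```python
import cv2
import numpy as np

def apply_depth_of_field(frame, fused_depth, near=0.0, far=20.0, blur_strength=0.6):
    """Keep pixels whose depth lies in [near, far] fully sharp and soften the
    rest; blur_strength plays the role of the reduced clarity (e.g. 40%)
    outside the interval."""
    blurred = cv2.GaussianBlur(frame, (21, 21), 0)
    in_focus = ((fused_depth >= near) & (fused_depth <= far)).astype(np.float32)
    alpha = in_focus[..., None]                      # 1.0 inside the interval, 0.0 outside
    softened = (1 - blur_strength) * frame + blur_strength * blurred
    out = alpha * frame + (1 - alpha) * softened
    return out.astype(frame.dtype)
```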
  • Step 808 Generate a target video with a depth of field effect based on the depth of field effect map of the target video frame.
  • the target video frame includes at least two video frames, and the depth-of-field effect maps of the target video frames are arranged in chronological order to obtain a target video with a depth-of-field effect.
  • the real camera captures the target scene and generates a video frame sequence. Then, the target video frame is obtained according to the video frame sequence, and the depth information of the target video frame is updated, so that the depth information included in the target video frame is more accurate, and a target video with a depth of field effect is generated based on the target video frame. Since the depth information of the virtual background is more accurate, the video composed of the virtual background and the real foreground is more natural and the display effect is better.
  • this embodiment provides multiple methods to obtain the real foreground depth map, so that technicians can adjust the method of obtaining the real foreground depth map according to actual needs.
  • the depth information of the real foreground can be obtained not only through a depth camera, but also through two real cameras, which increases the flexibility of the solution.
  • since the depth information of the background area in the fused depth map is obtained from the virtual background depth map, and the virtual background depth map is generated by the virtual camera capturing the virtual environment, the depth information obtained in this way is more accurate, and the resulting depth-of-field rendering is more in line with actual needs and performs better.
  • this embodiment also provides multiple optional implementation methods for obtaining the real foreground depth map and the virtual background depth map, and technicians can choose according to actual needs.
  • For example, the depth information of the real foreground is obtained through a depth camera; as another example, it is obtained through two real cameras, thereby increasing the flexibility of the video generation method.
  • this embodiment also provides a specific generation method of the fusion depth map.
  • the fusion of the virtual background depth map and the real foreground depth map is achieved. Based on the replacement of pixels one by one, the fusion result can be made more accurate, thereby making the display effect of the obtained depth of field effect map better.
  • this embodiment also provides a specific method of generating the depth of field effect map.
  • the depth of field effect map can be generated based on the relevant information of the real camera, such as determining the distance interval based on the preset aperture or preset focal length of the real camera, and then adjusting the clarity of the pixels based on the fused depth map and distance interval to generate the depth of field effect map.
  • technicians can also choose the specific generation method of the depth of field effect map according to actual needs, further increasing the flexibility of the video generation method.
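  • One conventional way to derive such a distance interval from the real camera's focal length, aperture (f-number) and focus distance is the standard thin-lens depth-of-field formula based on the hyperfocal distance; the application does not spell this formula out, so the sketch below is only an illustration.

```python
def depth_of_field_limits(focal_length_mm, f_number, focus_distance_m, coc_mm=0.03):
    """Near/far limits of acceptable sharpness from the standard depth-of-field
    formulas; coc_mm is the circle-of-confusion diameter (0.03 mm is a common
    full-frame assumption)."""
    f = focal_length_mm / 1000.0
    c = coc_mm / 1000.0
    s = focus_distance_m
    hyperfocal = f * f / (f_number * c) + f
    near = (hyperfocal * s) / (hyperfocal + (s - f))
    far = (hyperfocal * s) / (hyperfocal - (s - f)) if hyperfocal > (s - f) else float("inf")
    return near, far

# e.g. a 50 mm lens at f/2.8 focused at 5 m -> roughly (4.3 m, 6.0 m)
print(depth_of_field_limits(50, 2.8, 5.0))
```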
  • the depth information of the real foreground can also be adjusted so that the depth information of the real foreground meets the preset requirements.
  • Figure 10 shows a schematic flowchart of a virtual reality-based video generation method provided by an exemplary embodiment of the present application. The method can be executed by the computer system 100 shown in Figure 1.
  • the virtual reality-based video generation method includes the following steps:
  • Step 1001 Determine the third pixel belonging to the target object in the foreground area of the realistic foreground depth map.
  • the third pixel is the pixel corresponding to the target object.
  • Target objects are objects in the real environment.
  • the foreground area is used to indicate the area corresponding to the real foreground in the real foreground depth map, such as the area corresponding to real objects and/or real people.
  • the background area is used to indicate the area corresponding to the virtual background in the real foreground depth map, such as the area corresponding to the LED wall.
  • In some embodiments, the depth information of each pixel in the target video frame is obtained through the depth map provided by the depth camera; the depth information represents the distance from the real reference point corresponding to each pixel in the target video frame to the real camera. When the depth value of a first pixel is greater than the first depth threshold, the first pixel is determined to belong to the virtual background; when the depth value of a second pixel is less than the second depth threshold, the second pixel is determined to belong to the real foreground.
  • the first depth threshold is not less than the second depth threshold, and the first depth threshold and the second depth threshold can be set by technicians themselves.
  • the pixels in the foreground area belonging to the depth threshold interval are determined as the third pixels.
  • the depth threshold interval can be set by technicians themselves.
  • the pixels in the foreground area belonging to the target object area are determined as the third pixels.
  • the target object area can be set by the technician himself.
  • the third pixel is any pixel in the foreground area.
  • Step 1002 In response to the depth value update instruction, update the depth value of the third pixel.
  • the depth value of the third pixel point is set to the first preset depth value.
  • the first preset depth value can be set by technicians according to actual needs.
  • step 1002 may be implemented as follows: determine a depth value setting instruction according to the desired position of the target object, and set the depth value of the third pixel point to the first preset depth value according to the depth value setting instruction.
  • For example, the target object is inconvenient to move, or the depth value of the target object is expected to be a large value but, due to site restrictions, the target object cannot be moved to the desired position; in this case, the depth values of the third pixels corresponding to the target object can be uniformly set to the first preset depth value, so that the depth information of the target object in the depth-of-field effect map meets the actual needs, that is, the target object appears at the desired position in the depth-of-field rendering.
  • a second preset depth value is added to the depth value of the third pixel point.
  • the second preset depth value can be set by technicians according to actual needs.
  • the depth value of the third pixel point is reduced by a third preset depth value.
  • the third preset depth value can be set by technicians according to actual needs.
  • Optionally, step 1002 can be implemented as follows: when the distance between the target object and the real camera is less than the distance between the desired position of the target object and the real camera, a depth value increase instruction is determined, and the depth value of the third pixel is increased by the second preset depth value according to the depth value increase instruction;
  • or, when the distance between the target object and the real camera is greater than the distance between the desired position of the target object and the real camera, a depth value reduction instruction is determined, and the depth value of the third pixel is reduced by the third preset depth value according to the depth value reduction instruction.
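  • A small sketch of these three depth value update instructions, assuming the third pixels of the target object are addressed by a boolean mask and the instruction is passed as a plain string; the names used here are illustrative only.

```python
import numpy as np

def update_target_object_depth(depth_map, object_mask, instruction, value):
    """Apply a depth value update instruction to the pixels of the target
    object: 'set' writes the first preset depth value, 'increase' adds the
    second preset depth value, 'decrease' subtracts the third preset value.
    depth_map is expected to be a float array of distances."""
    out = depth_map.copy()
    if instruction == "set":
        out[object_mask] = value
    elif instruction == "increase":
        out[object_mask] += value
    elif instruction == "decrease":
        out[object_mask] -= value
    else:
        raise ValueError(f"unknown instruction: {instruction}")
    return out
```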
  • the target object can exchange positions with other objects, or it is hoped that the target object can move in front of other objects, or it is hoped that the target object can be moved behind other objects.
  • the above scenarios can be understood as the real-time position of the target object is different from the expected position.
  • the depth value of the third pixels corresponding to the target object can be changed so that the generated depth-of-field effect map reflects the desired positional relationship between the target object and other objects.
  • For example, there is a reference object in the real foreground, whose position can be understood as the desired position of the target object.
  • the depth value of the pixels corresponding to the reference object is 10, meaning the reference object is 10 meters away from the real camera, and the depth value of the pixels corresponding to the target object is 15, meaning the target object is 15 meters away from the real camera.
  • the actual requirement is that the real foreground depth map reflect a distance from the target object to the real camera that is smaller than the distance from the reference object to the real camera, where the distance from the reference object to the real camera can be understood as the distance from the desired position of the target object to the real camera.
  • Reducing the depth value of the third pixels corresponding to the target object by 8 (that is, according to the depth value reduction instruction, the depth value of the third pixels corresponding to the target object is reduced by the third preset depth value) makes the depth value of those third pixels 7, which meets the above requirement.
  • the technician can input a depth value setting instruction and set the depth value of the third pixels corresponding to the tree to 40. In this way, the depth-of-field effect map and the target video display the tree as being 40 meters away from the real camera, and there is no need to actually move the tree to achieve this display effect, which is easy to operate and highly efficient.
  • For another example, there are tree A and tree B in the real foreground.
  • Tree A is 20 meters away from the real camera, and tree B is 25 meters away from the real camera.
  • In the target video that the technician hopes to shoot, tree A appears behind tree B, and directly moving tree A or tree B is not easy to implement.
  • The technician can directly set the depth value of tree A to 30 through the depth value setting instruction.
  • Tree A is then 30 meters away from the real camera while tree B is still 25 meters away, satisfying the display effect of tree A behind tree B.
  • The technician can also increase the depth value of tree A by 15 through the depth value increase instruction, so that the depth value of tree A is 35.
  • Tree A is then 35 meters away from the real camera.
  • Tree B is still 25 meters away from the real camera, which satisfies the display effect of tree A behind tree B.
  • Alternatively, the technician can use the depth value reduction instruction to reduce the depth value of tree B by 10, so that the depth value of tree B is 15.
  • Tree A is then still 20 meters away from the real camera, and tree B is 15 meters away, which also satisfies the display effect of tree A behind tree B.
  • this embodiment can modify each third pixel in the foreground area so that the depth information of the pixels in the foreground area meets the requirements, which not only reduces the need to move objects in the real foreground, but also makes the depth information of the real foreground more accurate.
  • the depth information of each third pixel in the foreground area can be directly adjusted according to the actual needs of the technician, so that the foreground area in the target video or depth-of-field effect map presents the display effect that the technician desires.
  • FIG 11 shows a schematic diagram of a method for generating a depth of field effect map based on virtual reality provided by an exemplary embodiment of the present application.
  • This method is implemented in the form of a plug-in in UE4 (Unreal Engine 4).
  • this method can also be implemented in the form of a plug-in in Unity3D (a real-time 3D interactive content creation and operation platform, which belongs to the creation engine and development tool).
  • the embodiments of this application do not specifically limit the application platform of this method.
  • the virtual reality-based depth of field rendering generation method is implemented through the virtual production depth of field plug-in 1101. The specific steps are as follows:
  • the two threads of the plug-in are processed synchronously.
  • Real foreground depth processing thread: processes the data of the reality camera 1102 and the depth camera 1104, including converting the original YUV to RGBA and, using OpenCV (a cross-platform computer vision and machine learning software library), combining it with the depth information provided by the depth camera 1104 to obtain the depth information of the real foreground under the real camera 1102. In this thread, a DX11 shared texture is used to copy the virtual background depth map to the current thread and merge it into a fused depth map 1107 containing the depth information of both the real foreground and the virtual background, and a compute shader (a technique that enables parallel processing on the GPU, Graphics Processing Unit) is used for this processing.
  • Virtual background depth processing thread: obtains the depth information from the Render Target corresponding to the virtual camera 1103, generates the virtual background depth map 1106, and copies it to the shared texture in the real foreground depth processing thread.
  • If the depth camera is selected: calibrate the internal and external parameters of the depth camera 1104. According to the internal and external parameters of the depth camera 1104 and those of the real camera 1102, the depth map captured by the depth camera 1104 is mapped to the real camera 1102 to obtain the depth map of the corresponding scene under the real camera 1102, that is, the real foreground depth map 1105.
  • If the auxiliary camera is selected: calibrate the internal and external parameters of the auxiliary camera, and perform stereoscopic rectification of its image based on these parameters to obtain a rectification mapping; perform stereoscopic rectification of the image of the real camera 1102 according to its internal and external parameters to obtain another rectification mapping. Then, in each frame of data processing, the aforementioned two mappings are used to reconstruct the data provided by the two cameras and generate a disparity map. The depth information of the target video frame is obtained from the disparity map, yielding the real foreground depth map 1105.
  • In the virtual background depth processing thread, the depth information is obtained from the rendering target and converted into the corresponding real-world linear distance to obtain the virtual background depth map 1106, which is then synchronously copied to another texture in the real foreground depth processing thread.
  • In the data processing thread of the real camera, the data are merged into the fused depth map 1107, so that in the imaging of the real camera, the clarity of the virtual background pixels is known based on the focus distance; the maximum and minimum distances at which pixels are displayed sharply, as well as the degree of blur of the out-of-focus areas, are determined based on the set aperture or focal length. According to these parameters and the fused depth map 1107, it can be determined how each pixel should be displayed, and the final depth-of-field effect map 1108 is generated.
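  • A loose per-frame sketch of the real foreground depth processing thread in OpenCV terms; the actual plug-in runs in C++ inside UE4 with DX11 shared textures and a compute shader, and the NV12 YUV layout assumed below is only an example.

```python
import cv2

def foreground_thread_frame(yuv_nv12, fg_depth, shared_bg_depth, background_mask):
    """One frame of the real foreground depth thread: convert the camera's YUV
    image (NV12 layout, shape (3*h//2, w)) to RGBA, then merge the real
    foreground depth with the virtual background depth copied from the other
    thread (shared_bg_depth stands in for the DX11 shared texture)."""
    rgba = cv2.cvtColor(yuv_nv12, cv2.COLOR_YUV2RGBA_NV12)  # original YUV -> RGBA
    fused_depth = fg_depth.copy()
    fused_depth[background_mask] = shared_bg_depth[background_mask]
    # a compute shader would then blur rgba per pixel according to fused_depth
    return rgba, fused_depth
```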
  • FIG. 12 shows a schematic diagram of a virtual reality-based video generation device provided by an embodiment of the present application.
  • the above functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the device 1200 includes:
  • the acquisition module 1201 is used to acquire target video frames from a video frame sequence.
  • the video frame sequence is obtained by collecting a target scene with a real camera.
  • the target scene includes a real foreground and a virtual background.
  • the virtual background is displayed on a physical screen in the real environment;
  • the acquisition module 1201 is also used to acquire the real foreground depth map and the virtual background depth map of the target video frame.
  • the real foreground depth map includes the depth information from the real foreground to the real camera.
  • the virtual background depth map includes the depth information from the virtual background, after being mapped into the real environment, to the real camera;
  • the fusion module 1202 is used to fuse the real foreground depth map and the virtual background depth map to obtain a fused depth map.
  • the fused depth map includes the depth information from each reference point in the target scene, in the real environment, to the real camera;
  • Update module 1203 configured to adjust the display parameters of the target video frame according to the fusion depth map, and generate a depth effect map of the target video frame;
  • the update module 1203 is also configured to generate a target video with a depth of field effect based on the depth of field effect map of the target video frame.
  • the real foreground depth map includes a background area corresponding to the virtual background; the fusion module 1202 is also configured to update the first depth information of the first pixels belonging to the background area in the real foreground depth map according to the second depth information of the second pixels belonging to the background area in the virtual background depth map, to obtain the fused depth map.
  • the acquisition module 1201 is also used to determine, in the virtual background depth map, the j-th second pixel corresponding to the i-th first pixel belonging to the background area in the real foreground depth map, i and j being positive integers; to use the second depth information of the j-th second pixel to replace the first depth information of the i-th first pixel in the real foreground depth map; and to update i to i+1 and repeat the above two steps until the first depth information of every first pixel belonging to the background area in the real foreground depth map has been traversed, obtaining the fused depth map.
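An explicit, illustrative traversal matching this description (assuming both maps are already aligned pixel-for-pixel to the real camera, so that pixel j sits at the same coordinates as pixel i) could be:

```python
import numpy as np

def fuse_background_depths(fg_depth, bg_depth, background_mask):
    """Sketch of the traversal described above: for each first pixel i in the
    background area of the real foreground depth map, look up the corresponding
    second pixel j in the virtual background depth map and replace the first
    depth information with the second."""
    fused = fg_depth.copy()
    ys, xs = np.nonzero(background_mask)          # indices of first pixels in the background area
    for y, x in zip(ys, xs):                      # i -> i+1 traversal
        fused[y, x] = bg_depth[y, x]              # second depth info replaces first depth info
    return fused
```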
  • the acquisition module 1201 is also used to determine, according to the intrinsic and extrinsic parameters of the real camera, the screen coordinates of the i-th first pixel of the background area on the physical screen; to determine, in the virtual environment, the coordinates of the virtual point corresponding to the i-th first pixel according to the screen coordinates; and to map the coordinates of the virtual point onto the virtual background depth map according to the intrinsic and extrinsic parameters of the virtual camera, obtaining the j-th second pixel, where the virtual camera is used to capture, in the virtual environment, the render target corresponding to the virtual background.
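One possible reading of this mapping, sketched under the assumption that the physical screen can be modelled as a plane n·X + d = 0 in world coordinates and that the camera poses map world points to camera coordinates as x_cam = R·X + t, is:

```python
import numpy as np

def second_pixel_for_first_pixel(u, v, K_real, R_real, t_real, screen_plane,
                                 K_virt, R_virt, t_virt):
    """Sketch: cast a ray from the real camera through first pixel (u, v),
    intersect it with the physical screen plane to get the screen coordinate,
    treat that point as the corresponding virtual point, and project it with the
    virtual camera's intrinsics/extrinsics to obtain the second pixel (u_j, v_j).
    screen_plane is assumed to be (n, d) with n·X + d = 0 in world coordinates."""
    n, d = screen_plane
    ray_cam = np.linalg.inv(K_real) @ np.array([u, v, 1.0])
    ray_world = R_real.T @ ray_cam                     # ray direction in world coordinates
    origin = -R_real.T @ t_real                        # real camera centre in world coordinates
    s = -(n @ origin + d) / (n @ ray_world)            # ray/plane intersection parameter
    screen_point = origin + s * ray_world              # point on the physical screen / virtual point

    p = K_virt @ (R_virt @ screen_point + t_virt)      # project with the virtual camera
    return p[0] / p[2], p[1] / p[2]
```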
  • the real foreground depth map includes a foreground area corresponding to the real foreground; the fusion module 1202 is also used to update the third depth information of the third pixels belonging to the foreground area in the real foreground depth map, the third pixels being the pixels corresponding to a target object.
  • the fusion module 1202 is also used to determine, in the foreground area of the real foreground depth map, the third pixels belonging to the target object, and, in response to a depth value update instruction, to update the depth values of the third pixels.
  • the fusion module 1202 is also configured to set the depth value of the third pixel to a first preset depth value according to a depth value setting instruction; or, according to a depth value increase instruction, to increase the depth value of the third pixel by a second preset depth value; or, according to a depth value reduction instruction, to decrease the depth value of the third pixel by a third preset depth value.
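A minimal sketch of the three depth value update instructions might be as follows; the mask selecting the target object's pixels and the mode strings are illustrative assumptions:

```python
import numpy as np

def update_object_depth(fg_depth, object_mask, mode, value):
    """Sketch: apply a depth value update instruction to the third pixels
    (the pixels of the target object selected by object_mask)."""
    updated = fg_depth.copy()
    if mode == "set":        # depth value setting instruction -> first preset depth value
        updated[object_mask] = value
    elif mode == "increase": # depth value increase instruction -> add second preset depth value
        updated[object_mask] += value
    elif mode == "decrease": # depth value reduction instruction -> subtract third preset depth value
        updated[object_mask] -= value
    return updated
```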
  • the acquisition module 1201 is also used to generate spatial offset information between the depth camera and the real camera based on the intrinsic and extrinsic parameters of the depth camera and those of the real camera; to obtain the depth map collected by the depth camera; and to map the depth information of that depth map onto the target video frame according to the spatial offset information, obtaining the real foreground depth map.
  • the real-life camera is used to shoot the target scene from a first angle; the acquisition module 1201 is also used to acquire the first mapping relationship according to the internal and external parameters of the reference camera.
  • the first mapping relationship is used to represent the mapping relationship between the camera coordinate system of the reference camera and the real coordinate system.
  • the reference camera is used to shoot the target scene from a second angle, and the second angle is different from the first angle; according to the intrinsic and extrinsic parameters of the real camera, a second mapping relationship is obtained, and the second mapping relationship is used to represent the mapping relationship between the camera coordinate system of the real camera and the real coordinate system; the reference image captured by the reference camera is reconstructed according to the first mapping relationship to obtain a reconstructed reference image; the target video frame captured by the real camera is reconstructed according to the second mapping relationship to obtain a reconstructed target scene image; and the depth information of each pixel in the target video frame is determined according to the disparity between the reconstructed reference image and the reconstructed target scene image, obtaining the real foreground depth map.
  • the acquisition module 1201 is also used to acquire the rendering target corresponding to the virtual background in the virtual environment; and generate a rendering target depth map of the rendering target in the virtual environment.
  • the render target depth map includes virtual depth information, and the virtual depth information is used to represent the distance from the render target to the virtual camera in the virtual environment; the virtual depth information in the render target depth map is converted into real depth information to obtain the virtual background depth map, and the real depth information is used to represent the distance from the render target, after being mapped into the real environment, to the real camera.
  • the update module 1203 is also used to determine a distance interval based on the preset aperture or preset focal length of the real camera, where the distance interval is used to represent the distance, to the real camera, of the reference points corresponding to pixels whose sharpness is greater than a sharpness threshold; and, according to the fused depth map and the distance interval, to adjust the sharpness of each pixel in the target video frame, generating the depth-of-field effect map of the target video frame.
  • the update module 1203 is also used to adjust the sharpness of the area corresponding to the virtual background in the target video frame according to the focus distance of the real camera and the fused depth map, generating the depth-of-field effect map of the target video frame.
  • the acquisition module 1201 is also used to acquire at least two depth-of-field effect maps corresponding to the video frame sequence; the update module 1203 is also used to arrange the at least two depth-of-field effect maps in chronological order to obtain a depth-of-field video corresponding to the video frame sequence.
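For illustration, assembling the per-frame effect maps into the output video could be sketched with OpenCV's VideoWriter; the 24 fps value and the mp4v codec are assumptions:

```python
import cv2

def write_depth_of_field_video(effect_frames, path, fps=24):
    """Sketch: arrange the per-frame depth-of-field effect maps in chronological
    order and write them out as the target video."""
    h, w = effect_frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in effect_frames:          # frames are assumed to be in time order
        writer.write(frame)
    writer.release()
```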
  • the real camera captures the target scene and generates a video frame sequence. The target video frame is then obtained from the video frame sequence and its depth information is updated, so that the depth information included in the target video frame is more accurate. Because depth information is added for the virtual background, the picture composed of the virtual background and the real foreground is more natural and the display effect is better.
  • FIG. 13 is a schematic structural diagram of a computer device according to an exemplary embodiment.
  • the computer device 1300 includes a central processing unit (CPU) 1301, a system memory 1304 including a random access memory (RAM) 1302 and a read-only memory (ROM) 1303, and a system bus 1305 connecting the system memory 1304 and the central processing unit 1301.
  • the computer device 1300 also includes a basic input/output (I/O) system 1306 that helps transfer information between the various devices within the computer device, and a mass storage device 1307 for storing an operating system 1313, application programs 1314, and other program modules 1315.
  • the basic input/output system 1306 includes a display 1308 for displaying information and input devices 1309 such as a mouse and a keyboard for the user to input information.
  • the display 1308 and the input device 1309 are both connected to the central processing unit 1301 through the input and output controller 1310 connected to the system bus 1305.
  • Basic input/output system 1306 may also include an input/output controller 1310 for receiving and processing input from a variety of other devices such as a keyboard, mouse, or electronic stylus.
  • input and output controller 1310 also provides output to a display screen, printer, or other type of output device.
  • Mass storage device 1307 is connected to central processing unit 1301 through a mass storage controller (not shown) connected to system bus 1305 .
  • Mass storage device 1307 and its associated computer device-readable media provide non-volatile storage for computer device 1300 . That is, the mass storage device 1307 may include a computer device-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.
  • Computer device readable media may include computer device storage media and communication media.
  • Computer device storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as instructions, data structures, program modules or other data readable by a computer device.
  • Computer device storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), CD-ROM, Digital Video Disc (DVD) or other optical storage, tape cassettes, magnetic tape, disk storage or other magnetic storage devices.
  • Of course, those skilled in the art will appreciate that computer device storage media are not limited to the above types.
  • the above-mentioned system memory 1304 and mass storage device 1307 may be collectively referred to as memory.
  • computer device 1300 may also operate via a network connection to a remote computer device on a network, such as the Internet. That is, the computer device 1300 can be connected to the network 1312 through the network interface unit 1311 connected to the system bus 1305, or the network interface unit 1311 can also be used to connect to other types of networks or remote computer device systems (not shown) .
  • the memory also includes one or more programs.
  • One or more programs are stored in the memory.
  • the central processor 1301 implements all or part of the steps of the above-mentioned virtual reality-based video generation method by executing the one or more programs.
  • In an exemplary embodiment, a computer-readable storage medium is also provided, in which at least one instruction, at least one program, a code set or an instruction set is stored; the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the virtual reality-based video generation method provided by each of the above method embodiments.
  • This application also provides a computer-readable storage medium, in which at least one instruction, at least a program, a code set or an instruction set is stored, and at least one instruction, at least a program, a code set or an instruction set is loaded and executed by a processor. To implement the virtual reality-based video generation method provided by the above method embodiment.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the virtual reality-based video generation method provided in the above embodiments.
  • the present application provides a chip, which includes programmable logic circuits and/or program instructions, and is used to implement the virtual reality-based video generation method as described above when the electronic device installed with the chip is running.
  • the computer system includes a computer device, a reality camera and a depth camera; wherein, the reality camera is used to collect a target video frame, the depth camera is used to obtain a realistic foreground depth map of the target video frame, and the computer device is used to acquire a target video frame. It is used to obtain the virtual background depth map and generate the target video with depth of field effect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

一种基于虚拟现实的视频生成方法、装置、设备及介质，涉及虚拟现实领域。该方法包括：从视频帧序列中获取目标视频帧，视频帧序列是由现实摄像机采集目标场景得到的，目标场景包括现实前景和虚拟背景，虚拟背景显示在现实环境中的物理屏幕上(302)；获取目标视频帧的现实前景深度图和虚拟背景深度图(304)；融合现实前景深度图和虚拟背景深度图，得到融合深度图(306)；根据融合深度图调整目标视频帧的显示参数，生成目标视频帧的景深效果图(308)；基于所述目标视频帧的景深效果图，生成具有景深效果的目标视频(310)。本申请得到的目标视频具有较好的景深效果。

Description

基于虚拟现实的视频生成方法、装置、设备及介质
本申请要求于2022年04月28日提交的申请号为202210463334.7、发明名称为“基于虚拟现实的视频生成方法、装置、设备及介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及虚拟现实领域,特别涉及一种基于虚拟现实的视频生成方法、装置、设备及介质。
背景技术
虚拟制片指计算机辅助制片和可视化电影制作方法。虚拟制片包括多种方式,比如,可视化(visualization)、表演捕捉(performance capture)、混合虚拟制作(hybrid virtual production)、实时LED墙镜头内虚拟制作(live LED wall in-camera)等。
相关技术会在拍摄时,在演员、道具等现实前景的后方设置一面LED(Light Emitting Diode,发光二极管)墙,并在LED墙上投影实时的虚拟背景。现实摄像机会同时拍摄演员、道具、LED墙上的显示内容,并将拍摄到的内容输入到计算机中。计算机实时输出现实摄像机的拍摄内容。
但是现实摄像机采集到的每一帧图像的显示效果差,进而导致现实摄像机拍摄的视频效果较差。
发明内容
本申请实施例提供了一种基于虚拟现实的视频生成方法、装置、设备及介质,技术方案如下:
根据本申请的一个方面,提供了一种基于虚拟现实的视频生成方法,该方法由计算机系统执行,该方法包括:
从视频帧序列中获取目标视频帧,视频帧序列是由现实摄像机采集目标场景得到的,目标场景包括现实前景和虚拟背景,虚拟背景显示在现实环境中的物理屏幕上;
获取目标视频帧的现实前景深度图和虚拟背景深度图,现实前景深度图包括现实前景到现实摄像机的深度信息,虚拟背景深度图包括被映射到现实环境后的虚拟背景到现实摄像机的深度信息;
融合现实前景深度图和虚拟背景深度图,得到融合深度图,融合深度图包括目标场景内的各个参考点在现实环境中到现实摄像机的深度信息;
根据融合深度图调整目标视频帧的显示参数,生成目标视频帧的景深效果图;
基于目标视频帧的景深效果图,生成具有景深效果的目标视频。
根据本申请的另一个方面,提供了一种基于虚拟现实的视频生成装置,该装置包括:
获取模块,用于从视频帧序列中获取目标视频帧,视频帧序列是由现实摄像机采集目标场景得到的,目标场景包括现实前景和虚拟背景,虚拟背景显示在现实环境中的物理屏幕上;
获取模块,还用于获取目标视频帧的现实前景深度图和虚拟背景深度图,现实前景深度图包括现实前景到现实摄像机的深度信息,虚拟背景深度图包括被映射到现实环境后的虚拟背景到现实摄像机的深度信息;
融合模块,用于融合现实前景深度图和虚拟背景深度图,得到融合深度图,融合深度图包括目标场景内的各个参考点在现实环境中到现实摄像机的深度信息;
更新模块,用于根据融合深度图调整目标视频帧的显示参数,生成目标视频帧的景深效果图;
更新模块,还用于基于目标视频帧的景深效果图,生成具有景深效果的目标视频。
根据本申请的另一方面,提供了一种计算机设备,该计算机设备包括:处理器和存储器, 存储器中存储有至少一条指令、至少一段程序、代码集或指令集,至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行以实现如上方面的基于虚拟现实的视频生成方法。
根据本申请的另一方面,提供了一种计算机存储介质,计算机可读存储介质中存储有至少一条程序代码,程序代码由处理器加载并执行以实现如上方面的基于虚拟现实的视频生成方法。
根据本申请的另一方面,提供了一种计算机程序产品或计算机程序,上述计算机程序产品或计算机程序包括计算机指令,上述计算机指令存储在计算机可读存储介质中。计算机设备的处理器从上述计算机可读存储介质读取上述计算机指令,上述处理器执行上述计算机指令,使得上述计算机设备执行如上方面的基于虚拟现实的视频生成方法。
根据本申请的另一方面,提供了一种芯片,该芯片包括可编程逻辑电路和/或程序指令,当安装有芯片的电子设备运行时,用于实现如上所述的基于虚拟现实的视频生成方法。
根据本申请的另一方面,提供了一种计算机系统,该计算机系统包括计算机设备、现实摄像机和深度摄像机;其中,现实摄像机用于采集目标视频帧,深度摄像机用于获取目标视频帧的现实前景深度图,计算机设备用于获取虚拟背景深度图、以及生成具有景深效果的目标视频。
现实摄像机拍摄目标场景,生成视频帧序列。再根据视频帧序列获取目标视频帧,并对目标视频帧的深度信息进行更新,使得目标视频帧包括的深度信息更加准确,并基于目标视频帧生成具有景深效果的目标视频。由于虚拟背景的深度信息更加准确,使得虚拟背景和现实前景结合组成的视频更加自然,显示效果较好。
附图说明
图1是本申请一个示例性实施例提供的计算机系统的示意图;
图2是本申请一个示例性实施例提供的基于虚拟现实的景深效果图生成方法的示意图;
图3是本申请一个示例性实施例提供的基于虚拟现实的视频生成方法的流程示意图;
图4是本申请一个示例性实施例提供的基于虚拟现实的视频生成方法的界面示意图;
图5是本申请一个示例性实施例提供的基于虚拟现实的视频生成方法的界面示意图;
图6是本申请一个示例性实施例提供的基于虚拟现实的视频生成方法的界面示意图;
图7是本申请一个示例性实施例提供的基于虚拟现实的视频生成方法的界面示意图;
图8是本申请一个示例性实施例提供的基于虚拟现实的视频生成方法的流程示意图;
图9是本申请一个示例性实施例提供的计算深度信息的示意图;
图10是本申请一个示例性实施例提供的基于虚拟现实的景深效果图生成方法的示意图;
图11是本申请一个示例性实施例提供的基于虚拟现实的视频生成制作的示意图;
图12是本申请一个示例性实施例提供的基于虚拟现实的景深效果图生成方法的界面示意图;
图13是本申请一个示例性实施例提供的计算机设备的示意图。
具体实施方式
景深(Depth Of Field,DOF):是指摄像机对焦点前后相对清晰的成像范围。在光学中,尤其是录影或是摄影,是一个描述在空间中,可以清楚成像的距离范围。摄像机使用的透镜只能够将光聚到某一固定的距离,远离此点的图像则会逐渐模糊,但是在某一段特定的距离内,图像模糊的程度是肉眼无法察觉的,这段特定的距离称之为景深。
现实前景:现实环境中的实物,一般包含演员以及周边真实堆叠场景。将靠近摄像机作为摄像机的前景。
虚拟背景:预先设计的虚拟环境。虚拟置景一般包含比较现实中不好做的场景以及类魔幻场景,经过程序引擎演算后输出到LED幕墙上,在现实置景后面作为摄像机的后景。
YUV:一种颜色编码方法,用在视频处理组件中。其中,“Y”表示明亮度(Luminance或Luma),也就是灰阶值,“U”和“V”表示的则是色度(Chrominance或Chroma),作用是描述色 彩及饱和度,用于指定像素点的颜色。
内外参数:包括摄像机的内参数和外参数。内参数是与摄像机自身特性相关的参数,比如,内参数包括摄像机的焦距、像素大小等。外参数是摄像机在世界坐标系中的参数,比如,外参数包括摄像机的位置、旋转方向等。通过内外参数可将世界坐标系中的点映射到摄像机拍摄的像素点上。
需要说明的是,本申请所涉及的信息(包括但不限于用户设备信息、用户个人信息等)、数据(包括但不限于用于分析的数据、存储的数据、展示的数据等)以及信号,均为经用户授权或者经过各方充分授权的,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。
在电影、电视等视频内容的制作过程中,会用到虚拟制片技术。相关技术会在视频拍摄场地中设置一块LED幕墙用于显示虚拟背景,并在LED幕墙的前方设置现实前景,现实前景指现实中的物体或生物。现实摄像机同时拍摄现实前景和虚拟背景,得到视频。
但是，相关技术中，现实摄像机只能获取到现实摄像机到LED幕墙的深度信息，获取不到虚拟背景的深度信息，导致视频中的虚拟背景缺乏真实感，且现实前景和虚拟背景的组合生硬。
图1示出了本申请一个示例性实施例提供的计算机系统的示意图。计算机系统100包括计算机设备110、物理屏幕120、现实摄像机130和深度摄像机140。
示意性的,计算机设备110上至少安装有视频制作的第一应用程序。其中,第一应用程序可以是app(application,应用程序)中的小程序,也可以是专门的应用程序,也可以是网页客户端。可选的,计算机设备110上还安装有生成虚拟世界的第二应用程序。其中,第二应用程序可以是app中的小程序,也可以是专门的应用程序,也可以是网页客户端。
应当理解的是,第一应用程序和第二应用程序可以是同一个应用程序,也可以是不同的应用程序。在第一应用程序和第二应用程序是同一个应用程序的情况下,该应用程序可以同时实现视频制作和生成虚拟世界的功能。在第一应用程序和第二应用程序是不同应用程序的情况下,第一应用程序和第二应用程序之间可实现数据互通。示例性的,第二应用程序将虚拟世界中各个虚拟物体到虚拟摄像机的深度信息发送给第一应用程序,由第一应用程序根据前述深度信息对视频帧的图像进行更新。
物理屏幕120用于显示虚拟背景。计算机设备110将虚拟世界的数据传输给物理屏幕120,物理屏幕120显示该虚拟世界作为虚拟背景。
现实摄像机130用于拍摄现实前景150和物理屏幕120显示的虚拟背景。现实摄像机130会将拍摄到的视频传输给计算机设备110。现实摄像机130可以将拍摄到的视频实时传输给计算机设备110,现实摄像机130也可以每隔预设时长将拍摄到的视频传输给计算机设备110。
深度摄像机140用于获取现实前景150的深度图。深度摄像机140和现实摄像机130的设置位置不同。深度摄像机140会将拍摄到的现实前景150的深度图传输给计算机设备110,计算机设备110根据该深度图确定现实前景150的深度信息。
图2示出了本申请一个示例性实施例提供的基于虚拟现实的景深效果图生成方法的示意图。该方法可由图1所示的计算机系统100执行。
如图2所示,现实摄像机210采集目标场景得到目标视频帧220。深度摄像机250也会采集目标场景得到深度图。通过像素点匹配的方法,匹配目标视频帧220和深度图中的像素点,将深度图中的深度信息提供给目标视频帧220,得到现实前景深度图260。另一方面,虚拟摄像机230获取与虚拟背景对应的渲染目标的深度信息,得到虚拟背景深度图240。融合现实前景深度图260和虚拟背景深度图240的深度信息,得到融合深度图270。根据融合深度图270为目标视频帧220增加景深效果,得到景深效果图280。
图3示出了本申请一个示例性实施例提供的基于虚拟现实的视频生成方法的流程示意图。示意性的,该方法可由图1所示的计算机系统100执行。在一些实施例中,计算机系统100包括计算机设备110、物理屏幕120、现实摄像机130和深度摄像机140。其中,计算机 设备110用于实现视频制作和生成虚拟世界的功能,物理屏幕120用于显示虚拟背景,现实摄像机130用于拍摄现实前景150和物理屏幕120显示的虚拟背景,深度摄像机140用于获取现实前景150的深度图。
示意性的,本申请实施例提供的基于虚拟现实的视频生成方法,包括如下步骤:
步骤302:从视频帧序列中获取目标视频帧,视频帧序列是由现实摄像机采集目标场景得到的,目标场景包括现实前景和虚拟背景,虚拟背景显示在现实环境中的物理屏幕上。
可选地,现实摄像机将拍摄到的视频帧序列传输给计算机设备。
目标视频帧是视频帧序列中任意一帧的图像。示例性的,视频帧序列包括120帧图像,随机取其中的第45帧图像作为目标视频帧。可选地,按照预设帧率播放视频帧序列,生成视频。示例性的,在视频帧序列中每24帧图像构成1秒的视频。
可选地,现实前景包括现实物体、现实生物、真人中的至少一种。示例性的,如图4所示,现实前景401是视频拍摄现场的演员。
虚拟背景指显示在显示屏上的虚拟内容。可选地,虚拟内容包括虚拟环境、虚拟人物、虚拟物体、虚拟道具、虚拟影像中的至少一种。本申请实施例对虚拟背景的显示内容不做具体限定。示例性的,如图4所示,虚拟背景402显示在显示屏403上。虚拟背景402是城市的虚拟影像。
在一些实施例中,现实前景由现实摄像机获取,虚拟背景由物理屏幕显示。其中,物理屏幕可以是LED墙,虚拟背景可投影至LED墙上。
步骤304:获取目标视频帧的现实前景深度图和虚拟背景深度图,现实前景深度图包括现实前景到现实摄像机的深度信息,虚拟背景深度图包括被映射到现实环境后的虚拟背景到现实摄像机的深度信息。
可选地,通过深度摄像机提供的深度图和目标视频帧,确定现实前景深度图。示例性的,通过深度摄像机提供的深度图获取目标视频帧内各个像素点的深度信息,该深度信息用于表示目标视频帧中各个像素点对应的现实参考点到现实摄像机的距离;在第一像素点的深度值大于第一深度阈值的情况下,确定该第一像素点属于虚拟背景;在第二像素点的深度值大于第二深度阈值的情况下,确定该第二像素点属于现实前景。其中,第一深度阈值不小于第二深度阈值,第一深度阈值和第二深度阈值可由技术人员自行设置。
图5示出了本申请一个示例性实施例提供的现实前景深度图的示意图。其中,现实前景深度图包括现实前景到现实摄像机的深度信息,且现实前景深度图不包括被映射到现实环境后的虚拟背景到现实摄像机的深度信息。
可选地，通过虚拟摄像机生成虚拟背景深度图，虚拟摄像机用于在虚拟环境中拍摄与虚拟背景对应的渲染目标。示例性的，计算机设备可获取到该渲染目标到虚拟摄像机的距离，将该距离转化为现实距离，得到虚拟背景深度图。
图6示出了本申请一个示例性实施例提供的虚拟背景深度图的示意图。其中,虚拟背景深度图包括被映射到现实环境后的虚拟背景到现实摄像机的深度信息,且虚拟背景深度图不包括现实前景到现实摄像机的深度信息。
步骤306:融合现实前景深度图和虚拟背景深度图,得到融合深度图,融合深度图包括目标场景内的各个参考点在现实环境中到现实摄像机的深度信息。
可选地,根据虚拟背景深度图内各个像素点的第二深度信息,更新现实前景深度图内各个像素点的第一深度信息,得到融合深度图。
其中,现实前景深度图包括与现实前景对应的前景区域和与虚拟背景对应的背景区域。本申请实施例可以对背景区域内的像素点的第一深度信息进行更新,也可以对前景区域内的像素点的第一深度信息进行更新。
示例性的,根据虚拟背景深度图内属于背景区域的第二像素点的第二深度信息,更新现实前景深度图内属于背景区域的第一像素点的第一深度信息,得到融合深度图。
示例性的,更新现实前景深度图内属于前景区域的第三像素点的第三深度信息,第三像素点是与目标物体对应的像素点。
示例性的,如图7所示,融合深度图包括现实前景701和虚拟背景702的深度信息。对比图5和图7可得,相较于现实前景深度图,融合深度图还提供了虚拟背景702的深度信息。
步骤308:根据融合深度图调整目标视频帧的显示参数,生成目标视频帧的景深效果图。
可选地,显示参数包括清晰度、亮度、灰度、对比度、饱和度中的至少一种。可选地,根据技术人员的实际需求调整目标视频帧的显示参数。示例性的,根据技术人员的实际需求,增加融合深度图中与虚拟背景对应的像素点的亮度。
可选地,根据现实摄像机的预设光圈或预设焦距,确定距离区间,距离区间用于表示清晰度大于清晰度阈值的像素点对应的参考点到现实摄像机的距离;根据融合深度图和距离区间,调整目标视频帧内各个像素点的清晰度,生成目标场景的景深效果图。
可选地,根据现实摄像机的对焦距离和融合深度图,调整目标视频帧内虚拟背景对应的区域的清晰度,生成所述目标场景的所述景深效果图。
可选地,根据预设条件调整目标视频帧内的各个像素点的清晰度。预设条件是由技术人员根据实际需求确定的。示例性的,调整目标视频帧中预设区域内的像素点的清晰度,预设区域可由技术人员自行设置。
步骤310:基于目标视频帧的景深效果图,生成具有景深效果的目标视频。
可选地,按照预设帧率播放目标视频帧的景深效果图,得到具有景深效果的目标视频。
示例性的，目标视频帧包括至少两个视频帧，按照时间顺序排列目标视频帧的景深效果图，得到具有景深效果的目标视频。可选地，按照时间顺序排列连续的目标视频帧的景深效果图，生成具有景深效果的目标视频。
在一些实施例中，步骤306、步骤308和步骤310由计算机系统中的计算机设备实现。其中，计算机设备110与深度摄像机140建立无线或有线通信，可实现现实前景深度图的信息传输，从而结合虚拟背景深度图进行融合，以得到具有景深效果的目标视频。
综上所述,本申请实施例中,现实摄像机拍摄目标场景,生成视频帧序列。再根据视频帧序列获取目标视频帧,并对目标视频帧的深度信息进行更新,使得目标视频帧包括的深度信息更加准确,并基于目标视频帧生成具有景深效果的目标视频。由于虚拟背景的深度信息更加准确,使得虚拟背景和现实前景结合组成的视频更加自然,显示效果较好。
此外,景深效果图是通过现实摄像机的对焦距离、焦距、光圈等参数生成的,因此,景深效果图中的虚拟背景可以仿真出现实摄像机的拍摄效果,虚拟背景的显示更加自然,虚拟背景更加接近现实物体。
在接下来的实施例中，对现实前景深度图中背景区域的深度信息进行更新，使得背景区域的深度信息更加准确。以现实前景深度图的获取包括两种可选的实施方式为例，现实前景深度图可以通过设置一个深度摄像机获得，也可以通过设置一个辅助摄像机获得。而且，虚拟背景的像素点的清晰度可以通过现实摄像机的参数来进行更新，比如，通过对焦距离、光圈、焦距等对像素点的清晰度进行更新。
图8示出了本申请一个示例性实施例提供的基于虚拟现实的视频生成方法的流程示意图。方法可由图1所示的计算机系统100执行,计算机系统100的相关描述可参考前述内容,不再赘述。
示意性的,本申请实施例提供的基于虚拟现实的视频生成方法,包括如下步骤:
步骤801:从视频帧序列中获取目标视频帧。
可选地,现实摄像机将拍摄到的视频帧序列传输给计算机设备。
目标视频帧是视频帧序列中任意一帧的图像。示例性的,视频帧序列包括120帧图像,随机取其中的第45帧图像作为目标视频帧。可选地,按照预设帧率播放视频帧序列,生成具有景深效果的目标视频。示例性的,在视频帧序列中每24帧图像构成1秒的视频。
可选地,现实前景包括现实物体、现实生物、真人中的至少一种。
虚拟背景指显示在显示屏上的虚拟内容。可选地,虚拟内容包括虚拟环境、虚拟人物、虚拟物体、虚拟道具、虚拟影像中的至少一种。本申请实施例对虚拟背景的显示内容不做具体限定。
步骤802:获取目标视频帧的现实前景深度图。
在一种可选的实现方式中,通过深度摄像机获取现实前景的深度信息,该方法可包括以下步骤:
1、根据深度摄像机的内外参数和现实摄像机的内外参数,生成深度摄像机与现实摄像机之间的空间偏移信息。
可选地,深度摄像机的内外参数包括内参数和外参数。内参数是与深度摄像机自身特性相关的参数,内参数包括焦距、像素大小等。外参数是深度摄像机在世界坐标系中的参数,外参数包括相机的位置、旋转方向等。
可选地,现实摄像机的内外参数包括内参数和外参数。内参数是与现实摄像机自身特性相关的参数,内参数包括焦距、像素大小等。外参数是现实摄像机在世界坐标系中的参数,外参数包括相机的位置、旋转方向等。
在一种可选地实施方式中,空间偏移信息指深度摄像机的相机坐标系与现实摄像机的相机坐标系之间的映射关系。示例性的,根据深度摄像机的内外参数,确定深度摄像机的深度映射关系,深度映射关系指深度摄像机的相机坐标系到现实坐标系的映射关系;根据现实摄像机的内外参数,确定现实摄像机的现实映射关系,现实映射关系指现实摄像机的相机坐标系到现实坐标系的映射关系;通过现实坐标系,确定深度摄像机的相机坐标系与现实摄像机的相机坐标系之间的映射关系,得到深度摄像机与现实摄像机之间的空间偏移信息。
可选地,深度摄像机和现实摄像机拍摄目标场景的角度不同。可选地,深度摄像机和现实摄像机的设置位置不同。
2、获取深度摄像机采集的深度图。
可选地,深度摄像机将采集的深度图通过有线连接或无线连接传输给计算机设备。
3、根据空间偏移信息,将深度图的深度信息映射到目标视频帧上,得到现实前景深度图。
由于空间偏移信息指深度摄像机的相机坐标系与现实摄像机的相机坐标系之间的映射关系。因此,可以得到深度图和目标视频帧上的像素点之间的对应关系;根据该对应关系,将深度图上各个像素点的深度信息映射到目标视频帧上的各个像素点上,得到现实前景深度图。
在一种可选的实现方式中,通过另一个参考摄像机获取现实前景的深度信息,该方法可包括以下步骤:
1、根据参考摄像机的内外参数,获取第一映射关系,第一映射关系用于表示参考摄像机的摄像机坐标系和现实坐标系之间的映射关系,参考摄像机用于从第二角度拍摄目标场景,第二角度与第一角度不同,第一角度是现实摄像机的拍摄角度。
可选地,参考摄像机的内外参数包括内参数和外参数。内参数是与参考摄像机自身特性相关的参数,内参数包括焦距、像素大小等。外参数是参考摄像机在世界坐标系中的参数,外参数包括相机的位置、旋转方向等。
可选地,第一映射关系还用于表示参考摄像机的拍摄的参考图像上的像素点与现实点的位置对应关系。例如,参考图像上像素点A的坐标是(x1,x2),第一映射关系满足函数关系f,函数关系f是根据参考摄像机的内外参数生成的,而在现实环境中与像素点A对应的现实点是(y1,y2,y3)=f(x1,x2)。
2、根据现实摄像机的内外参数,获取第二映射关系,第二映射关系用于表示现实摄像机的摄像机坐标系和现实坐标系之间的映射关系。
可选地，第二映射关系还用于表示现实摄像机拍摄的目标视频帧上的像素点与现实点的位置对应关系。例如，目标视频帧上像素点B的坐标是(x3,x4)，第二映射关系满足函数关系g，函数关系g是根据现实摄像机的内外参数生成的，而在现实环境中与像素点B对应的现实点是(y4,y5,y6)=g(x3,x4)。
3、根据第一映射关系对参考摄像机拍摄的参考图像进行重构,得到重构参考图像。
可选地,根据第一映射关系将参考图像上的各个像素点映射到现实环境中,得到重构参考图像。其中,重构参考图像包括参考图像中的各个像素点对应的参考点在现实环境中的位置。
4、根据第二映射关系对现实摄像机拍摄的目标视频帧进行重构,得到重构目标场景图像。
可选地,根据第二映射关系将目标视频帧上的各个像素点映射到现实环境中,得到重构目标场景图像。其中,重构目标场景图像包括目标视频帧中的各个像素点对应的参考点在现实环境中的位置。
5、根据重构参考图像和重构目标场景图像之间的视差,确定目标视频帧内各个像素点的深度信息,得到现实前景深度图。
可选地,为方便计算视差,将重构参考图像和重构目标场景图像映射到同一平面上,确定重构参考图像和重构目标场景图像上对应同一个现实点的两个像素点;根据前述两个像素点的视差确定目标视频帧内各个像素点的深度信息,得到现实前景深度图。
示例性的，如图9所示，假设现实摄像机和参考摄像机位于同一平面上，预先将重构参考图像和重构目标场景图像映射到X轴所在平面上，使得像素点901到X轴的距离与像素点902到X轴的距离相同。参考点903通过现实摄像机的中心点904在目标视频帧上形成像素点901，参考点903通过参考摄像机的中心点905在参考图像上形成像素点902。在图9中，f是现实摄像机和参考摄像机的焦距，现实摄像机和参考摄像机的焦距相同。z是参考点903到现实摄像机的距离，即参考点903到现实摄像机的深度信息。x是参考点903到Z轴的距离。x1是像素点901在目标视频帧上的位置。xr是像素点902在参考图像上的位置。b是现实摄像机的中心点904与参考摄像机的中心点905之间的距离，即基线长度。则通过图9中的相似三角形可得到以下等式：x1/f = x/z，xr/f = (x-b)/z，两式相减可得 (x1-xr)/f = b/z。
因此，根据上述等式可得到z=f*b/(x1-xr)。其中，(x1-xr)即为视差。
需要说明的是,本申请实施例对获取目标视频帧的现实前景深度图的方法不做具体限定。除上述两种可选方式,技术人员可根据实际需求选择其他方式来获取目标视频帧的现实前景深度图,这里不再赘述。
步骤803:获取目标视频帧的虚拟背景深度图。
在一种可选的实现方式中,通过虚拟摄像机获取到虚拟背景深度图,该方法可包括以下步骤:
1、获取虚拟环境中与虚拟背景对应的渲染目标。
可选地,根据现实摄像机的拍摄角度和位置,确定现实摄像机拍摄到的物理屏幕区域;根据物理屏幕区域确定物理屏幕区域上的显示内容;根据前述显示内容,获取虚拟环境中的渲染目标。示例性的,若物理屏幕的尺寸是30×4(m×m),而现实摄像机设置在距物理屏幕30米处的位置,现实摄像机与物理屏幕之间的夹角为90度,确定现实摄像机拍摄到了物理屏幕的一部分,这部分物理屏幕的大小是20×3(m×m)。
2、生成虚拟环境中的渲染目标的渲染目标深度图,渲染目标深度图包括虚拟深度信息,虚拟深度信息用于表示在虚拟环境中渲染目标到虚拟摄像机的距离。
示例性的,计算机设备存储有虚拟环境的各项数据,其中,该数据包括虚拟摄像机到虚拟环境中的各个点的距离。在确定渲染目标后,可直接确定渲染目标到虚拟摄像机的距离,得到渲染目标深度图。
3、将渲染目标深度图中的虚拟深度信息转化为现实深度信息,得到虚拟背景深度图,现实深度信息用于表示被映射到现实环境中的渲染目标到现实摄像机的距离。
可选地,确定虚拟摄像机在虚拟环境中的第一位置;确定现实摄像机在现实环境中的第 二位置;根据第一位置和第二位置之间的位置关系将渲染目标深度图中的虚拟深度信息转化为现实深度信息。示例性的,虚拟摄像机在虚拟环境中位于位置A,现实摄像机在虚拟环境中位于位置B,此时,虚拟环境中的点1到虚拟摄像机的距离为x,设被映射到现实环境中的渲染目标到现实摄像机的距离为y,则有y=f(x),f表示函数关系。
可选地,渲染目标深度图是以虚拟摄像机为视角得到的图像,而虚拟背景深度图是以现实摄像机为视角的图像,此时,需要对渲染目标深度图中的像素点做坐标变换。示例性的,确定渲染目标深度图的像素点在物理屏幕上的第一位置坐标;根据第一位置坐标和现实摄像机的内外参数,将第一位置坐标映射为第二位置坐标,第二位置坐标是虚拟背景深度图上的位置坐标;将前述像素点的深度信息填入到第二位置坐标对应的像素点上,得到虚拟背景深度图。
需要说明的是,步骤802和步骤803不存在先后顺序,可以同时执行步骤802和步骤803,也可以先执行步骤802,后执行步骤803,还可以先执行步骤803,后执行步骤802。
步骤804:在虚拟背景深度图中,确定与现实前景深度图中属于背景区域的第i个第一像素点对应的第j个第二像素点。
其中,i,j为正整数,i的初始值可以为任意整数。
可选地,根据现实摄像机的内外参数,确定背景区域的第i个第一像素点在物理屏幕上的屏幕坐标;在虚拟环境中,根据屏幕坐标确定与第i个第一像素点对应的虚拟点的坐标;根据虚拟摄像机的内外参数,将虚拟点的坐标映射到虚拟背景深度图上,得到第j个第二像素点。
在一些实施例中,步骤804可实现为如下:分别获取现实摄像机和虚拟摄像机的内外参数;将现实摄像机和虚拟摄像机的内外参数对齐,以使得虚拟背景深度图和前景深度图的像素点对应;随后,根据虚拟背景深度图和前景深度图的对应关系,来确定属于背景区域的第i个第一像素点对应的第j个第二像素点。
其中，摄像机的内参数是与摄像机自身特性相关的参数，如摄像机的焦距、像素大小等；而摄像机的外参数是摄像机在世界坐标系中的参数，如摄像机的位置、旋转方向等。通过内外参数可将世界坐标系中的点映射到摄像机拍摄的像素点上。
可选地,根据第i个第一像素点的在现实前景深度图中的位置坐标,在虚拟背景深度图中确定第j个第二像素点的位置坐标。例如,第i个第一像素点在现实前景深度图中的位置坐标是(4,6),则第j个第二像素点在虚拟背景深度图中的位置坐标也是(4,6)。
步骤805:使用第j个第二像素点的第二深度信息,在现实前景深度图中替换第i个第一像素点的第一深度信息。
示例性的,使用第j个第二像素点的第二深度信息中的深度值,在现实前景深度图中替换第i个第一像素点的第一深度信息中的深度值。例如,在现实前景深度图中,第i个第一像素点的深度值是20,与第i个第一像素点对应的第j个第二像素点的深度值是80,则将第i个第一像素点的深度值修改为80。
在本申请的另一个可选实现方式中,在使用第j个第二像素点的第二深度信息替换第i个第一像素点的第一深度信息前,还可以对第j个第二像素点的第二深度信息进行修改。示例性的,将第二深度信息中的深度值修改为第一目标深度值,或者,为第二深度信息中的深度值增加第二目标深度值,或者,为第二深度信息中的深度值减小第三目标深度值。其中,第一目标深度值、第二目标深度值和第三目标深度值都可以由技术人员自行设置。例如,假设有3个第二像素点,3个第二像素点的深度值分别是20、43和36,将这3个第二像素点的深度值统一设置为40。
示例性的,请参考图5和图7,图7相较于图5,图7中虚拟背景的深度信息已经被替换。
步骤806:将i更新为i+1,重复上述两个步骤,直至遍历现实前景深度图中属于背景区 域的各个第一像素点的第一深度信息,得到融合深度图。
示意性的,背景区域包括至少两个第一像素点,则遍历现实前景深度图中属于背景区域的各个第一像素点,直至将背景区域中各个第一像素点的第一深度信息替换为第二像素点的第二深度信息。
步骤807:根据融合深度图调整目标视频帧的显示参数,生成目标视频帧的景深效果图。
可选地,显示参数包括清晰度、亮度、灰度、对比度、饱和度中的至少一种。
可选地,根据现实摄像机的预设光圈或预设焦距,确定距离区间,距离区间用于表示清晰度大于清晰度阈值的像素点对应的参考点到现实摄像机的距离;根据融合深度图和距离区间,调整目标视频帧内各个像素点的清晰度,生成目标视频帧的景深效果图。示例性的,距离区间是[0,20],则将位于距离区间内的像素点的清晰度设置为100%,将位于距离区间外的像素点的清晰度设置为40%。
可选地,根据现实摄像机的对焦距离和融合深度图,调整目标视频帧内虚拟背景对应的区域的清晰度,生成目标视频帧的景深效果图。
可选地,根据预设条件调整目标视频帧内的各个像素点的清晰度。预设条件是由技术人员根据实际需求确定的。示例性的,调整目标视频帧中预设区域内的像素点的清晰度,预设区域可由技术人员自行设置。
步骤808:基于目标视频帧的景深效果图,生成具有景深效果的目标视频。
可选地,按照预设帧率播放目标视频帧的景深效果图,得到具有景深效果的目标视频。
示例性的，目标视频帧包括至少两个视频帧，按照时间顺序排列目标视频帧的景深效果图，得到具有景深效果的目标视频。可选地，按照时间顺序排列连续的目标视频帧的景深效果图，生成具有景深效果的目标视频。
综上所述,本申请实施例中,现实摄像机拍摄目标场景,生成视频帧序列。再根据视频帧序列获取目标视频帧,并对目标视频帧的深度信息进行更新,使得目标视频帧包括的深度信息更加准确,并基于目标视频帧生成具有景深效果的目标视频。由于虚拟背景的深度信息更加准确,使得虚拟背景和现实前景结合组成的视频更加自然,显示效果较好。
而且,本实施例提供多种方法获取现实前景深度图,以便技术人员根据实际需求调整获取现实前景深度图的方式,不仅可以通过深度摄像机获取现实前景的深度信息,还可以通过两个现实摄像头获取现实前景的深度信息,增加了方案的灵活性。又因为融合深度图中背景区域的深度信息是通过虚拟背景深度图更新得到的,虚拟背景深度图的深度信息又是由虚拟摄像机采集虚拟环境生成的,这种方式获得的深度信息更加准确,得到的景深效果图更加符合实际需求,表现效果较好。
可选的,本实施例还给出了获取现实前景深度图和虚拟背景深度图的多种可选的实现方式,技术人员可根据实际需要进行选择。比如,通过深度摄像机获取现实前景深度信息;又如,通过两个现实摄像头获取现实前景深度信息,从而增加视频生成方法的灵活性。
可选的,本实施例还给出了融合深度图的具体生成方式,通过在现实前景深度图中逐个替换像素点和深度信息的方式,来实现虚拟背景深度图与现实前景深度图的融合。基于像素点的逐个替换,能够使得融合结果更加准确,从而使得得到的景深效果图的显示效果更好。
可选的,本实施例还给出了景深效果图的具体生成方式。其中,景深效果图可根据现实摄像机的相关信息生成,如根据现实摄像机的预设光圈或预设焦距确定距离区间,随后根据融合深度图和距离区间调整像素点的清晰度,来生成景深效果图。基于此,同样能够使得技术人员根据实际需要选择景深效果图的具体生成方式,进一步增加视频生成方法的灵活性。
在接下来的实施例中,考虑到在一些场景中,现实前景不便于移动,但是又希望可以修改现实前景在目标视频中的显示效果。因此,在接下来的实施例中,还可以对现实前景的深度信息进行调整,使得现实前景的深度信息符合预设的要求。
图10示出了本申请一个示例性实施例提供的基于虚拟现实的视频生成方法的流程示意 图。方法可由图1所示的计算机系统100执行,计算机系统100的相关描述可参考前述内容,不再赘述。
示意性的,本申请实施例提供的基于虚拟现实的视频生成方法,包括如下步骤:
步骤1001:在现实前景深度图的前景区域中,确定属于目标物体的第三像素点。
第三像素点是与目标物体对应的像素点。目标物体是现实环境中的物体。
示意性的,前景区域用于指示现实前景深度图中与现实前景对应的区域,如现实物体和/或真人所对应的区域。与之类似的,背景区域用于指示现实前景深度图中与虚拟背景对应的区域,如LED墙所对应的区域。
可选地,通过深度摄像机提供的深度图和目标视频帧,确定现实前景深度图。示例性的,通过深度摄像机提供的深度图获取目标视频帧内各个像素点的深度信息,该深度信息用于表示目标视频帧中各个像素点对应的现实参考点到现实摄像机的距离;在第一像素点的深度值大于第一深度阈值的情况下,确定该第一像素点属于虚拟背景;在第二像素点的深度值大于第二深度阈值的情况下,确定该第二像素点属于现实前景。其中,第一深度阈值不小于第二深度阈值,第一深度阈值和第二深度阈值可由技术人员自行设置。
可选地,将前景区域内属于深度阈值区间的像素点确定为第三像素点。深度阈值区间可由技术人员自行设置。
可选地,将前景区域内属于目标物体区域的像素点确定为第三像素点。目标物体区域可由技术人员自行设置。可选地,通过选取框在前景区域内确定目标物体区域。
第三像素点是前景区域中的任意一个像素点。
步骤1002:响应于深度值更新指令,更新第三像素点的深度值。
可选地,响应于深度值设置指令,将第三像素点的深度值设置为第一预设深度值。第一预设深度值可由技术人员根据实际需求进行设置。
在一些实施例中,步骤1002可实现为如下:根据目标物体的期望位置确定深度值设置指令,并根据深度值设置指令将第三像素点的深度值设置为第一预设深度值。示例性的,在一些场景中,目标物体不便于移动,或者希望该目标物体的深度值是一个较大的值,但是受场地限制,无法将该目标物体移动到希望的位置(即期望位置),此时,可以选择将目标物体对应的第三像素点的深度值统一设置为第一预设深度值,使得目标物体在景深效果图中的深度信息贴合实际需求,即使得目标物体在景深效果图中位于期望位置上。
可选地,响应于深度值增加指令,为第三像素点的深度值增加第二预设深度值,第二预设深度值可由技术人员根据实际需求进行设置。可选地,响应于深度值减小指令,为第三像素点的深度值减小第三预设深度值,第三预设深度值可由技术人员根据实际需求进行设置。
在另一些实施例中,步骤1002可实现为如下:在目标物体与现实摄像机的距离大于目标物体的期望位置与现实摄像机的距离时,确定深度值增加指令,并根据深度值增加指令为第三像素点的深度值增加第二预设深度值;或者,在目标物体与现实摄像机的距离小于目标物体的期望位置与现实摄像机的距离时,确定深度值减小指令,并根据深度值减小指令为第三像素点的深度值减小第三预设深度值。示例性的,在一些场景中,希望目标物体与其他物体的位置进行交换,或者,希望目标物体能够移动到其它物体的前方,或者,希望目标物体能够移动到其它物体的后方。上述场景均可理解为目标物体的实时位置与期望位置不同,此时,可以改变目标物体对应的第三像素点的深度值,使得生成的景深效果图能够体现出目标物体与其他物体之间的位置关系。例如,现实前景中存在参考物体(可理解为目标物体的期望位置),参考物体对应的像素点对应的深度值是10,说明该参考物体距现实摄像机有10米,而目标物体对应的像素点对应的深度值是15,说明该目标物体距现实摄像机有15米,而实际需求希望现实前景深度图能够体现出目标物体与现实摄像机的距离小于参考物体与现实摄像机的距离,参考物体与现实摄像机的距离可理解为目标物体的期望位置与现实摄像机的距离。因此,可以选择将目标物体对应的第三像素点的深度值减小8(即根据深度值减小指令为第 三像素点的深度值减小第三预设深度值),那么,目标物体对应第三像素点的深度值是7,可以满足上述实际需求。
在一个具体的例子中,现实前景中有一棵树,这棵树距现实摄像机有20米,由于移动这棵树并不方便,但是又希望目标视频的显示效果是能体现出这棵树距现实摄像机有40米,此时,可以让技术人员输入深度值设置指令,将这棵树对应的第三像素点的深度值统一设置为40,这样得到的景深效果图和目标视频的显示效果都能体现出这棵树距现实摄像机有40米,而且,实现这样的显示效果不需要移动这棵树,操作简便,效率高。
在另一个具体的例子中,现实前景中有树A和树B,树A距现实摄像机有20米,树B距现实摄像机有25米,而技术人员希望拍摄出的目标视频中树A在树B的后方,又因为直接移动树A或树B是不太容易实现的方案。此时,技术人员可以通过深度值设置指令,直接将树A的深度值设置为30,此时,在目标视频的显示效果中,树A距现实摄像机有30米,树B距现实摄像机还是25米,满足树A在树B的后方的显示效果。或者,技术人员也可以通过深度增加指令,将树A的深度值增加15,这样得到的树A的深度值是35,此时,在目标视频的显示效果中,树A距现实摄像机有35米,树B距现实摄像机还是25米,满足树A在树B的后方的显示效果。或者,技术人员还可以通过深度值减小指令,将树B的深度值减小10,这样得到的树B的深度值是15,此时,在目标视频的显示效果中,树A距现实摄像机还是20米,树B距现实摄像机有15米,满足树A在树B的后方的显示效果。
综上所述,本实施例可以修改前景区域内的各个第三像素点,使得前景区域内的像素点的深度信息符合要求,不仅可以减少现实前景的物体的移动,而且还使现实前景的深度信息更加准确。
而且,在不便移动目标物体的情况下,可以根据技术人员的实际需求直接调整前景区域内的各个第三像素点的深度信息,使得目标视频或景深效果图中的前景区域能够呈现出技术人员希望的显示效果。
图11示出了本申请一个示例性实施例提供的基于虚拟现实的景深效果图生成方法的示意图。该方法在UE4(Unreal Engine 4,虚幻4引擎)中以插件的形式实现。可选地,该方法也可以在Unity3D(一种实时3D互动内容创作和运营平台,属于创作引擎、开发工具)中以插件的形式实现。本申请实施例对该方法的应用平台不做具体限定。在图11所示的实施例中,基于虚拟现实的景深效果图生成方法通过虚拟制作景深插件1101实现,具体步骤如下所示:
1、插件二线程同步处理。
1.1、现实前景深度处理线程：处理包含现实摄像机1102与深度摄像机1104的数据，包含从原始YUV转RGBA，结合深度摄像机1104提供的深度信息使用opencv（一个跨平台计算机视觉和机器学习软件库）处理得到现实摄像机1102下的现实前景的深度信息。并在这个现实前景深度处理线程里，利用dx11共享纹理，把虚拟背景深度图复制到当前线程，融合成包含现实前景和虚拟背景的深度信息的融合深度图1107，并利用Compute shader（一种计算机技术，可以实现GPU的并行处理，GPU即Graphics Processing Unit，图形处理器）根据现实摄像机1102拍摄到的画面与融合深度图1107得到景深表现效果1108。
1.2、虚拟背景深度处理线程:在虚拟摄像机1103对应的Render Target(渲染目标)拿到深度信息,生成虚拟背景深度图1106,并复制给现实前景深度处理线程里的共享纹理。
2、在现实摄像机数据处理线程中,先确定现实前景深度图的选择方案。本申请实施例包括以下两种可选的实施方式:
2.1、选择深度摄像机:标定深度摄像机1104的内外参数。根据深度摄像机1104的内外参数和现实摄像机1102的内外参数,把深度摄像机1104拍摄的深度图映射到现实摄像机1102上,得到现实摄像机1102在对应场景下的深度图,即得到现实前景深度图1105。
2.2、选择再加个辅助摄像机:标定辅助摄像机的内外参数,根据辅助摄像机的内外参数 进行图像的立体更正,得到更正的映射关系。根据现实摄像机1102的内外参数进行图像的立体更正,得到更正的映射关系。然后在每帧数据处理中,使用前述的两种映射关系分别对二个摄像机提供的数据进行重构,然后生成视差图,根据视差图得到目标视频帧的深度信息,得到现实前景深度图1105。
3、在虚拟背景深度处理线程中,在得到虚拟摄像机1103拍摄的渲染目标,再从渲染目标得到深度信息,转化此深度信息为对应的现实线性距离,得到虚拟背景深度图1106,然后同步复制给现实前景深度处理线程的另一个纹理中。
4、在现实摄像机数据处理线程中，融合成融合深度图1107，这样在现实摄像机的成像中，根据对焦距离，知道虚拟背景的像素点的清晰度；根据设定的光圈或焦距，决定显示清晰像素点的最大距离与最小距离，以及模糊区域的模糊程度。根据上述参数匹配融合深度图1107，可以决定像素点如何显示，生成最终的景深效果图1108。
请参考图12,其示出了本申请一个实施例提供的基于虚拟现实的视频生成装置的示意图。上述功能可以由硬件实现,也可以由硬件执行相应的软件实现。该装置1200包括:
获取模块1201,用于从视频帧序列中获取目标视频帧,所述视频帧序列是由现实摄像机采集目标场景得到的,所述目标场景包括现实前景和虚拟背景,所述虚拟背景显示在现实环境中的物理屏幕上;
所述获取模块1201,还用于获取所述目标视频帧的现实前景深度图和虚拟背景深度图,所述现实前景深度图包括所述现实前景到所述现实摄像机的深度信息,所述虚拟背景深度图包括被映射到所述现实环境后的所述虚拟背景到所述现实摄像机的深度信息;
融合模块1202,用于融合所述现实前景深度图和所述虚拟背景深度图,得到融合深度图,所述融合深度图包括所述目标场景内的各个参考点在所述现实环境中到所述现实摄像机的深度信息;
更新模块1203,用于根据所述融合深度图调整所述目标视频帧的显示参数,生成所述目标视频帧的景深效果图;
所述更新模块1203,还用于基于所述目标视频帧的所述景深效果图,生成具有景深效果的目标视频。
在本申请的一个可选设计中,所述现实前景深度图包括与所述虚拟背景对应的背景区域;所述融合模块1202,还用于根据所述虚拟背景深度图内属于所述背景区域的第二像素点的第二深度信息,更新所述现实前景深度图内属于所述背景区域的第一像素点的第一深度信息,得到所述融合深度图。
在本申请的一个可选设计中,所述获取模块1201,还用于在所述虚拟背景深度图中,确定与所述现实前景深度图中属于所述背景区域的第i个第一像素点对应的第j个第二像素点,所述i,j为正整数;使用所述第j个第二像素点的所述第二深度信息,在所述现实前景深度图中替换所述第i个第一像素点的所述第一深度信息;将i更新为i+1,重复上述两个步骤,直至遍历所述现实前景深度图中属于所述背景区域的各个第一像素点的所述第一深度信息,得到所述融合深度图。
在本申请的一个可选设计中,所述获取模块1201,还用于根据所述现实摄像机的内外参数,确定所述背景区域的所述第i个第一像素点在所述物理屏幕上的屏幕坐标;在虚拟环境中,根据所述屏幕坐标确定与所述第i个第一像素点对应的虚拟点的坐标;根据虚拟摄像机的内外参数,将所述虚拟点的坐标映射到所述虚拟背景深度图上,得到所述第j个第二像素点,所述虚拟摄像机用于在所述虚拟环境中拍摄与所述虚拟背景对应的渲染目标。
在本申请的一个可选设计中,所述现实前景深度图包括与所述现实前景对应的前景区域;所述融合模块1202,还用于更新所述现实前景深度图内属于所述前景区域的第三像素点的第三深度信息,所述第三像素点是与目标物体对应的像素点。
在本申请的一个可选设计中,所述融合模块1202,还用于在所述现实前景深度图的所述 前景区域中,确定属于所述目标物体的第三像素点;响应于深度值更新指令,更新所述第三像素点的深度值。
在本申请的一个可选设计中,所述融合模块1202,还用于根据深度值设置指令,将所述第三像素点的深度值设置为第一预设深度值;或,根据深度值增加指令,为所述第三像素点的深度值增加第二预设深度值;或,根据深度值减小指令,为所述第三像素点的深度值减小第三预设深度值。
在本申请的一个可选设计中,所述获取模块1201,还用于根据深度摄像机的内外参数和所述现实摄像机的内外参数,生成所述深度摄像机与所述现实摄像机之间的空间偏移信息;获取所述深度摄像机采集的深度图;根据所述空间偏移信息,将所述深度图的深度信息映射到所述目标视频帧上,得到所述现实前景深度图。
在本申请的一个可选设计中,所述现实摄像机用于从第一角度拍摄所述目标场景;所述获取模块1201,还用于根据参考摄像机的内外参数,获取第一映射关系,所述第一映射关系用于表示所述参考摄像机的摄像机坐标系和现实坐标系之间的映射关系,所述参考摄像机用于从第二角度拍摄所述目标场景,所述第二角度与所述第一角度不同;根据所述现实摄像机的内外参数,获取第二映射关系,所述第二映射关系用于表示所述现实摄像机的摄像机坐标系和所述现实坐标系之间的映射关系;根据所述第一映射关系对所述参考摄像机拍摄的参考图像进行重构,得到重构参考图像;根据所述第二映射关系对所述现实摄像机拍摄的所述目标视频帧进行重构,得到重构目标场景图像;根据所述重构参考图像和所述重构目标场景图像之间的视差,确定所述目标视频帧内各个像素点的深度信息,得到所述现实前景深度图。
在本申请的一个可选设计中,所述获取模块1201,还用于获取虚拟环境中与所述虚拟背景对应的渲染目标;生成虚拟环境中的所述渲染目标的渲染目标深度图,所述渲染目标深度图包括虚拟深度信息,所述虚拟深度信息用于表示在所述虚拟环境中所述渲染目标到虚拟摄像机的距离;将所述渲染目标深度图中的所述虚拟深度信息转化为现实深度信息,得到所述虚拟背景深度图,所述现实深度信息用于表示被映射到所述现实环境中的所述渲染目标到所述现实摄像机的距离。
在本申请的一个可选设计中,所述更新模块1203,还用于根据所述现实摄像机的预设光圈或预设焦距,确定距离区间,所述距离区间用于表示清晰度大于清晰度阈值的像素点对应的参考点到所述现实摄像机的距离;根据所述融合深度图和所述距离区间,调整所述目标视频帧内各个像素点的清晰度,生成所述目标视频帧的所述景深效果图。
在本申请的一个可选设计中,所述更新模块1203,还用于根据所述现实摄像机的对焦距离和所述融合深度图,调整所述目标视频帧内所述虚拟背景对应的区域的清晰度,生成所述目标视频帧的所述景深效果图。
在本申请的一个可选设计中，所述获取模块1201，还用于获取所述视频帧序列对应的至少两张景深效果图；所述更新模块1203，还用于按照时间顺序排列所述至少两张景深效果图，得到所述视频帧序列对应的景深视频。
综上所述,本申请实施例中,现实摄像机拍摄目标场景,生成视频帧序列。再根据视频帧序列获取目标视频帧,并对目标视频帧的深度信息进行更新,使得目标视频帧包括的深度信息更加准确。由于增加了虚拟背景的深度信息,使得虚拟背景和现实前景结合组成的图更加自然,显示效果较好。
图13是根据一示例性实施例示出的一种计算机设备的结构示意图。计算机设备1300包括中央处理单元(Central Processing Unit,CPU)1301、包括随机存取存储器(Random AccessMemory,RAM)1302和只读存储器(Read-Only Memory,ROM)1303的系统存储器1304,以及连接系统存储器1304和中央处理单元1301的系统总线1305。计算机设备1300还包括帮助计算机设备内的各个器件之间传输信息的基本输入/输出系统(Input/Output,I/O系统)1306,和用于存储操作系统1313、应用程序1314和其他程序模块1315的大容量存储设备 1307。
基本输入/输出系统1306包括有用于显示信息的显示器1308和用于用户输入信息的诸如鼠标、键盘之类的输入设备1309。其中显示器1308和输入设备1309都通过连接到系统总线1305的输入输出控制器1310连接到中央处理单元1301。基本输入/输出系统1306还可以包括输入输出控制器1310以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器1310还提供输出到显示屏、打印机或其他类型的输出设备。
大容量存储设备1307通过连接到系统总线1305的大容量存储控制器(未示出)连接到中央处理单元1301。大容量存储设备1307及其相关联的计算机设备可读介质为计算机设备1300提供非易失性存储。也就是说,大容量存储设备1307可以包括诸如硬盘或者只读光盘(Compact Disc Read-Only Memory,CD-ROM)驱动器之类的计算机设备可读介质(未示出)。
不失一般性,计算机设备可读介质可以包括计算机设备存储介质和通信介质。计算机设备存储介质包括以用于存储诸如计算机设备可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机设备存储介质包括RAM、ROM、可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、带电可擦可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM),CD-ROM、数字视频光盘(Digital Video Disc,DVD)或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知计算机设备存储介质不局限于上述几种。上述的系统存储器1304和大容量存储设备1307可以统称为存储器。
根据本公开的各种实施例,计算机设备1300还可以通过诸如因特网等网络连接到网络上的远程计算机设备运行。也即计算机设备1300可以通过连接在系统总线1305上的网络接口单元1311连接到网络1312,或者说,也可以使用网络接口单元1311来连接到其他类型的网络或远程计算机设备系统(未示出)。
存储器还包括一个或者一个以上的程序,一个或者一个以上程序存储于存储器中,中央处理器1301通过执行该一个或一个以上程序来实现上述基于虚拟现实的视频生成方法的全部或者部分步骤。
在示例性实施例中,还提供了一种计算机可读存储介质,计算机可读存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行以实现上述各个方法实施例提供的基于虚拟现实的视频生成方法。
本申请还提供一种计算机可读存储介质,存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行以实现上述方法实施例提供的基于虚拟现实的视频生成方法。
本申请还提供一种计算机程序产品或计算机程序,上述计算机程序产品或计算机程序包括计算机指令,上述计算机指令存储在计算机可读存储介质中。计算机设备的处理器从上述计算机可读存储介质读取上述计算机指令,上述处理器执行上述计算机指令,使得上述计算机设备执行如上方面实施例提供的基于虚拟现实的视频生成方法。
本申请提供了一种芯片,该芯片包括可编程逻辑电路和/或程序指令,当安装有芯片的电子设备运行时,用于实现如上所述的基于虚拟现实的视频生成方法。
本申请提供了一种计算机系统,该计算机系统包括计算机设备、现实摄像机和深度摄像机;其中,现实摄像机用于采集目标视频帧,深度摄像机用于获取目标视频帧的现实前景深度图,计算机设备用于获取虚拟背景深度图、以及生成具有景深效果的目标视频。

Claims (25)

  1. 一种基于虚拟现实的视频生成方法,其中,所述方法由计算机系统执行,所述方法包括:
    从视频帧序列中获取目标视频帧,所述视频帧序列是由现实摄像机采集目标场景得到的,所述目标场景包括现实前景和虚拟背景,所述虚拟背景显示在现实环境中的物理屏幕上;
    获取所述目标视频帧的现实前景深度图和虚拟背景深度图,所述现实前景深度图包括所述现实前景到所述现实摄像机的深度信息,所述虚拟背景深度图包括被映射到所述现实环境后的所述虚拟背景到所述现实摄像机的深度信息;
    融合所述现实前景深度图和所述虚拟背景深度图,得到融合深度图,所述融合深度图包括所述目标场景内的各个参考点在所述现实环境中到所述现实摄像机的深度信息;
    根据所述融合深度图调整所述目标视频帧的显示参数,生成所述目标视频帧的景深效果图;
    基于所述目标视频帧的所述景深效果图,生成具有景深效果的目标视频。
  2. 根据权利要求1所述的方法,其中,所述现实前景深度图包括与所述虚拟背景对应的背景区域;
    所述融合所述现实前景深度图和所述虚拟背景深度图,得到融合深度图,包括:
    根据所述虚拟背景深度图内属于所述背景区域的第二像素点的第二深度信息,更新所述现实前景深度图内属于所述背景区域的第一像素点的第一深度信息,得到所述融合深度图。
  3. 根据权利要求2所述的方法,其中,所述根据所述虚拟背景深度图内属于所述背景区域的第二像素点的第二深度信息,更新所述现实前景深度图内属于所述背景区域的第一像素点的第一深度信息,得到所述融合深度图,包括:
    在所述虚拟背景深度图中,确定与所述现实前景深度图中属于所述背景区域的第i个第一像素点对应的第j个第二像素点,所述i,j为正整数;
    使用所述第j个第二像素点的所述第二深度信息,在所述现实前景深度图中替换所述第i个第一像素点的所述第一深度信息;
    将i更新为i+1,重复上述两个步骤,直至遍历所述现实前景深度图中属于所述背景区域的各个第一像素点的所述第一深度信息,得到所述融合深度图。
  4. 根据权利要求3所述的方法,其中,所述在所述虚拟背景深度图中,确定与所述现实前景深度图中属于所述背景区域的第i个第一像素点对应的第j个第二像素点,包括:
    根据所述现实摄像机的内外参数,确定所述背景区域的所述第i个第一像素点在所述物理屏幕上的屏幕坐标;
    在虚拟环境中,根据所述屏幕坐标确定与所述第i个第一像素点对应的虚拟点的坐标;
    根据虚拟摄像机的内外参数,将所述虚拟点的坐标映射到所述虚拟背景深度图上,得到所述第j个第二像素点,所述虚拟摄像机用于在所述虚拟环境中拍摄与所述虚拟背景对应的渲染目标。
  5. 根据权利要求1至4任一项所述的方法,其中,所述现实前景深度图包括与所述现实前景对应的前景区域;
    所述方法还包括:
    更新所述现实前景深度图内属于所述前景区域的第三像素点的第三深度信息,所述第三像素点是与目标物体对应的像素点。
  6. 根据权利要求5所述的方法,其中,所述第三深度信息包括深度值;
    所述更新所述现实前景深度图内属于所述前景区域的第三像素点的第三深度信息,包括:
    在所述现实前景深度图的所述前景区域中,确定属于所述目标物体的第三像素点;
    响应于深度值更新指令,更新所述第三像素点的深度值。
  7. 根据权利要求6所述的方法,其中,所述响应于深度值更新指令,更新所述第三像素点 的深度值,包括:
    根据深度值设置指令,将所述第三像素点的深度值设置为第一预设深度值;
    或,根据深度值增加指令,为所述第三像素点的深度值增加第二预设深度值;
    或,根据深度值减小指令,为所述第三像素点的深度值减小第三预设深度值。
  8. 根据权利要求1至4任一项所述的方法,其中,所述获取目标场景的现实前景深度图,包括:
    根据深度摄像机的内外参数和所述现实摄像机的内外参数,生成所述深度摄像机与所述现实摄像机之间的空间偏移信息;
    获取所述深度摄像机采集的深度图;
    根据所述空间偏移信息,将所述深度图的深度信息映射到所述目标视频帧上,得到所述现实前景深度图。
  9. 根据权利要求1至4任一项所述的方法,其中,所述现实摄像机用于从第一角度拍摄所述目标场景;
    所述获取目标场景的现实前景深度图,包括:
    根据参考摄像机的内外参数,获取第一映射关系,所述第一映射关系用于表示所述参考摄像机的摄像机坐标系和现实坐标系之间的映射关系,所述参考摄像机用于从第二角度拍摄所述目标场景,所述第二角度与所述第一角度不同;
    根据所述现实摄像机的内外参数,获取第二映射关系,所述第二映射关系用于表示所述现实摄像机的摄像机坐标系和所述现实坐标系之间的映射关系;
    根据所述第一映射关系对所述参考摄像机拍摄的参考图像进行重构,得到重构参考图像;根据所述第二映射关系对所述现实摄像机拍摄的所述目标视频帧进行重构,得到重构目标场景图像;
    根据所述重构参考图像和所述重构目标场景图像之间的视差,确定所述目标视频帧内各个像素点的深度信息,得到所述现实前景深度图。
  10. 根据权利要求1至4任一项所述的方法,其中,所述获取所述目标场景的虚拟背景深度图,包括:
    获取虚拟环境中与所述虚拟背景对应的渲染目标;
    生成虚拟环境中的所述渲染目标的渲染目标深度图,所述渲染目标深度图包括虚拟深度信息,所述虚拟深度信息用于表示在所述虚拟环境中所述渲染目标到虚拟摄像头的距离;
    将所述渲染目标深度图中的所述虚拟深度信息转化为现实深度信息,得到所述虚拟背景深度图,所述现实深度信息用于表示被映射到所述现实环境中的所述渲染目标到所述现实摄像头的距离。
  11. 根据权利要求1至4任一项所述的方法,其中,所述根据所述融合深度图调整所述目标视频帧的显示参数,生成所述目标视频帧的景深效果图,包括:
    根据所述现实摄像机的预设光圈或预设焦距,确定距离区间,所述距离区间用于表示清晰度大于清晰度阈值的像素点对应的参考点到所述现实摄像机的距离;
    根据所述融合深度图和所述距离区间,调整所述目标视频帧内各个像素点的清晰度,生成所述目标视频帧的所述景深效果图。
  12. 根据权利要求1至4任一项所述的方法,其中,所述根据所述融合深度图调整所述目标视频帧的显示参数,生成所述目标视频帧的景深效果图,包括:
    根据所述现实摄像机的对焦距离和所述融合深度图,调整所述目标视频帧内所述虚拟背景对应的区域的清晰度,生成所述目标视频帧的所述景深效果图。
  13. 根据权利要求1至4任一项所述的方法,其中,所述基于所述目标视频帧的所述景深效果图,生成具有景深效果的目标视频,包括:
    按照预设帧率播放所述目标视频帧的所述景深效果图,得到所述具有景深效果的目标视 频。
  14. 一种基于虚拟现实的视频生成装置,其中,所述装置包括:
    获取模块,用于从视频帧序列中获取目标视频帧,所述视频帧序列是由现实摄像机采集目标场景得到的,所述目标场景包括现实前景和虚拟背景,所述虚拟背景显示在现实环境中的物理屏幕上;
    所述获取模块,还用于获取所述目标视频帧的现实前景深度图和虚拟背景深度图,所述现实前景深度图包括所述现实前景到所述现实摄像机的深度信息,所述虚拟背景深度图包括被映射到所述现实环境后的所述虚拟背景到所述现实摄像机的深度信息;
    融合模块,用于融合所述现实前景深度图和所述虚拟背景深度图,得到融合深度图,所述融合深度图包括所述目标场景内的各个参考点在所述现实环境中到所述现实摄像机的深度信息;
    更新模块,用于根据所述融合深度图调整所述目标视频帧的显示参数,生成所述目标视频帧的景深效果图;
    所述更新模块,还用于基于所述目标视频帧的所述景深效果图,生成具有景深效果的目标视频。
  15. 根据权利要求14所述的装置,其中,所述现实前景深度图包括与所述虚拟背景对应的背景区域;所述融合模块,用于根据所述虚拟背景深度图内属于所述背景区域的第二像素点的第二深度信息,更新所述现实前景深度图内属于所述背景区域的第一像素点的第一深度信息,得到所述融合深度图。
  16. 根据权利要求15所述的装置,其中,所述融合模块,用于在所述虚拟背景深度图中,确定与所述现实前景深度图中属于所述背景区域的第i个第一像素点对应的第j个第二像素点,所述i,j为正整数;使用所述第j个第二像素点的所述第二深度信息,在所述现实前景深度图中替换所述第i个第一像素点的所述第一深度信息;将i更新为i+1,重复上述两个步骤,直至遍历所述现实前景深度图中属于所述背景区域的各个第一像素点的所述第一深度信息,得到所述融合深度图。
  17. 根据权利要求14至16任一所述的装置,其中,所述现实前景深度图包括与所述现实前景对应的前景区域;所述更新模块,还用于更新所述现实前景深度图内属于所述前景区域的第三像素点的第三深度信息,所述第三像素点是与目标物体对应的像素点。
  18. 根据权利要求14至16任一所述的装置,其中,所述获取模块,用于根据深度摄像机的内外参数和所述现实摄像机的内外参数,生成所述深度摄像机与所述现实摄像机之间的空间偏移信息;获取所述深度摄像机采集的深度图;根据所述空间偏移信息,将所述深度图的深度信息映射到所述目标视频帧上,得到所述现实前景深度图。
  19. 根据权利要求14至16任一所述的装置,其中,所述现实摄像机用于从第一角度拍摄所述目标场景,所述获取模块,用于根据参考摄像机的内外参数,获取第一映射关系,所述第一映射关系用于表示所述参考摄像机的摄像机坐标系和现实坐标系之间的映射关系,所述参考摄像机用于从第二角度拍摄所述目标场景,所述第二角度与所述第一角度不同;根据所述现实摄像机的内外参数,获取第二映射关系,所述第二映射关系用于表示所述现实摄像机的摄像机坐标系和所述现实坐标系之间的映射关系;根据所述第一映射关系对所述参考摄像机拍摄的参考图像进行重构,得到重构参考图像;根据所述第二映射关系对所述现实摄像机拍摄的所述目标视频帧进行重构,得到重构目标场景图像;根据所述重构参考图像和所述重构目标场景图像之间的视差,确定所述目标视频帧内各个像素点的深度信息,得到所述现实前景深度图。
  20. 根据权利要求14至16任一所述的装置,其中,所述获取模块,用于获取虚拟环境中与所述虚拟背景对应的渲染目标;生成虚拟环境中的所述渲染目标的渲染目标深度图,所述渲染目标深度图包括虚拟深度信息,所述虚拟深度信息用于表示在所述虚拟环境中所述渲染 目标到虚拟摄像头的距离;将所述渲染目标深度图中的所述虚拟深度信息转化为现实深度信息,得到所述虚拟背景深度图,所述现实深度信息用于表示被映射到所述现实环境中的所述渲染目标到所述现实摄像头的距离。
  21. 一种计算机设备,其中,所述计算机设备包括:处理器和存储器,所述存储器中存储有至少一段程序,所述至少一段程序由所述处理器加载并执行以实现如权利要求1至13中任一项所述的基于虚拟现实的视频生成方法。
  22. 一种计算机可读存储介质,其中,所述计算机可读存储介质中存储有至少一条程序代码,所述程序代码由处理器加载并执行以实现如权利要求1至13中任一项所述的基于虚拟现实的视频生成方法。
  23. 一种计算机程序产品,包括计算机程序或指令,其中,所述计算机程序或指令被处理器执行时实现权利要求1至13中任一项所述的基于虚拟现实的视频生成方法。
  24. 一种芯片,其中,所述芯片包括可编程逻辑电路和/或程序,当安装有所述芯片的电子设备运行时,用于实现如权利要求1至13任一所述的基于虚拟现实的视频生成方法。
  25. 一种计算机系统,其中,所述计算机系统包括计算机设备、现实摄像机和深度摄像机;
    其中,所述现实摄像机用于采集目标视频帧,所述深度摄像机用于获取所述目标视频帧的现实前景深度图,所述计算机设备用于获取虚拟背景深度图、以及生成具有景深效果的目标视频。
PCT/CN2023/083335 2022-04-28 2023-03-23 基于虚拟现实的视频生成方法、装置、设备及介质 WO2023207452A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210463334.7 2022-04-28
CN202210463334.7A CN116527863A (zh) 2022-04-28 2022-04-28 基于虚拟现实的视频生成方法、装置、设备及介质

Publications (1)

Publication Number Publication Date
WO2023207452A1 true WO2023207452A1 (zh) 2023-11-02

Family

ID=87389138

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083335 WO2023207452A1 (zh) 2022-04-28 2023-03-23 基于虚拟现实的视频生成方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN116527863A (zh)
WO (1) WO2023207452A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117395450A (zh) * 2023-12-11 2024-01-12 飞狐信息技术(天津)有限公司 虚拟直播方法、装置、终端和存储介质
CN117994444A (zh) * 2024-04-03 2024-05-07 浙江华创视讯科技有限公司 复杂场景的重建方法、设备及存储介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117596497A (zh) * 2023-09-28 2024-02-23 书行科技(北京)有限公司 图像渲染方法、装置、电子设备和计算机可读存储介质
CN117278731B (zh) * 2023-11-21 2024-05-28 启迪数字科技(深圳)有限公司 多视频与三维场景融合方法、装置、设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712487A (zh) * 2020-12-23 2021-04-27 北京软通智慧城市科技有限公司 一种场景视频融合方法、系统、电子设备及存储介质
CN112929627A (zh) * 2021-02-22 2021-06-08 广州博冠信息科技有限公司 虚拟现实场景实现方法、装置、存储介质及电子设备
CN113256781A (zh) * 2021-06-17 2021-08-13 腾讯科技(深圳)有限公司 虚拟场景的渲染和装置、存储介质及电子设备
US11107195B1 (en) * 2019-08-23 2021-08-31 Lucasfilm Entertainment Company Ltd. Motion blur and depth of field for immersive content production systems
WO2021208648A1 (zh) * 2020-04-17 2021-10-21 Oppo广东移动通信有限公司 虚拟对象调整方法、装置、存储介质与增强现实设备
CN113840049A (zh) * 2021-09-17 2021-12-24 阿里巴巴(中国)有限公司 图像处理方法、视频流场景切换方法、装置、设备及介质


Also Published As

Publication number Publication date
CN116527863A (zh) 2023-08-01

Similar Documents

Publication Publication Date Title
WO2023207452A1 (zh) 基于虚拟现实的视频生成方法、装置、设备及介质
US11076142B2 (en) Real-time aliasing rendering method for 3D VR video and virtual three-dimensional scene
CN109658365B (zh) 图像处理方法、装置、系统和存储介质
US11425283B1 (en) Blending real and virtual focus in a virtual display environment
US11227428B2 (en) Modification of a live-action video recording using volumetric scene reconstruction to replace a designated region
JP2016537901A (ja) ライトフィールド処理方法
CN112446939A (zh) 三维模型动态渲染方法、装置、电子设备及存储介质
US11165957B2 (en) Reconstruction of obscured views in captured imagery using user-selectable pixel replacement from secondary imagery
US10163250B2 (en) Arbitrary view generation
EP3057316B1 (en) Generation of three-dimensional imagery to supplement existing content
JP3387856B2 (ja) 画像処理方法、画像処理装置および記憶媒体
JP7366563B2 (ja) 画像生成装置、画像生成方法、及びプログラム
KR100893855B1 (ko) 3차원 포그라운드와 2차원 백그라운드 결합 방법 및 3차원어플리케이션 엔진
US11627297B1 (en) Method for image processing of image data for a two-dimensional display wall with three-dimensional objects
WO2024042893A1 (ja) 情報処理装置、情報処理方法、およびプログラム
Abad et al. Integrating synthetic objects into real scenes
Grau et al. New production tools for the planning and the on-set visualisation of virtual and real scenes
JP2023540647A (ja) 映画産業向けプリビジュアライゼーション・デバイス及びシステム
Grau et al. O. Razzoli2, A. Sarti1, L. Spallarossa5, S. Tubaro1, J. Woetzel4
Milne et al. The ORIGAMI project: advanced tools for creating and mixing real and virtual content in film and TV production
Evers-Senne et al. The ORIGAMI project: advanced tools for creating and mixing real and virtual content in film and TV production
de Sorbier et al. Depth Camera to Generate On-line Content for Auto-Stereoscopic Displays

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23794887

Country of ref document: EP

Kind code of ref document: A1