CN116109974A - Volumetric video display method and related equipment

Volumetric video display method and related equipment

Info

Publication number
CN116109974A
Authority
CN
China
Prior art keywords
video
virtual
target
volume
virtual scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310074198.7A
Other languages
Chinese (zh)
Inventor
孙伟
罗栋藩
邵志兢
张煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Prometheus Vision Technology Co ltd
Original Assignee
Zhuhai Prometheus Vision Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Prometheus Vision Technology Co ltd filed Critical Zhuhai Prometheus Vision Technology Co ltd
Priority to CN202310074198.7A priority Critical patent/CN116109974A/en
Publication of CN116109974A publication Critical patent/CN116109974A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/18 Eye characteristics, e.g. of the iris
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a volume video display method and related equipment. When a volume video is played, a human eye image of the target user currently watching the volume video can be captured; gaze-direction analysis is performed on the human eye image to determine the target user's viewpoint position on the video picture of the volume video; and the virtual viewing angle corresponding to the virtual scene in the volume video is adjusted according to the viewpoint position, so that the target user views the volume video from the adjusted virtual viewing angle. The virtual viewing angle of the volume video can thus be controlled by the direction of the user's gaze, presenting the volume video from different angles and giving the user an immersive viewing experience.

Description

Volumetric video display method and related equipment
Technical Field
The application relates to the technical field of computers, in particular to a volumetric video display method and related equipment.
Background
With the development of computer technology, multimedia is ever more widely used, and new videos keep emerging on the network. People commonly relax by watching video. Conventional video, however, is mostly shot by a capture device at a fixed angle, and during playback a device such as a television can only present the content from that fixed angle, which makes it difficult to give the user an immersive viewing experience.
Disclosure of Invention
The embodiments of the application provide a volume video display method and related equipment, where the related equipment may include a volume video display device, an electronic device, a computer-readable storage medium, and a computer program product. The virtual viewing angle of the volume video can be controlled based on the direction of the user's gaze, so that the volume video is presented from different angles, giving the user an immersive viewing experience.
The embodiment of the application provides a method for displaying a volume video, which comprises the following steps:
when a volume video is played, acquiring a human eye image of a target user currently watching the volume video;
performing gaze-direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video;
and adjusting a virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position, so that the target user views the volume video from the adjusted virtual viewing angle.
Accordingly, an embodiment of the present application provides a volumetric video display device, including:
an acquisition unit, used for capturing a human eye image of a target user currently watching the volume video when the volume video is played;
an analysis unit, used for performing gaze-direction analysis on the human eye image and determining the viewpoint position of the target user on the video picture of the volume video;
and an adjusting unit, used for adjusting the virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position, so that the target user views the volume video from the adjusted virtual viewing angle.
Optionally, in some embodiments of the present application, the analysis unit may include a detection subunit, a first determination subunit, and a second determination subunit, as follows:
the detection subunit is used for detecting a cornea reflection area and a pupil area in the human eye image;
a first determination subunit configured to determine a line-of-sight direction of the target user according to a positional relationship between the cornea reflection area and the pupil area;
and a second determining subunit, configured to determine, based on the line of sight direction, a viewpoint position of the target user on a video picture of the volumetric video.
Optionally, in some embodiments of the present application, the adjusting unit may include an adjusting subunit, a third determining subunit, and a display subunit, as follows:
The adjusting subunit is configured to adjust, according to the viewpoint position, a virtual viewing angle corresponding to a virtual scene in the volumetric video to obtain a target virtual viewing angle;
a third determining subunit, configured to determine, based on the target virtual viewing angle, a target angle at which a model corresponding to a virtual scene in the volumetric video needs to be rotated;
and a display subunit, used for rendering and displaying a target virtual scene picture so that the target user views the volume video from the adjusted virtual viewing angle, wherein the target virtual scene picture is the virtual scene picture presented after the model corresponding to the virtual scene has rotated by the target angle.
Optionally, in some embodiments of the present application, the volumetric video display apparatus may further include a first display unit, as follows:
the first display unit is used for capturing a target image containing the target user when the volume video is played; identifying, from the target image, position information of the target user relative to the video picture of the volume video; and adjusting the virtual viewing angle corresponding to the virtual scene in the volume video according to the position information, so that the target user views the volume video from the adjusted virtual viewing angle.
Optionally, in some embodiments of the present application, the virtual scene of the volumetric video includes at least one virtual object;
the volumetric video presentation device may further comprise a second presentation unit, as follows:
the second display unit is used for determining a target virtual object from the at least one virtual object according to the object view that the target user wants to watch; adjusting the virtual viewing angle corresponding to the virtual scene in the volume video to the target viewing angle of the target virtual object in the virtual scene; and updating the displayed video picture of the volume video, wherein the updated video picture comprises the virtual scene picture under the target viewing angle of the target virtual object.
Optionally, in some embodiments of the present application, the virtual scene of the volumetric video includes at least one virtual object; the volumetric video display apparatus may further comprise a third display unit, as follows:
the third display unit is configured to determine a virtual scene picture under a view angle of each virtual object in the virtual scene; and displaying the virtual scene pictures of each virtual object under the view angle in the video page corresponding to the volume video.
Optionally, in some embodiments of the present application, the volumetric video display apparatus may further include a fourth display unit, as follows:
the fourth display unit is used for detecting a target viewpoint position of the target user on the video picture of the volume video; determining a target virtual scene picture from the virtual scene pictures under the viewing angles of the respective virtual objects according to the target viewpoint position; and displaying the target virtual scene picture enlarged in the video page while displaying the other virtual scene pictures reduced.
The electronic device provided by the embodiment of the application comprises a processor and a memory, wherein the memory stores a plurality of instructions, and the processor loads the instructions to execute the steps in the method for displaying the volume video.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in the volumetric video presentation method provided by the embodiment of the application.
In addition, the embodiments of the application further provide a computer program product, comprising a computer program or instructions which, when executed by a processor, implement the steps in the volume video display method provided by the embodiments of the application.
The embodiments of the application provide a volume video display method and related equipment, which can capture a human eye image of a target user currently watching the volume video when the volume video is played; perform gaze-direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video; and adjust the virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position, so that the target user views the volume video from the adjusted virtual viewing angle. The virtual viewing angle of the volume video can thus be controlled by the direction of the user's gaze, presenting the volume video from different angles and giving the user an immersive viewing experience.
Drawings
In order to illustrate the technical solutions of the embodiments of the application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below cover only some embodiments of the application; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1a is a schematic view of a scenario of a volumetric video presentation method according to an embodiment of the present application;
FIG. 1b is a flow chart of a volumetric video presentation method provided by an embodiment of the present application;
FIG. 2 is another flow chart of a volumetric video presentation method provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of a volumetric video display device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Embodiments of the present application provide a volumetric video presentation method and related apparatus, which may include a volumetric video presentation device, an electronic device, a computer-readable storage medium, and a computer program product. The volume video display device can be integrated in electronic equipment, and the electronic equipment can be a terminal or a server and other equipment.
As shown in fig. 1a, an example is given in which a terminal and a server jointly perform the volume video display method. The volume video display system provided by the embodiments of the application comprises a terminal 10, a server 11, and the like; the terminal 10 and the server 11 are connected via a network, such as a wired or wireless network connection, and the volume video display device may be integrated in the terminal.
The terminal 10 may be used to: when a volume video is played, capture a human eye image of a target user currently watching the volume video; perform gaze-direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video; and adjust the virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position, so that the target user views the volume video from the adjusted virtual viewing angle. The terminal 10 may include a cell phone, a tablet computer, an electronic watch, an electronic bracelet, and the like. A client, such as an application client, may also be provided on the terminal 10.
The server 11 may be used to: receive the human eye image of the target user sent by the terminal 10, perform gaze-direction analysis on the human eye image, and determine the viewpoint position of the target user on the video picture of the volume video; and adjust the virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position to obtain the target virtual scene picture corresponding to the adjusted virtual viewing angle, and send it to the terminal 10 so that the target user views the volume video from the adjusted virtual viewing angle. The server 11 may be a single server, or a server cluster or cloud server composed of a plurality of servers.
The embodiment of the application provides a volumetric video display method, which relates to a computer vision technology in the field of artificial intelligence.
Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes directions such as computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, and intelligent transportation.
Computer vision (CV) is the science of how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to identify, track, and measure targets and perform other machine vision tasks, with further graphics processing so that the result becomes an image more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technology typically includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
Detailed descriptions are given below. Note that the order in which the embodiments are described below is not intended as a limitation on any preferred order of the embodiments.
This embodiment will be described from the perspective of a volume video display device, which may be integrated in an electronic device; the electronic device may be, for example, a server or a terminal.
It should be understood that the specific embodiments of the application involve data related to users, such as user information. When the embodiments of the application are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Volume video (also known as volumetric video, spatial video, volumetric three-dimensional video, or six-degrees-of-freedom video, among other names) is a technique that generates a sequence of three-dimensional models by capturing information (such as depth information and color information) in three-dimensional space. Compared with conventional video, volume video adds the concept of space to video and uses three-dimensional models to better restore the real three-dimensional world, instead of simulating the spatial feel of the real three-dimensional world with a two-dimensional planar video plus camera movement. Because a volume video is a sequence of three-dimensional models, a user can adjust it to any viewing angle according to personal preference, giving it a higher degree of fidelity and immersion than two-dimensional planar video.
Optionally, in the application, the three-dimensional models used to constitute the volume video may be reconstructed as follows:
First, color images and depth images of the shooting object from different viewing angles, together with the camera parameters corresponding to the color images, are acquired; then a neural network model that implicitly expresses a three-dimensional model of the shooting object is trained on the acquired color images, their corresponding depth images, and the camera parameters, and iso-surface extraction is performed on the trained neural network model to achieve three-dimensional reconstruction of the shooting object, thereby obtaining its three-dimensional model.
It should be noted that the embodiments of the application do not specifically limit which network architecture is adopted for the neural network model; those skilled in the art may choose one according to actual needs. For example, a multilayer perceptron (MLP) without a normalization layer may be selected as the base model for model training.
The three-dimensional model reconstruction method provided in the present application will be described in detail below.
First, a plurality of color cameras and depth cameras may be employed to synchronously shoot, from multiple viewing angles, the target object to be three-dimensionally reconstructed (the target object is the shooting object), obtaining color images and corresponding depth images of the target object at multiple different viewing angles. That is, at the same shooting moment (shooting moments whose actual difference is less than or equal to a time threshold are regarded as the same moment), the color camera at each viewing angle captures a color image of the target object at that viewing angle, and correspondingly the depth camera at each viewing angle captures a depth image of the target object at that viewing angle. The target object may be anything, including but not limited to living objects such as people, animals, and plants, or inanimate objects such as machines, furniture, and dolls.
In this way, the color images of the target object at different viewing angles all have corresponding depth images. That is, when shooting, the color cameras and depth cameras may adopt a camera-group configuration, with the color camera and depth camera at the same viewing angle paired to shoot the same target object synchronously. For example, a studio may be built whose central area is the shooting area, and around it multiple groups of paired color cameras and depth cameras are arranged at certain angular intervals in the horizontal and vertical directions. When the target object is in the shooting area surrounded by these cameras, color images and corresponding depth images of the target object at different viewing angles can be captured.
In addition, the camera parameters of the color camera corresponding to each color image are further acquired. The camera parameters include the intrinsic and extrinsic parameters of the color camera, which can be determined through calibration. The intrinsic parameters are parameters related to the color camera's own characteristics, including but not limited to data such as its focal length and pixels; the extrinsic parameters are the color camera's parameters in the world coordinate system, including but not limited to data such as its position (coordinates) and rotation direction.
As described above, after color images of the target object at different viewing angles and their corresponding depth images are obtained for the same shooting moment, the target object can be three-dimensionally reconstructed from these color images and depth images. Unlike the related-art approach of converting depth information into a point cloud for three-dimensional reconstruction, the application trains a neural network model to achieve an implicit expression of the target object's three-dimensional model, and thereby realizes the three-dimensional reconstruction of the target object based on the neural network model.
Optionally, the application selects a multilayer perceptron (MLP) without a normalization layer as the base model, and trains it as follows:
converting the pixel points in each color image into rays based on the corresponding camera parameters;
sampling a plurality of sampling points on each ray, and determining the first coordinate information of each sampling point and the SDF value of each sampling point relative to the pixel point;
inputting the first coordinate information of the sampling points into the base model to obtain the predicted SDF value and predicted RGB color value that the base model outputs for each sampling point;
adjusting the parameters of the base model based on a first difference between the predicted SDF value and the SDF value and a second difference between the predicted RGB color value and the RGB color value of the pixel point, until a preset stop condition is met;
and taking the base model that meets the preset stop condition as the neural network model implicitly expressing the three-dimensional model of the target object.
First, a pixel point in a color image is converted into a ray based on the camera parameters corresponding to that color image; the ray may be a ray that passes through the pixel point and is perpendicular to the color image plane. Then a plurality of sampling points are sampled on the ray. The sampling may be performed in two steps: some sampling points are first sampled uniformly, and then a number of additional sampling points are taken at key positions based on the depth value of the pixel point, ensuring that as many sampling points as possible fall near the model surface. Next, the first coordinate information of each sampling point in the world coordinate system and the signed distance field (SDF) value of each sampling point are calculated from the camera parameters and the depth value of the pixel point; the SDF value may be the difference between the depth value of the pixel point and the distance from the sampling point to the camera's imaging plane. This difference is signed: when it is positive, the sampling point is outside the three-dimensional model; when it is negative, the sampling point is inside the three-dimensional model; and when it is zero, the sampling point is on the surface of the three-dimensional model. Then, after sampling is complete and the SDF value of each sampling point has been calculated, the first coordinate information of the sampling points in the world coordinate system is input into the base model (the base model is configured to map input coordinate information to an SDF value and an RGB color value and output them); the SDF value output by the base model is recorded as the predicted SDF value, and the RGB color value it outputs is recorded as the predicted RGB color value. Finally, the parameters of the base model are adjusted based on the first difference between the predicted SDF value and the SDF value of the sampling point and the second difference between the predicted RGB color value and the RGB color value of the pixel point corresponding to the sampling point.
The other pixel points in the color image are likewise sampled in the manner above, and the coordinate information of their sampling points in the world coordinate system is input into the base model to obtain the corresponding predicted SDF values and predicted RGB color values, which are used to adjust the parameters of the base model until a preset stop condition is met. For example, the preset stop condition may be that the base model reaches a preset number of iterations, or that the base model converges. When iteration of the base model meets the preset stop condition, a neural network model is obtained that can accurately and implicitly express the three-dimensional model of the shooting object. Finally, an iso-surface extraction algorithm can be used to extract the surface of the three-dimensional model from the neural network model, thereby obtaining the three-dimensional model of the shooting object.
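To make the training procedure concrete, the following is a minimal PyTorch sketch of a normalization-free MLP of the kind described above, together with one training step combining the first difference (on SDF values) and the second difference (on RGB color values). All names (SDFNet, train_step) and hyperparameters are illustrative assumptions, not the implementation disclosed by the application:

```python
import torch
import torch.nn as nn

class SDFNet(nn.Module):
    """Normalization-free MLP mapping a 3D point to (SDF, RGB): an implicit
    representation of the shooting object's three-dimensional model."""
    def __init__(self, hidden: int = 256, depth: int = 8):
        super().__init__()
        layers, dim = [], 3
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU(inplace=True)]
            dim = hidden
        self.backbone = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, 4)  # 1 SDF channel + 3 RGB channels

    def forward(self, xyz):
        out = self.head(self.backbone(xyz))
        return out[..., :1], torch.sigmoid(out[..., 1:])  # (sdf, rgb)

def train_step(model, optimizer, points, sdf_gt, rgb_gt):
    """One parameter update: 'first difference' on the SDF values plus
    'second difference' on the RGB color values, as described above."""
    sdf_pred, rgb_pred = model(points)
    loss = (sdf_pred - sdf_gt).abs().mean() + (rgb_pred - rgb_gt).abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, the stop condition described above (a preset iteration count or convergence) would wrap this step in an outer loop over all rays and images.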
Optionally, in some embodiments, the imaging plane of the color image is determined based on the camera parameters, and the ray that passes through a pixel point in the color image and is perpendicular to the imaging plane is determined to be the ray corresponding to that pixel point.
The imaging plane, i.e., the coordinate information of the color image in the world coordinate system, can be determined from the camera parameters of the color camera corresponding to the color image. It can then be determined that the ray passing through a pixel point in the color image and perpendicular to the imaging plane is the ray corresponding to that pixel point.
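A small NumPy sketch of this pixel-to-ray conversion under a pinhole camera model follows; treating the extrinsics (R, t) as a world-to-camera transform is an assumption made for illustration:

```python
import numpy as np

def pixel_ray(u, v, K, R, t):
    """Ray through pixel (u, v), perpendicular to the imaging plane.
    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation."""
    # A ray perpendicular to the imaging plane runs along the camera's
    # optical (z) axis, expressed here in world coordinates.
    direction = R.T @ np.array([0.0, 0.0, 1.0])
    # Back-project the pixel onto the imaging plane at unit depth in the
    # camera frame, then transform that point into world coordinates.
    p_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    origin = R.T @ (p_cam - t)
    return origin, direction / np.linalg.norm(direction)
```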
Optionally, in some embodiments, second coordinate information and a rotation angle of the color camera in the world coordinate system are determined according to the camera parameters, and the imaging plane of the color image is determined according to the second coordinate information and the rotation angle.
Optionally, in some embodiments, a first number of first sampling points are sampled at equal intervals on the ray; a number of key sampling points are determined according to the depth value of the pixel point, and a second number of second sampling points are sampled around the key sampling points; the first number of first sampling points and the second number of second sampling points are together determined as the plurality of sampling points obtained on the ray.
Specifically, n first sampling points (the first number) are first sampled uniformly on the ray, where n is a positive integer greater than 2. Then, according to the depth value of the pixel point, a preset number of key sampling points closest to the pixel point are determined from the n first sampling points, or the key sampling points whose distance from the pixel point is less than a distance threshold are determined from the n first sampling points. Next, m second sampling points are resampled around the determined key sampling points, where m is a positive integer greater than 1. Finally, the n + m sampling points obtained are determined as the plurality of sampling points sampled on the ray. Resampling the m sampling points around the key sampling points makes the model train more accurately near the surface of the three-dimensional model, improving the reconstruction accuracy of the three-dimensional model.
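The sketch below shows one way this two-stage sampling could look; the near/far bounds, the sample counts, and the width of the band around the depth reading are assumptions:

```python
import numpy as np

def sample_on_ray(origin, direction, depth, near=0.1, far=5.0,
                  n_uniform=64, m_fine=32, band=0.05):
    """Uniform samples along the ray plus extra samples concentrated in a
    narrow band around the pixel's depth reading, so that points near the
    model surface are sampled densely."""
    t_uniform = np.linspace(near, far, n_uniform)
    t_fine = np.random.uniform(depth - band, depth + band, m_fine)
    t_all = np.sort(np.concatenate([t_uniform, t_fine]))
    points = origin + t_all[:, None] * direction[None, :]
    return points, t_all
```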
Optionally, in some embodiments, the depth value corresponding to the pixel point is determined from the depth image corresponding to the color image; the SDF value of each sampling point relative to the pixel point is calculated based on the depth value; and the coordinate information of each sampling point is calculated according to the camera parameters and the depth value.
After the plurality of sampling points are sampled on the ray corresponding to each pixel point, for each sampling point, the distance between the color camera's shooting position and the corresponding point on the target object is determined according to the camera parameters and the depth value of the pixel point, and the SDF value and coordinate information of each sampling point are then calculated one by one based on that distance.
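In code, the signed-distance labels for the samples of one ray reduce to a single subtraction, sketched here under the same assumed conventions as above:

```python
import numpy as np

def sdf_labels(origin, direction, t_samples, depth):
    """World coordinates of the samples plus their signed distances: the
    depth reading minus the distance travelled along the ray. Positive
    values lie outside the model, negative inside, zero on the surface."""
    points = origin + t_samples[:, None] * direction[None, :]
    sdf = depth - t_samples
    return points, sdf
```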
After training of the base model is complete, for any given point's coordinate information, the trained base model can predict its corresponding SDF value, and this predicted SDF value indicates the positional relationship (inside, outside, or on the surface) between the point and the target object's three-dimensional model. This achieves an implicit expression of the target object's three-dimensional model, yielding the neural network model that implicitly expresses it.
Finally, iso-surface extraction is performed on the neural network model, for example drawing the surface of the three-dimensional model with the marching cubes (MC) iso-surface extraction algorithm, so as to obtain the three-dimensional model surface and, from it, the three-dimensional model of the target object.
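A sketch of this extraction step using scikit-image's marching cubes implementation is shown below; the grid bounds, resolution, and chunk size are assumptions, and the model is the illustrative SDFNet from the earlier sketch:

```python
import torch
from skimage import measure

@torch.no_grad()
def extract_mesh(model, resolution=128, bound=1.0):
    """Evaluate the trained SDF network on a dense grid and run marching
    cubes on its zero level set to recover the model surface."""
    xs = torch.linspace(-bound, bound, resolution)
    grid = torch.stack(torch.meshgrid(xs, xs, xs, indexing="ij"), dim=-1)
    flat = grid.reshape(-1, 3)
    # Evaluate in chunks to keep memory bounded on large grids.
    sdf = torch.cat([model(chunk)[0] for chunk in flat.split(1 << 18)])
    volume = sdf.reshape(resolution, resolution, resolution).numpy()
    verts, faces, normals, _ = measure.marching_cubes(volume, level=0.0)
    return verts, faces, normals
```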
In the above three-dimensional reconstruction scheme, the three-dimensional model of the target object is modeled implicitly by a neural network, and depth information is added to improve the speed and accuracy of model training. By continuously applying this three-dimensional reconstruction scheme to the shooting object over time, three-dimensional models of the shooting object at different moments can be obtained, and the sequence formed by these three-dimensional models in time order is the volume video shot of that object. In this way, volume video can be shot of any shooting object to obtain volume video with specific content. For example, a dancing shooting object can be shot to obtain a volume video in which its dance can be watched from any angle; a teaching shooting object can be shot to obtain a teaching volume video that can be watched from any angle; and so on.
It should be noted that, the volume video according to the following embodiments of the present application may be obtained by shooting using the above volume video shooting method.
As shown in fig. 1b, the specific flow of the volumetric video display method may be as follows:
101. When a volume video is played, a human eye image of a target user currently watching the volume video is acquired.
A volume video (also known as volumetric video, spatial video, volumetric three-dimensional video, or six-degrees-of-freedom video, among other names) is a three-dimensional model sequence generated by capturing information in three-dimensional space (such as depth information and color information).
The human eye image may specifically be an image captured by the front camera of the intelligent terminal playing the volume video; the front camera may be camera hardware integrated into the intelligent terminal on the same side as its main display screen. In some embodiments, an infrared LED (light-emitting diode) may be used to illuminate the target user's face, with an infrared camera capturing the image, to obtain the human eye image.
The human eye image may be an image containing the eyes of the target user.
102. And performing gaze-direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video.
Before gaze-direction analysis is performed on the human eye image, the human eye image may be preprocessed; the preprocessing may be an eye-region segmentation of the human eye image. In some embodiments, the eye region may be cropped out of the human eye image and taken as the processed human eye image. In other embodiments, the eye-region segmentation may be an identification of the eye region in the human eye image; in particular, the eye region may be identified by a neural network model, which may be a Visual Geometry Group network (VGGNet), a residual network (ResNet), a densely connected convolutional network (DenseNet), or the like, although it should be understood that the neural network of this embodiment is not limited to the types listed above.
The eye region may be a region that contains the target user's eyes but no other facial organs.
Optionally, in some embodiments, before gaze-direction analysis is performed on the human eye image, image enhancement processing may be applied to the human eye image to improve its clarity. Image enhancement may include histogram equalization, sharpening, smoothing, and the like.
In addition, before gaze-direction analysis is performed on the human eye image, anti-distortion correction may be applied to the human eye image to obtain an undistorted human eye image.
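A brief OpenCV sketch of this preprocessing chain (undistort, crop the eye region, equalize the histogram) follows; passing the eye bounding box in directly, rather than obtaining it from a segmentation network as described above, is a simplifying assumption:

```python
import cv2

def preprocess_eye_image(frame, camera_matrix, dist_coeffs, eye_box):
    """Undistort the captured frame, crop the eye region, and apply
    histogram equalization to improve clarity."""
    undistorted = cv2.undistort(frame, camera_matrix, dist_coeffs)
    x, y, w, h = eye_box  # from an eye-region detector in a real system
    eye = undistorted[y:y + h, x:x + w]
    gray = cv2.cvtColor(eye, cv2.COLOR_BGR2GRAY)
    return cv2.equalizeHist(gray)
```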
Optionally, in this embodiment, the step of "performing gaze-direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video" may include:
detecting a cornea reflection area and a pupil area in the human eye image;
determining a sight line direction of the target user according to the position relation between the cornea reflection area and the pupil area;
based on the gaze direction, a viewpoint position of the target user on a video picture of the volumetric video is determined.
Specifically, from the human eye image, the white circular portion of the eye can be detected as the cornea reflection region, and the black circular portion as the pupil region. The user's gaze direction is then determined according to the positional relationship between the cornea reflection region and the pupil region.
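As an illustration, the sketch below detects both regions by intensity thresholding and uses the pupil-center-to-glint vector as a gaze offset. The threshold values are assumptions that would need per-setup calibration, and mapping the offset to a viewpoint on the video picture requires a calibration step not shown here:

```python
import cv2
import numpy as np

def estimate_gaze_offset(eye_gray):
    """Detect the bright corneal reflection and the dark pupil by
    thresholding, then return the pupil-center minus glint-center vector
    as a simple gaze offset."""
    _, glint_mask = cv2.threshold(eye_gray, 220, 255, cv2.THRESH_BINARY)
    _, pupil_mask = cv2.threshold(eye_gray, 40, 255, cv2.THRESH_BINARY_INV)

    def centroid(mask):
        m = cv2.moments(mask)
        if m["m00"] == 0:
            return None  # region not found in this frame
        return np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])

    glint, pupil = centroid(glint_mask), centroid(pupil_mask)
    if glint is None or pupil is None:
        return None
    return pupil - glint  # maps to a screen viewpoint after calibration
```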
Alternatively, in some embodiments, either a machine-learning model or a look-up-table method may be used when determining the user's gaze direction.
103. And adjusting a virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position, so that the target user views the volume video from the adjusted virtual viewing angle.
Optionally, in this embodiment, the step of adjusting, according to the viewpoint position, a virtual viewing angle corresponding to a virtual scene in the volumetric video so that the target user views the volumetric video from the adjusted virtual viewing angle may include:
according to the viewpoint position, adjusting a virtual viewing angle corresponding to a virtual scene in the volume video to obtain a target virtual viewing angle;
determining a target angle at which a model corresponding to a virtual scene in the volume video needs to rotate based on the target virtual viewing angle;
rendering and displaying a target virtual scene picture so that the target user views the volume video from the adjusted virtual viewing angle, wherein the target virtual scene picture is the virtual scene picture presented after the model corresponding to the virtual scene has rotated by the target angle.
The virtual viewing angle may be understood as the presentation angle of the model corresponding to the virtual scene in the volume video. The target virtual scene picture may be a front view of the model corresponding to the virtual scene, i.e., the view of it from directly ahead.
In a specific scene, when the target user looks straight ahead, the volume video stays at a fixed viewing angle. When the target user turns to the left, the gaze direction moves left and the target user's viewpoint position on the video picture of the volume video moves left, so the volume video is rotated to the right, moving the virtual viewing point (corresponding to the virtual viewing angle) to the left side of the volume video. When the target user turns to the right, the gaze direction moves right and the viewpoint position on the video picture moves right, so the volume video is rotated to the left, moving the virtual viewing point to the right side of the volume video. The other directions work in the same way.
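A toy sketch of this viewpoint-to-rotation mapping is given below; the linear gain and the maximum angle are assumptions, and the sign convention simply encodes "gaze left, rotate the model right" as described above:

```python
def target_rotation(viewpoint_x, screen_width, max_angle_deg=45.0):
    """Map the horizontal viewpoint position on the video picture to the
    angle the virtual-scene model should rotate."""
    offset = (viewpoint_x - screen_width / 2) / (screen_width / 2)  # -1..1
    return -offset * max_angle_deg  # gaze left (negative) => rotate right
```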
In this embodiment, the user's gaze may be detected by a camera mounted on the volume video display device (such as a computer or a naked-eye 3D display); when the user's gaze is detected, its direction is obtained and the virtual viewing angle of the volume video is controlled in real time, so that the volume video is presented from different angles for an immersive viewing experience.
In the related art, volume video may be viewed on various devices, such as computers and naked-eye 3D (three-dimensional) displays, which the user may place on a vehicle, furniture, and so on. On these devices, however, the viewing angle of the volume video can only be adjusted by the user's manual operation. For example, when watching a dance volume video, the user usually watches from the front, where the viewing angle is straight on; but to see the dancer's left side, that is, to turn the viewing angle of the volume video to the left, the user has to adjust the viewing angle manually, which badly interrupts the viewing experience and is cumbersome.
The volume video display method provided by the application can detect the user's gaze and automatically change the virtual viewing angle of the volume video according to the gaze direction, giving the user the immersive experience of viewing the volume video from different directions without manual adjustment, i.e., presenting, based on the user's gaze, the volume video content at the angle corresponding to the gaze direction.
Optionally, in this embodiment, the method for displaying a volumetric video may further include:
when the volume video is played, acquiring a target image containing the target user;
identifying, from the target image, position information of the target user relative to the video picture of the volume video;
and according to the position information, adjusting a virtual viewing angle corresponding to the virtual scene in the volume video, so that the target user views the volume video from the adjusted virtual viewing angle.
In this embodiment, the position of the user may be identified by the camera, and the virtual viewing angle of the volumetric video may be adjusted to a corresponding angle, so that the viewing angles of the volumetric video seen by the user at different positions may be different.
From the position information of the target user relative to the video picture of the volume video, the target user's distance and direction relative to the video picture can be judged, and the virtual viewing angle is adjusted accordingly.
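The sketch below illustrates one way such position information could be turned into a viewing-angle adjustment; deriving yaw from the face's horizontal offset, and relative distance from the face-box width, are assumptions for illustration:

```python
def viewing_angle_from_position(face_box, frame_width, fov_deg=60.0,
                                ref_face_width=200.0):
    """Estimate the viewer's direction (yaw) and relative distance from
    where and how large the face appears in the captured target image."""
    x, _, w, _ = face_box
    face_center_x = x + w / 2
    yaw = ((face_center_x / frame_width) - 0.5) * fov_deg
    distance_ratio = ref_face_width / max(w, 1.0)  # >1 means farther away
    return yaw, distance_ratio
```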
Optionally, in this embodiment, the virtual scene of the volumetric video includes at least one virtual object;
the method for displaying the volume video can further comprise the following steps:
determining a target virtual object from the at least one virtual object according to the object view that the target user wants to watch;
adjusting a virtual viewing angle corresponding to a virtual scene in the volume video to be a target viewing angle of the target virtual object in the virtual scene;
and updating a video picture showing the volume video, wherein the updated video picture comprises a virtual scene picture under the target visual angle of the target virtual object.
The virtual object may be a person, an animal, a plant, etc. in a video frame of the volumetric video, which is not limited in this embodiment.
The field of view corresponding to the target virtual object's viewing angle in the virtual scene is the object view that the target user wants to watch.
Optionally, in this embodiment, the virtual scene of the volumetric video includes at least one virtual object; the method for displaying the volume video can further comprise the following steps:
determining a virtual scene picture of each virtual object in the virtual scene under the view angle;
and displaying the virtual scene pictures of each virtual object under the view angle in the video page corresponding to the volume video.
In this embodiment, when a plurality of virtual objects are included in a virtual scene corresponding to a volumetric video, virtual scene images under different virtual object (character) views can be simultaneously displayed on a display screen in a split screen mode, so that user experience can be improved.
In a specific scene, the virtual scene corresponding to the volume video includes virtual characters a, b, and c. The user may choose to watch the video from the viewing angle of one virtual character, or from the viewing angles of several virtual characters, for example watching the volume video from the viewing angles of virtual character a and virtual character b. In particular, virtual scene pictures from different virtual characters' viewing angles can be displayed simultaneously on the same screen, so that the user can experience the video from the perspectives of different virtual characters.
Specifically, the scheme can be applied to a virtual concert scene. In a virtual concert, the virtual characters may be the virtual stars performing or any virtual audience member, and the scheme makes it possible to experience the concert from the different angles of the audience.
Optionally, in this embodiment, the method for displaying a volumetric video may further include:
detecting a target viewpoint position of the target user on a video picture of the volume video;
determining a target virtual scene picture from the virtual scene pictures under the viewing angles of the respective virtual objects according to the target viewpoint position;
and displaying the target virtual scene picture enlarged in the video page while displaying the other virtual scene pictures reduced.
Specifically, the user's target viewpoint position can be captured through eye-tracking technology, the virtual object corresponding to the target viewpoint position can be determined, and the virtual scene picture under that virtual object's viewing angle can be taken as the target virtual scene picture.
In this embodiment, the user's gaze may be captured and used to determine which virtual character's viewing angle the user wants to watch from; the virtual scene picture under that virtual character's viewing angle is then enlarged on the screen, and the virtual scene pictures under the other virtual characters' viewing angles are reduced.
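A simple sketch of such a gaze-driven layout update follows; the rectangle representation and the scale factors are assumptions:

```python
def layout_views(view_rects, focused_index, zoom=1.5, shrink=0.75):
    """Enlarge the virtual scene picture the user is looking at and shrink
    the others, keeping each picture centered on its original position.
    Rects are (x, y, w, h) in video-page coordinates."""
    new_rects = []
    for i, (x, y, w, h) in enumerate(view_rects):
        s = zoom if i == focused_index else shrink
        cx, cy = x + w / 2, y + h / 2
        new_rects.append((cx - w * s / 2, cy - h * s / 2, w * s, h * s))
    return new_rects
```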
As can be seen from the above, this embodiment can capture a human eye image of the target user currently watching the volume video when the volume video is played; perform gaze-direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video; and adjust the virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position, so that the target user views the volume video from the adjusted virtual viewing angle. The virtual viewing angle of the volume video can thus be controlled by the direction of the user's gaze, presenting the volume video from different angles and giving the user an immersive viewing experience.
The method described in the previous embodiment is illustrated in further detail below, taking as an example the volume video display device being integrated in a terminal.
The embodiment of the application provides a method for displaying a volume video, as shown in fig. 2, the specific flow of the method for displaying the volume video may be as follows:
201. When the terminal plays a volume video, a human eye image of the target user currently watching the volume video is captured.
The general production flow of a volume video may comprise three steps. The first step is shooting and acquisition: a performer enters a camera array system deployed as a matrix, and professional-grade acquisition equipment, such as infrared (IR) cameras and 4K ultra-high-definition industrial cameras, shoots the performer and extracts data such as color information, material information, and depth information. The second step is material generation: after acquisition, the material is uploaded to the cloud, where algorithms automatically generate the volume video (a sequence of dynamic 3D character models). The third step is using the volume video: through preset plug-ins, the volume video can be placed into engines such as Unreal Engine 4/5 or Unity 3D and fused seamlessly with virtual scenes or CG effects, with real-time rendering supported.
The human eye image may specifically be an image captured by the front camera of the intelligent terminal playing the volume video; the front camera may be camera hardware integrated into the intelligent terminal on the same side as its main display screen. In some embodiments, an infrared LED (light-emitting diode) may be used to illuminate the target user's face, with an infrared camera capturing the image, to obtain the human eye image.
The human eye image may be an image containing the eyes of the target user.
202. And the terminal performs gaze-direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video.
Optionally, in this embodiment, the step of "performing gaze-direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video" may include:
detecting a cornea reflection area and a pupil area in the human eye image;
determining a sight line direction of the target user according to the position relation between the cornea reflection area and the pupil area;
based on the gaze direction, a viewpoint position of the target user on a video picture of the volumetric video is determined.
Specifically, from the human eye image, the white circular portion of the eye can be detected as the cornea reflection region, and the black circular portion as the pupil region. The user's gaze direction is then determined according to the positional relationship between the cornea reflection region and the pupil region.
203. And the terminal adjusts the virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position to obtain a target virtual viewing angle.
The virtual viewing angle may be understood as the presentation angle of the model corresponding to the virtual scene in the volume video. The target virtual scene picture may be a front view of the model corresponding to the virtual scene, i.e., the view of it from directly ahead.
204. And the terminal determines the target angle at which the model corresponding to the virtual scene in the volume video needs to rotate based on the target virtual viewing angle.
205. And the terminal renders and displays a target virtual scene picture so that the target user views the volume video from the adjusted virtual viewing angle, wherein the target virtual scene picture is the virtual scene picture presented after the model corresponding to the virtual scene has rotated by the target angle.
In a specific scene, when the target user looks straight ahead, the volume video stays at a fixed viewing angle. When the target user turns to the left, the gaze direction moves left and the target user's viewpoint position on the video picture of the volume video moves left, so the volume video is rotated to the right, moving the virtual viewing point (corresponding to the virtual viewing angle) to the left side of the volume video. When the target user turns to the right, the gaze direction moves right and the viewpoint position on the video picture moves right, so the volume video is rotated to the left, moving the virtual viewing point to the right side of the volume video. The other directions work in the same way.
In this embodiment, the user's gaze may be detected by a camera mounted on the volume video display device (such as a computer or a naked-eye 3D display); when the user's gaze is detected, its direction is obtained and the virtual viewing angle of the volume video is controlled in real time, so that the volume video is presented from different angles for an immersive viewing experience.
As can be seen from the above, in this embodiment, when a volume video is played, the terminal may acquire a human eye image of the target user currently watching the volume video; perform gaze direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video; adjust, according to the viewpoint position, the virtual viewing angle corresponding to the virtual scene in the volume video to obtain a target virtual viewing angle; determine, based on the target virtual viewing angle, the target angle by which the model corresponding to the virtual scene needs to rotate; and render and display a target virtual scene picture so that the target user views the volume video from the adjusted virtual viewing angle, where the target virtual scene picture is the virtual scene picture obtained after the model corresponding to the virtual scene has been rotated by the target angle. The virtual viewing angle of the volume video can thus be controlled based on the direction of the user's gaze, so that the volume video is presented from different angles, bringing the user an immersive viewing experience.
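Putting steps 201 to 205 together, the per-frame flow might look like the following sketch. The camera and renderer objects are hypothetical interfaces (capture(), rotate_scene(), draw()), estimate_gaze and viewpoint_to_rotation are the sketches above, and the linear gaze-to-viewpoint calibration is a placeholder assumption.

```python
class VolumetricVideoPlayer:
    """Per-frame gaze-driven viewing-angle control (steps 201-205)."""

    def __init__(self, camera, renderer, gain=0.01):
        self.camera = camera        # hypothetical front-camera interface
        self.renderer = renderer    # hypothetical volumetric renderer
        self.gain = gain            # pixel offset -> viewpoint shift

    def on_frame(self):
        eye_image = self.camera.capture()            # step 201
        gaze = estimate_gaze(eye_image)              # step 202
        if gaze is None:
            return  # no reliable detection: keep the current viewing angle
        # Placeholder calibration: map the pupil-glint offset to a
        # normalized viewpoint position centered at (0.5, 0.5).
        vx = min(max(0.5 + self.gain * float(gaze[0]), 0.0), 1.0)
        vy = min(max(0.5 + self.gain * float(gaze[1]), 0.0), 1.0)
        yaw, pitch = viewpoint_to_rotation(vx, vy)   # steps 203-204
        self.renderer.rotate_scene(yaw, pitch)       # step 205
        self.renderer.draw()
```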
In order to better implement the above method, the embodiment of the present application further provides a volumetric video display device, as shown in fig. 3, where the volumetric video display device may include an acquisition unit 301, an analysis unit 302, and an adjustment unit 303, as follows:
(1) An acquisition unit 301;
The acquisition unit is used for acquiring a human eye image of the target user currently watching the volume video when the volume video is played.
(2) An analysis unit 302;
The analysis unit is used for performing gaze direction analysis on the human eye image and determining the viewpoint position of the target user on the video picture of the volume video.
Optionally, in some embodiments of the present application, the analysis unit may include a detection subunit, a first determination subunit, and a second determination subunit, as follows:
the detection subunit is used for detecting a cornea reflection area and a pupil area in the human eye image;
a first determination subunit configured to determine a line-of-sight direction of the target user according to a positional relationship between the cornea reflection area and the pupil area;
and a second determining subunit, configured to determine, based on the line of sight direction, a viewpoint position of the target user on a video picture of the volumetric video.
(3) An adjustment unit 303;
The adjusting unit is used for adjusting the virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position, so that the target user views the volume video from the adjusted virtual viewing angle.
Optionally, in some embodiments of the present application, the adjusting unit may include an adjusting subunit, a third determining subunit, and a display subunit, as follows:
the adjusting subunit is configured to adjust, according to the viewpoint position, a virtual viewing angle corresponding to a virtual scene in the volumetric video to obtain a target virtual viewing angle;
a third determining subunit, configured to determine, based on the target virtual viewing angle, a target angle at which a model corresponding to a virtual scene in the volumetric video needs to be rotated;
the display subunit is used for rendering and displaying a target virtual scene picture so that the target user can watch the volume video from the adjusted virtual viewing angle, where the target virtual scene picture is the virtual scene picture obtained after the model corresponding to the virtual scene has been rotated by the target angle.
Optionally, in some embodiments of the present application, the volumetric video display apparatus may further include a first display unit, as follows:
The first display unit is used for collecting a target image containing the target user when the volume video is played; identifying, from the target image, position information of the target user relative to the video picture of the volume video; and adjusting, according to the position information, the virtual viewing angle corresponding to the virtual scene in the volume video, so that the target user views the volume video from the adjusted virtual viewing angle.
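For this position-based variant, the target user can be located in the captured frame with an off-the-shelf face detector; the normalized face center then plays the same role as the viewpoint position above. A hedged sketch using OpenCV's bundled Haar cascade follows (the detector choice and its parameters are assumptions of this sketch, not the patent's implementation):

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def user_position_in_frame(frame_bgr):
    """Return the normalized (x, y) center of the largest detected face,
    or None if no face is found in the target image."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    frame_h, frame_w = gray.shape
    return ((x + w / 2) / frame_w, (y + h / 2) / frame_h)
```

The returned position could be fed to viewpoint_to_rotation above, so that moving physically to one side of the screen rotates the volume video in a parallax-style effect.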
Optionally, in some embodiments of the present application, the virtual scene of the volumetric video includes at least one virtual object;
the volumetric video presentation device may further comprise a second presentation unit, as follows:
The second display unit is used for determining a target virtual object from the at least one virtual object according to the object view that the target user intends to watch; adjusting the virtual viewing angle corresponding to the virtual scene in the volume video to the target viewing angle of the target virtual object in the virtual scene; and updating the video picture showing the volume video, where the updated video picture includes the virtual scene picture under the target viewing angle of the target virtual object.
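One way to realize "viewing the scene from a target virtual object's perspective" is to place the virtual camera at that object's position, looking along its facing direction. The dataclasses below are hypothetical; the patent does not specify its data structures.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class VirtualObject:
    name: str
    position: Tuple[float, float, float]  # world position in the scene
    facing: Tuple[float, float, float]    # unit view direction

@dataclass
class Camera:
    position: Tuple[float, float, float]
    forward: Tuple[float, float, float]

def view_from_object(objects, target_name):
    """Return a camera pose rendering the virtual scene as seen from
    the target virtual object (its 'target viewing angle')."""
    obj = next(o for o in objects if o.name == target_name)
    return Camera(position=obj.position, forward=obj.facing)
```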
Optionally, in some embodiments of the present application, the virtual scene of the volumetric video includes at least one virtual object; the volumetric video display apparatus may further comprise a third display unit, as follows:
The third display unit is configured to determine the virtual scene picture under the view angle of each virtual object in the virtual scene, and to display, in the video page corresponding to the volume video, the virtual scene picture under each virtual object's view angle.
Optionally, in some embodiments of the present application, the volumetric video display apparatus may further include a fourth display unit, as follows:
The fourth display unit is used for detecting a target viewpoint position of the target user on the video picture of the volume video; determining a target virtual scene picture from the virtual scene pictures under the view angles of the virtual objects according to the target viewpoint position; and enlarging the display of the target virtual scene picture in the video page while reducing the display of the virtual scene pictures other than the target virtual scene picture.
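The enlarge/shrink behavior of the fourth display unit can be sketched as a simple layout rule: the picture whose page region contains the gazed-at viewpoint is scaled up and the rest are scaled down. The scale factors below are illustrative assumptions.

```python
def layout_scales(picture_rects, target_viewpoint,
                  enlarged=1.5, reduced=0.75):
    """Return one display scale per virtual scene picture on the page.

    picture_rects: list of (x, y, w, h) page rectangles, one per picture.
    target_viewpoint: (x, y) gaze point of the target user on the page.
    """
    def contains(rect, point):
        x, y, w, h = rect
        px, py = point
        return x <= px <= x + w and y <= py <= y + h

    return [enlarged if contains(rect, target_viewpoint) else reduced
            for rect in picture_rects]
```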
As can be seen from the above, in this embodiment, the acquisition unit 301 may acquire a human eye image of the target user currently watching the volume video when the volume video is played; the analysis unit 302 may perform gaze direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video; and the adjusting unit 303 may adjust, according to the viewpoint position, the virtual viewing angle corresponding to the virtual scene in the volume video, so that the target user views the volume video from the adjusted virtual viewing angle. The virtual viewing angle of the volume video can thus be controlled based on the direction of the user's gaze, so that the volume video is presented from different angles, bringing the user an immersive viewing experience.
An embodiment of the present application further provides an electronic device, as shown in fig. 4, which is a schematic structural diagram of the electronic device according to an embodiment of the present application. The electronic device may be a terminal or a server. Specifically:
The electronic device may include a processor 401 having one or more processing cores, a memory 402 having one or more computer-readable storage media, a power supply 403, an input unit 404, and other components. Those skilled in the art will appreciate that the electronic device structure shown in fig. 4 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
The processor 401 is the control center of the electronic device. It connects the various parts of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the electronic device as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 performs various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the data storage area may store data created through use of the electronic device, etc. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that charging, discharging, and power-consumption management functions are handled by the power management system. The power supply 403 may also include one or more of a direct-current or alternating-current power supply, a recharging system, a power-failure detection circuit, a power converter or inverter, a power status indicator, and other components.
The electronic device may further comprise an input unit 404, which input unit 404 may be used for receiving input digital or character information and generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 401 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
when a volume video is played, acquiring a human eye image of a target user currently watching the volume video; performing gaze direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video; and adjusting, according to the viewpoint position, the virtual viewing angle corresponding to the virtual scene in the volume video, so that the target user views the volume video from the adjusted virtual viewing angle.
For the specific implementation of each of the above operations, reference may be made to the previous embodiments; details are not repeated herein.
As can be seen from the above, this embodiment can acquire a human eye image of the target user currently watching the volume video when the volume video is played; perform gaze direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video; and adjust, according to the viewpoint position, the virtual viewing angle corresponding to the virtual scene in the volume video, so that the target user views the volume video from the adjusted virtual viewing angle. The virtual viewing angle of the volume video can thus be controlled based on the direction of the user's gaze, so that the volume video is presented from different angles, bringing the user an immersive viewing experience.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be completed by instructions, or by instructions controlling the associated hardware; the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform steps in any of the volumetric video presentation methods provided by embodiments of the present application. For example, the instructions may perform the steps of:
When a volume video is played, acquiring a human eye image of a target user currently watching the volume video; performing gaze direction analysis on the human eye image to determine the viewpoint position of the target user on the video picture of the volume video; and adjusting, according to the viewpoint position, the virtual viewing angle corresponding to the virtual scene in the volume video, so that the target user views the volume video from the adjusted virtual viewing angle.
For the specific implementation of each of the above operations, reference may be made to the previous embodiments; details are not repeated herein.
The computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disc, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps in any volumetric video display method provided in the embodiments of the present application, they can achieve the beneficial effects achievable by any volumetric video display method provided in the embodiments of the present application; for details, refer to the foregoing embodiments, which are not repeated herein.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations of the volumetric video display aspect described above.
The foregoing has described in detail the volumetric video display method and related devices provided in the embodiments of the present application. Specific examples have been applied herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is provided only to assist in understanding the method of the present application and its core ideas. Meanwhile, those skilled in the art may make variations to the specific implementations and the application scope in light of the ideas of the present application. In view of the above, the content of this specification should not be construed as limiting the present application.

Claims (11)

1. A method for volumetric video presentation, comprising:
when a volume video is played, acquiring a human eye image of a target user currently watching the volume video;
performing gaze direction analysis on the human eye image to determine a viewpoint position of the target user on a video picture of the volume video;
and adjusting a virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position, so that the target user views the volume video from the adjusted virtual viewing angle.
2. The method of claim 1, wherein the performing gaze direction analysis on the human eye image to determine a viewpoint position of the target user on a video frame of the volumetric video comprises:
detecting a cornea reflection area and a pupil area in the human eye image;
determining a sight line direction of the target user according to the position relation between the cornea reflection area and the pupil area;
based on the gaze direction, a viewpoint position of the target user on a video picture of the volumetric video is determined.
3. The method of claim 1, wherein adjusting the virtual viewing angle corresponding to the virtual scene in the volumetric video according to the viewpoint position to enable the target user to view the volumetric video from the adjusted virtual viewing angle comprises:
according to the viewpoint position, adjusting a virtual viewing angle corresponding to a virtual scene in the volume video to obtain a target virtual viewing angle;
determining a target angle at which a model corresponding to a virtual scene in the volume video needs to rotate based on the target virtual viewing angle;
rendering and displaying a target virtual scene picture so that the target user views the volume video from the adjusted virtual viewing angle, wherein the target virtual scene picture is the virtual scene picture obtained after the model corresponding to the virtual scene has been rotated by the target angle.
4. The method according to claim 1, wherein the method further comprises:
when the volume video is played, acquiring a target image containing the target user;
identifying, from the target image, position information of the target user relative to a video frame of the volumetric video;
and according to the position information, adjusting a virtual viewing angle corresponding to the virtual scene in the volume video, so that the target user views the volume video from the adjusted virtual viewing angle.
5. The method of claim 1, wherein the virtual scene of the volumetric video includes at least one virtual object therein;
the method further comprises the steps of:
determining a target virtual object from the at least one virtual object according to the object view that the target user intends to watch;
adjusting a virtual viewing angle corresponding to a virtual scene in the volume video to be a target viewing angle of the target virtual object in the virtual scene;
and updating the video picture showing the volume video, wherein the updated video picture comprises a virtual scene picture under the target viewing angle of the target virtual object.
6. The method of claim 1, wherein the virtual scene of the volumetric video includes at least one virtual object therein; the method further comprises the steps of:
determining a virtual scene picture under the view angle of each virtual object in the virtual scene;
and displaying, in the video page corresponding to the volume video, the virtual scene picture under the view angle of each virtual object.
7. The method of claim 6, wherein the method further comprises:
detecting a target viewpoint position of the target user on a video picture of the volume video;
determining a target virtual scene picture from the virtual scene pictures under the view angles of all the virtual objects according to the target viewpoint positions;
and enlarging the display of the target virtual scene picture in the video page while reducing the display of the virtual scene pictures other than the target virtual scene picture.
8. A volumetric video presentation device, comprising:
an acquisition unit, used for acquiring a human eye image of a target user currently watching the volume video when the volume video is played;
an analysis unit, used for performing gaze direction analysis on the human eye image and determining the viewpoint position of the target user on the video picture of the volume video;
and an adjusting unit, used for adjusting the virtual viewing angle corresponding to the virtual scene in the volume video according to the viewpoint position, so that the target user views the volume video from the adjusted virtual viewing angle.
9. An electronic device comprising a memory and a processor; the memory stores an application program, and the processor is configured to execute the application program in the memory to perform the operations in the volumetric video presentation method according to any one of claims 1 to 7.
10. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the volumetric video presentation method of any of claims 1 to 7.
11. A computer program product comprising a computer program or instructions which, when executed by a processor, carries out the steps in the volumetric video presentation method of any of claims 1 to 7.