CN113178019B

CN113178019B - Indication information identification method, system and storage medium based on video content

Info

Publication number: CN113178019B
Application number: CN202110642375.8A
Authority: CN
Inventors: 徐异凌; 管云峰; 柳宁
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2018-07-09
Filing date: 2018-07-20
Publication date: 2023-01-03
Anticipated expiration: 2038-07-20
Also published as: CN110706355A; CN110706355B; CN113178019A

Abstract

The invention provides an indication information identification method based on video content, comprising: a video type judging step of judging whether a video is a three-degrees-of-freedom-plus (3DoF+) video; and a constraint establishing step of establishing constraints on the 3DoF+ video. The invention can constrain 3DoF+ video and identify the 3DoF+ content of video media with indication information that specifies the 3DoF+ constraint information of the corresponding part of the video. In specific applications such as virtual navigation and virtual stadiums, this indication information can further be used by client applications or services for processing and presentation.

Description

Indication information identification method, system and storage medium based on video content

Technical Field

The invention relates to the technical field of virtual reality, in particular to an indication information identification method based on video content, and more particularly to a six-degrees-of-freedom-based indication information identification method, system and storage medium for specific application scenarios during video presentation and consumption.

Background

With the rapid development of virtual reality (VR) technology, demand for VR systems is increasing. Technically, the field is progressing from three degrees of freedom (3DoF) through three degrees of freedom plus (3DoF+) to six degrees of freedom (6DoF). 6DoF tracking makes interaction in the virtual world possible by allowing the user to move within the VR space, creating an immersive experience.

3DoF supports three rotations of the user's head, namely yaw, roll and pitch, as shown in FIG. 1. 3DoF+ additionally supports small-range translational motion of the user's head along the X, Y and Z axes, that is, in the six directions up, down, left, right, front and back, as shown in FIG. 2. 6DoF supports the same three rotations (yaw, roll and pitch) as 3DoF and also tracks the user's translation along the X, Y and Z axes.

Three main variants of 6DoF are currently defined: window 6DoF, omnidirectional 6DoF and complete 6DoF, as shown in FIGS. 3, 4 and 5. For window 6DoF, the user's head can rotate within a restricted range in yaw and pitch and without restriction in roll, and the user can translate within a restricted range in the forward direction of the X axis and without restriction in the other five directions. For omnidirectional 6DoF, the user's head can rotate without restriction in yaw, roll and pitch, and the user can translate within a restricted range in the six directions (up, down, left, right, front and back) along the X, Y and Z axes. For complete 6DoF, the user's head can rotate without restriction in yaw, roll and pitch, and the user can translate without restriction in all six directions along the X, Y and Z axes.

To meet the requirements of different application scenarios, the 3DoF+ and 6DoF information to which video content belongs needs to be additionally identified, so that this specific information can be further indicated.

Disclosure of Invention

In view of the defects in the prior art, the invention aims to provide an indication information identification method and system based on video content.

The method for identifying the indication information based on the video content comprises the following steps:

a video type judging step: judging whether the video is a three-degrees-of-freedom-plus (3DoF+) video; and a constraint establishing step: establishing constraints on the 3DoF+ video;

further, establishing constraints on the 3DoF+ video means establishing, for the 3DoF+ video, constraints on the movement of the viewer's head;

further, the constraints are established by setting a maximum value and a minimum value on each of the x axis, the y axis and the z axis;

further, the fields that set the maximum value and the minimum value on the x axis, the y axis and the z axis are HXmax, HXmin, HYmax, HYmin, HZmax and HZmin; the fields represent depth information of a virtual scene, and the virtual scene is a scene presented in virtual reality (VR).

The invention also provides an indication information identification system based on video content, which comprises:

a video type judging module: judging whether the video is a three-degrees-of-freedom-plus (3DoF+) video; and a constraint establishing module: establishing constraints on the 3DoF+ video;

further, establishing constraints on the 3DoF+ video means establishing, for the 3DoF+ video, constraints on the movement of the viewer's head;

further, the fields that set the maximum value and the minimum value on the x axis, the y axis and the z axis are HXmax, HXmin, HYmax, HYmin, HZmax and HZmin; the fields represent depth information of a virtual scene, and the virtual scene is a scene presented in virtual reality (VR).

Compared with the prior art, the invention has the following beneficial effects:

1. The invention can classify and constrain 3DoF+, 6DoF and similar videos, and identify the 3DoF+, 6DoF and similar content of video media with indication information that specifies the particular 3DoF+ or 6DoF type information and constraint information of the corresponding part of the video.

2. In specific applications, such as virtual navigation, virtual stadium, etc., the identification information provided by the present invention may be further used for processing and presentation of client applications or services.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a schematic view of 3DoF degrees of freedom;

FIG. 2 is a schematic diagram of 3DoF+ degrees of freedom;

FIG. 3 is a schematic diagram of window 6DoF degrees of freedom;

FIG. 4 is a schematic view of an omnidirectional 6DoF degree of freedom;

FIG. 5 is a schematic view of a full 6DoF degree of freedom;

FIG. 6 is a logic flow diagram of a method for indicating information identification based on video content;

FIG. 7 is an organization of VR information in a preferred embodiment;

FIG. 8 is a logic flow diagram of a method for identifying indication information based on video content in response to video source and various video constraints.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the present invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make variations and modifications without departing from the concept of the invention, and all such variations and modifications fall within the scope of the present invention.

In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they are therefore not to be construed as limiting the present invention.

As shown in FIG. 6, the method for identifying indication information based on video content provided by the present invention includes the following steps: a video storage step: storing the video on a server side; an analysis and type-determination step: analyzing the video, and setting a video parent type according to the analysis result; and an attribute setting step: setting video attributes according to the video parent type.

In the analysis and type-determination step, the video is analyzed according to the following factors: whether the video content changes with translation; and whether the video content changes with rotation. The video parent type is one of the following: non-panoramic video; three-degrees-of-freedom video; three-degrees-of-freedom-plus video; six-degrees-of-freedom video. The attribute setting step comprises: setting the attribute of a non-panoramic video to common; setting the attribute of a three-degrees-of-freedom video to 3DoF; setting the attribute of a three-degrees-of-freedom-plus video to 3DoF+; and setting the attribute of a six-degrees-of-freedom video to 6DoF.
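As a non-limiting illustration, the analysis and attribute setting described above can be sketched as follows. The description names only the two analysis factors and the four attribute values; the three-level `Translation` enum (used here to separate small head translation from body translation), the function name `parent_attribute`, and the choice to treat content that does not change with rotation as `common` are assumptions made for this sketch.

```python
from enum import Enum


class Translation(Enum):
    """Hypothetical summary of how the content responds to translation."""
    NONE = "none"   # content does not change with translation
    HEAD = "head"   # content changes with small head translation only
    BODY = "body"   # content changes with body translation


def parent_attribute(changes_with_rotation: bool, translation: Translation) -> str:
    """Return the attribute value for the video parent type.

    The attribute strings come from the description; the decision
    order is an assumption of this sketch.
    """
    if not changes_with_rotation:
        # Assumed: content that ignores rotation is non-panoramic.
        return "common"
    if translation is Translation.NONE:
        return "3DoF"
    if translation is Translation.HEAD:
        return "3DoF+"
    return "6DoF"


if __name__ == "__main__":
    print(parent_attribute(True, Translation.HEAD))
```

A real analyzer would derive the two factors from the stored video itself; this sketch only shows the mapping from analysis result to attribute.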

The method for identifying indication information based on video content further comprises the following steps: a six-degrees-of-freedom subtype setting step: setting a video subtype of a six-degrees-of-freedom video, the subtype being one of the following: window six-degrees-of-freedom video; omnidirectional six-degrees-of-freedom video; complete six-degrees-of-freedom video; and a label setting step: setting the label 0x01 for a window six-degrees-of-freedom video; setting the label 0x02 for an omnidirectional six-degrees-of-freedom video; and setting the label 0x00 for a complete six-degrees-of-freedom video. Preferably, the method further comprises a constraint establishing step: for any of the video types, establishing any one or more of the following constraints: a viewer head rotation constraint, a viewer head translation constraint, and a viewer body translation constraint. In practical applications, the viewer body translation constraint corresponds to a constraint on the movement of the viewer's feet.

Correspondingly, the invention also provides an indication information identification system based on video content, which comprises the following modules: a video storage module: storing the video on a server side; an analysis and type-determination module: analyzing the video, and setting a video parent type according to the analysis result; and an attribute setting module: setting video attributes according to the video parent type.

In the analysis and type-determination module, the video is analyzed according to the following factors: whether the video content changes with translation; and whether the video content changes with rotation. The video parent type is one of the following: non-panoramic video; three-degrees-of-freedom video; three-degrees-of-freedom-plus video; six-degrees-of-freedom video. In the attribute setting module: the attribute of a non-panoramic video is set to common; the attribute of a three-degrees-of-freedom video is set to 3DoF; the attribute of a three-degrees-of-freedom-plus video is set to 3DoF+; and the attribute of a six-degrees-of-freedom video is set to 6DoF.

The video content-based indication information identification system further comprises the following modules: a six-degrees-of-freedom subtype setting module: setting a video subtype of a six-degrees-of-freedom video, the subtype being one of the following: window six-degrees-of-freedom video; omnidirectional six-degrees-of-freedom video; complete six-degrees-of-freedom video; and a label setting module: setting the label 0x01 for a window six-degrees-of-freedom video; setting the label 0x02 for an omnidirectional six-degrees-of-freedom video; and setting the label 0x00 for a complete six-degrees-of-freedom video. Preferably, the system further comprises a constraint establishing module: for any of the video types, establishing any one or more of the following constraints: a viewer head rotation constraint, a viewer head translation constraint, and a viewer body translation constraint. In practical applications, the viewer body translation constraint corresponds to a constraint on the movement of the viewer's feet.

The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described method for identifying indication information based on video content.

The preferred embodiment:

When the invention is applied in a specific protocol, taking MMT transmission signaling in the OMAF standard as an example, the following fields may be added as required:

6DoF_type: indicates the 6DoF label type to which the video content of the region belongs; its values and their meanings are shown in the following table.

Value	Description
0x00	Complete 6DoF
0x01	Window 6DoF
0x02	Omnidirectional 6DoF
0x03 to 0xFF	Reserved
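As a non-limiting illustration, a client might decode the 6DoF_type value as sketched below. The label values are those of the table; the names `SixDofType` and `decode_6dof_type`, the treatment of the field as a single unsigned byte, and the use of `None` for reserved values are assumptions of this sketch and are not defined by the signaling.

```python
from enum import IntEnum
from typing import Optional


class SixDofType(IntEnum):
    """Label values taken from the 6DoF_type table."""
    COMPLETE = 0x00
    WINDOW = 0x01
    OMNIDIRECTIONAL = 0x02


def decode_6dof_type(value: int) -> Optional[SixDofType]:
    """Map a raw 6DoF_type value to a label, or None if it is reserved.

    Assumes the field is one unsigned byte (0x00 to 0xFF).
    """
    if not 0x00 <= value <= 0xFF:
        raise ValueError(f"6DoF_type out of range: {value!r}")
    try:
        return SixDofType(value)
    except ValueError:
        # 0x03 to 0xFF are reserved in the table.
        return None


if __name__ == "__main__":
    print(decode_6dof_type(0x01))
```

Returning `None` for reserved values lets a client skip labels it does not understand instead of failing.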

The constraints added to different video types may be as follows:

3DoF+ {
    HXmax
    HXmin
    HYmax
    HYmin
    HZmax
    HZmin
}

Window 6DoF {
    Xmax
    Yawmax
    Yawmin
    Pitchmax
    Pitchmin
}

Omnidirectional 6DoF {
    Xmax
    Xmin
    Ymax
    Ymin
    Zmax
    Zmin
}

HXmax, HXmin, HYmax, HYmin, HZmax and HZmin indicate, respectively, the maximum and minimum values of the viewer's head movement on the x axis, the maximum and minimum values of head movement on the y axis, and the maximum and minimum values of head movement on the z axis;

Xmax, Xmin, Ymax, Ymin, Zmax and Zmin indicate, respectively, the maximum and minimum values of the viewer's body (foot) movement on the x axis, the maximum and minimum values of body movement on the y axis, and the maximum and minimum values of body movement on the z axis;

Yawmax, Yawmin, Pitchmax and Pitchmin indicate, respectively, the maximum and minimum values of the viewer's head rotation in yaw and the maximum and minimum values of head rotation in pitch.
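As a non-limiting illustration, the 3DoF+ head translation constraint can be held in a small structure and applied by clamping a tracked head position to the signaled limits. The class names `Range` and `ThreeDofPlusConstraint`, the method `clamp_head`, the numeric values, and the choice of clamping as the way a client applies the limits are assumptions of this sketch; the description does not specify units or how limits are enforced.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Range:
    """A closed interval [minimum, maximum] on one axis."""
    minimum: float
    maximum: float

    def __post_init__(self) -> None:
        if self.minimum > self.maximum:
            raise ValueError("minimum must not exceed maximum")

    def clamp(self, value: float) -> float:
        """Return value limited to the interval."""
        return max(self.minimum, min(self.maximum, value))


@dataclass(frozen=True)
class ThreeDofPlusConstraint:
    """Head translation limits: (HXmin, HXmax), (HYmin, HYmax), (HZmin, HZmax)."""
    hx: Range
    hy: Range
    hz: Range

    def clamp_head(self, x: float, y: float, z: float) -> Tuple[float, float, float]:
        """Limit a tracked head position to the signaled ranges."""
        return (self.hx.clamp(x), self.hy.clamp(y), self.hz.clamp(z))


if __name__ == "__main__":
    # Illustrative values only; units are assumed, not specified.
    limits = ThreeDofPlusConstraint(Range(-0.2, 0.2), Range(-0.1, 0.1), Range(-0.3, 0.3))
    print(limits.clamp_head(0.5, 0.0, -1.0))
```

The omnidirectional 6DoF body translation fields (Xmax to Zmin) could be represented with the same `Range` type.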

Based on the above information, FIG. 7 shows an organization structure that carries this information in the VR Asset Information descriptor of MMT transmission signaling in the OMAF standard.

Application example: virtual stadium

In panoramic video applications, although a 360-degree viewing range is covered, the user's control over switching the viewing direction is limited. The user is often restricted to consuming prepared 360-degree panoramic content, and the view does not change as the user moves. The indication information provided by the invention identifies the labels associated with 6DoF video content. Combined with user preferences, this makes it possible to provide an immersive experience through a 3D display and a surround sound system: as the user moves, the video is positioned to the corresponding viewing angle and that part of the video is presented to the user.

Specifically, multiple sensors are positioned to capture multiple views of a live event. The cameras can be fixed around the perimeter of the stadium or mounted on tracks that allow them to move and capture multiple views of the scene, and overhead footage can even be taken from a helicopter. When the user consumes the panoramic video, the media content is presented to the user by locating the label whose 6DoF_type is 0x00 and applying the corresponding 6DoF constraints on the three rotations (yaw, roll and pitch) and on translation along the X, Y and Z axes.
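As a non-limiting illustration, locating the content labeled with a given 6DoF_type can be sketched as a simple filter over region descriptors. The dictionary keys `region_id` and `sixdof_type`, the function name `regions_with_type`, and the sample region names are hypothetical; the description does not define how regions are represented on the client.

```python
from typing import Dict, List, Union

Region = Dict[str, Union[str, int]]


def regions_with_type(regions: List[Region], wanted_type: int) -> List[str]:
    """Return the ids of regions whose 6DoF_type equals wanted_type."""
    return [
        str(region["region_id"])
        for region in regions
        if region.get("sixdof_type") == wanted_type
    ]


if __name__ == "__main__":
    # Hypothetical descriptors for three regions of a stadium capture.
    stadium = [
        {"region_id": "pitch", "sixdof_type": 0x00},
        {"region_id": "stands", "sixdof_type": 0x02},
        {"region_id": "tunnel", "sixdof_type": 0x01},
    ]
    print(regions_with_type(stadium, 0x00))
```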

In summary, many video media applications place increasing emphasis on user immersion and on interaction between the user and the environment and among users. For example, in virtual navigation, setting a window 6DoF with a limited forward range on the X axis for the user gives a better audio-visual experience than traditional VR video. In a virtual stadium, using 6DoF technology without restriction, the user can experience the same game from within another, empty stadium, which provides a highly immersive experience that traditional three degrees of freedom cannot achieve.
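As a non-limiting illustration of the window 6DoF case mentioned above, a client could test whether a viewer pose stays within the signaled window 6DoF limits (Xmax, Yawmin, Yawmax, Pitchmin, Pitchmax). The class name `WindowSixDofConstraint`, the method `allows`, the numeric values, and the assumed units (metres for translation, degrees for rotation) are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowSixDofConstraint:
    """Window 6DoF limits: forward X translation plus yaw and pitch ranges."""
    x_max: float
    yaw_min: float
    yaw_max: float
    pitch_min: float
    pitch_max: float

    def allows(self, x: float, yaw: float, pitch: float) -> bool:
        """Return True if the pose lies within all signaled limits."""
        return (
            x <= self.x_max
            and self.yaw_min <= yaw <= self.yaw_max
            and self.pitch_min <= pitch <= self.pitch_max
        )


if __name__ == "__main__":
    # Illustrative limits only: 2 m forward, yaw within 60 degrees, pitch within 30 degrees.
    window = WindowSixDofConstraint(2.0, -60.0, 60.0, -30.0, 30.0)
    print(window.allows(1.5, 10.0, -5.0))
```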

Those skilled in the art know that, in addition to implementing the system, the apparatus and their modules provided by the present invention purely as computer readable program code, the same functions can be achieved entirely by logically programming the method steps so that the system, the apparatus and their modules take the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the apparatus and their modules provided by the present invention can be regarded as a hardware component, and the modules they include for implementing various programs can also be regarded as structures within the hardware component; modules for performing various functions can likewise be regarded both as software programs implementing the method and as structures within the hardware component.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A method for indicating information identification based on video content is characterized by comprising the following steps:

a video type judging step: judging whether the video is a three-degrees-of-freedom-plus video;

a constraint establishing step: establishing constraints on the three-degrees-of-freedom-plus video;

wherein establishing constraints on the three-degrees-of-freedom-plus video means establishing, for the three-degrees-of-freedom-plus video, constraints on the movement of the viewer's head;

the constraints are established by setting a maximum value and a minimum value on each of the x axis, the y axis and the z axis;

the fields that set the maximum value and the minimum value on the x axis, the y axis and the z axis are HXmax, HXmin, HYmax, HYmin, HZmax and HZmin, wherein the fields represent depth information of a virtual scene; the virtual scene is a scene presented in virtual reality (VR).

2. An indication information identification system based on video content, comprising:

a video type judging module: judging whether the video is a three-degrees-of-freedom-plus video;

a constraint establishing module: establishing constraints on the three-degrees-of-freedom-plus video;