CN109257584B - User watching viewpoint sequence prediction method for 360-degree video transmission - Google Patents

User watching viewpoint sequence prediction method for 360-degree video transmission

Info

Publication number
CN109257584B
CN109257584B (Application CN201810886661.7A)
Authority
CN
China
Prior art keywords
viewpoint
sequence
user
future
view
Prior art date
Legal status
Active
Application number
CN201810886661.7A
Other languages
Chinese (zh)
Other versions
CN109257584A (en)
Inventor
邹君妮
杨琴
刘昕
李成林
熊红凯
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201810886661.7A priority Critical patent/CN109257584B/en
Publication of CN109257584A publication Critical patent/CN109257584A/en
Application granted granted Critical
Publication of CN109257584B publication Critical patent/CN109257584B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for predicting the sequence of viewpoints a user views during 360-degree video transmission, comprising the following steps: using the user's viewpoint positions at past times as the input of a viewpoint sequence prediction model and predicting, through the viewpoint sequence prediction model, the viewpoint positions at a plurality of future times, which form a first viewpoint sequence; using the video content as the input of a viewpoint tracking model and predicting, through the viewpoint tracking model, the viewpoint positions at a plurality of future times, which form a second viewpoint sequence; and determining the user's future viewing viewpoint sequence by combining the first viewpoint sequence and the second viewpoint sequence. The prediction method has good practicality and extensibility, and the length of the predicted viewpoint sequence can be changed according to the speed of the user's head movement.

Description

User watching viewpoint sequence prediction method for 360-degree video transmission
Technical Field
The invention relates to the technical field of video communication, in particular to a user watching viewpoint sequence prediction method for 360-degree video transmission.
Background
Compared with traditional video, 360-degree video captures scenes in every direction of the real world with an omnidirectional camera and stitches them into a panoramic image. When watching a 360-degree video, the user can freely rotate the head to adjust the viewing angle and obtain an immersive experience. However, 360-degree video has ultra-high resolution, and transmitting the complete 360-degree video consumes up to six times the bandwidth of conventional video. Where network bandwidth resources are limited, especially in mobile networks, it is difficult to transmit a complete 360-degree video.
Limited by the field of view of the head-mounted display, the user can only view a portion of the 360-degree video at any moment. Bandwidth can therefore be used more effectively by selecting, according to the user's head movement, the video region the user is interested in and transmitting only that region. From the moment the user's demand information is collected and fed back to the server until the user receives the video content, one round-trip time (RTT) elapses. The user's head may have moved during this period, so that the received content is no longer the part the user is interested in. To compensate for the RTT delay, the user's viewpoint must be predicted.
A search of the prior art shows that, to realise viewpoint prediction for the user, a common approach is to infer the viewpoint position at a future time from the viewpoint positions at past times. Bao et al. published an article entitled "Shooting a moving target: Motion-prediction-based transmission for 360-degree video" at the IEEE International Conference on Big Data, which proposes a naive model that directly takes the viewpoint position at the current time as the viewpoint position at a future time, together with three regression models, based on linear regression and feedforward neural networks, that perform regression analysis on how the user's viewpoint position changes over time in order to predict the viewpoint position at a future time. However, factors such as the user's occupation, age, gender and preferences influence which regions of a 360-degree video the user finds interesting; the relation between the future viewpoint position and the past viewpoint positions is therefore nonlinear and exhibits long-term dependence, and the three prediction models proposed in that article can only predict a single viewpoint position and cannot predict viewpoint positions at multiple future times.
A further search finds the article entitled "Predicting head trajectories in 360° virtual reality videos" published by A. D. Aladagli et al. at the International Conference on 3D Immersion, 2018, pp. 1-6, which considers the influence of the video content on the user's viewpoint position: it predicts the salient regions of the video with a saliency algorithm and predicts the user's viewpoint position accordingly. However, that article does not consider the influence of the past viewpoint positions on the viewing viewpoint.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a user watching viewpoint sequence prediction method for 360-degree video transmission.
The invention provides a method for predicting the viewpoint sequence watched by a user in 360-degree video transmission, which comprises the following steps:
using the user's viewpoint positions at past times as the input of a viewpoint sequence prediction model and predicting, through the viewpoint sequence prediction model, the viewpoint positions at a plurality of future times, wherein the viewpoint positions at the plurality of future times form a first viewpoint sequence;
using video content as the input of a viewpoint tracking model and predicting, through the viewpoint tracking model, the viewpoint positions at a plurality of future times, wherein the viewpoint positions at the plurality of future times form a second viewpoint sequence;
and determining a future viewing viewpoint sequence of the user by combining the first viewpoint sequence and the second viewpoint sequence.
Optionally, before the user's viewpoint positions at past times are used as the input of the viewpoint sequence prediction model and the viewpoint positions at a plurality of future times are predicted through the viewpoint sequence prediction model, the method further includes:
constructing the viewpoint sequence prediction model based on a recurrent neural network; the viewpoint sequence prediction model encodes the input viewpoint positions and feeds them into the recurrent neural network, computes the values of the hidden units and output units, learns the long-term dependence between the user's viewing viewpoints at different times, and outputs the viewpoint positions at a plurality of future times; the viewpoint positions include the unit-circle projections of the pitch angle, yaw angle and roll angle, so each component of the viewpoint position varies between -1 and 1; a hyperbolic tangent function is adopted as the activation function of the output unit, and this activation function limits the output range of the viewpoint position.
Optionally, using the user's viewpoint positions at past times as the input of the viewpoint sequence prediction model and predicting the viewpoint positions at a plurality of future times through the viewpoint sequence prediction model includes:
taking the user's viewpoint position at the current time as the input of the first iteration of the viewpoint sequence prediction model to obtain the predicted viewpoint position of the first iteration;
and cyclically taking the predicted viewpoint position of the previous iteration as the input of the next iteration of the viewpoint sequence prediction model to obtain the predicted viewpoint positions at a plurality of future times.
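As an illustration of this iterative use of the model, the following minimal Python sketch (not the patented implementation; the names predict_future and model_step are hypothetical placeholders) feeds each prediction back in as the next input:

```python
# A minimal sketch (not the patented model) of the iterative prediction loop
# described above: `model_step` stands in for one forward pass of the
# viewpoint sequence prediction model.

def predict_future(model_step, current_viewpoint, t_w):
    predictions, viewpoint = [], current_viewpoint
    for _ in range(t_w):
        viewpoint = model_step(viewpoint)   # previous prediction feeds the next iteration
        predictions.append(viewpoint)
    return predictions

# Toy demo: a "model" that drifts the yaw angle by one degree per step.
print(predict_future(lambda yaw: yaw + 1.0, current_viewpoint=30.0, t_w=5))
```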
Optionally, the length of the first viewpoint sequence is related to the speed of the user's head movement while viewing: the slower the user's head movement, the longer the corresponding first viewpoint sequence; the faster the user's head movement, the shorter the corresponding first viewpoint sequence.
Optionally, before the video content is used as the input of the viewpoint tracking model and the viewpoint positions at a plurality of future times are predicted through the viewpoint tracking model, the method further includes:
constructing the viewpoint tracking model according to a correlation filter algorithm for target tracking, the correlation filter algorithm being: setting a correlation filter that produces its maximum response value for the video region at the viewpoint position.
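For intuition, the toy Python sketch below illustrates the maximum-response idea (an illustration only, not the patented filter design or its training procedure): a filter built from the viewpoint region yields its largest correlation response where that region appears in a frame.

```python
import numpy as np

# Toy sketch of the correlation-filter response idea (assumed, simplified):
# correlate a frame with a template of the viewpoint region in the Fourier
# domain; the response peak marks where the region lies in the frame.

def correlation_response(frame, template):
    F = np.fft.fft2(frame)
    H = np.conj(np.fft.fft2(template, s=frame.shape))   # zero-padded template
    return np.real(np.fft.ifft2(F * H))                 # circular cross-correlation

frame = np.zeros((64, 64)); frame[30:40, 20:30] = 1.0    # synthetic "viewpoint" region
template = frame[30:40, 20:30]                           # filter built from that region
resp = correlation_response(frame, template)
print(np.unravel_index(np.argmax(resp), resp.shape))     # peak at the region's corner (30, 20)
```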
Optionally, using the video content as the input of the viewpoint tracking model and predicting the viewpoint positions at a plurality of future times through the viewpoint tracking model includes:
projecting the spherical image of the 360-degree video frame at a future time into a planar image by equirectangular (equidistant cylindrical) projection;
and determining a bounding box in the planar image through the viewpoint tracking model, wherein the region inside the bounding box is the viewpoint region, and determining the corresponding viewpoint position from the viewpoint region.
Optionally, determining the future viewing viewpoint sequence of the user by combining the first viewpoint sequence and the second viewpoint sequence includes:
setting different weights w1 and w2 for the viewpoint positions in the first viewpoint sequence and the viewpoint positions in the second viewpoint sequence, respectively, the weights w1 and w2 satisfying w1 + w2 = 1; wherein the weights w1 and w2 are set so as to minimise the error between the predicted future viewing viewpoint positions and the user's actual viewing viewpoint positions;
calculating the user's future viewing viewpoint sequence from the weights w1 and w2, the viewpoint positions in the first viewpoint sequence and the viewpoint positions in the second viewpoint sequence, according to the formula

    V̂_{t+1:t+t_w} = w1 ⊙ V̂^(1)_{t+1:t+t_w} + w2 ⊙ V̂^(2)_{t+1:t+t_w}

wherein V̂_{t+1:t+t_w} denotes the user's future viewing viewpoint positions from time t+1 to time t+t_w, w1 is the weight of the first viewpoint sequence, V̂^(1)_{t+1:t+t_w} denotes the viewpoint positions in the first viewpoint sequence from time t+1 to time t+t_w, w2 is the weight of the second viewpoint sequence, V̂^(2)_{t+1:t+t_w} denotes the viewpoint positions in the second viewpoint sequence from time t+1 to time t+t_w, ⊙ denotes element-by-element multiplication, t is the current time and t_w is the prediction time window.
Optionally, the weight w2 of the second viewpoint sequence predicted by the viewpoint tracking model gradually decreases as the prediction time increases.
Compared with the prior art, the invention has the following beneficial effects:
the method for predicting the viewpoint sequence watched by a user in 360-degree video transmission uses a recurrent neural network to learn the long-term dependence between the user's viewing viewpoints at different times and predicts the viewpoint positions at a plurality of future times from the user's viewpoint positions at past times; at the same time, the influence of the video content on the viewing viewpoint is considered and the future viewpoint sequence is also predicted from the video content; finally, the influence captured by the recurrent neural network and the influence of the video content are combined to obtain the user's future viewing viewpoint sequence. The length of the predicted viewpoint sequence can be changed according to the speed of the user's head movement, and the method has good practicality and extensibility.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a system block diagram of a method for predicting a user's viewing viewpoint sequence for 360-degree video transmission according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a viewpoint area according to an embodiment of the present invention.
Detailed Description
The present invention will now be described in detail with reference to specific embodiments. The following embodiments will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
Fig. 1 is a system block diagram of a method for predicting a user's viewing viewpoint sequence for 360-degree video transmission according to an embodiment of the present invention. As shown in Fig. 1, the system includes a viewpoint prediction module based on a recurrent neural network, a viewpoint tracking module based on a correlation filter, and a fusion module, wherein: the recurrent-neural-network viewpoint prediction module learns the long-term dependence between the user's viewing viewpoints at different times and predicts the viewpoint positions at a plurality of future times from the user's viewpoint positions at past times; the correlation-filter viewpoint tracking module considers the influence of the video content on the viewing viewpoint, explores the relation between the video content and the viewpoint sequence, and predicts the future viewpoint sequence from the video content; and the fusion module combines the prediction results of the recurrent-neural-network viewpoint prediction module and the correlation-filter viewpoint tracking module, whose advantages are complementary, improving the prediction accuracy of the model.
In this embodiment, a recurrent neural network is used to learn the long-term dependence between the user's viewing viewpoints at different times, and the viewpoint positions at a plurality of future times are predicted from the user's viewpoint positions at past times; at the same time, the influence of the video content on the viewing viewpoint is considered, a viewpoint tracking module based on a correlation filter is introduced, the relation between the video content and the viewpoint sequence is explored, and the future viewpoint sequence is predicted from the video content; finally, the fusion module combines the prediction results of the recurrent-neural-network viewpoint prediction module and the correlation-filter viewpoint tracking module, whose advantages are complementary, improving the prediction accuracy of the model. The proposed viewpoint sequence prediction structure can change the length of the predicted viewpoint sequence according to the speed of the user's head movement, has good practicality and extensibility, and lays a solid foundation for efficient transmission of 360-degree video. The sketch following this paragraph illustrates how the three modules fit together.
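The following high-level Python sketch shows the assumed structure of Fig. 1 (an illustration only; predict_viewing_sequence, rnn_predict, cf_track and the toy inputs are hypothetical names and values, not the patented code):

```python
# High-level sketch of the three-module structure: RNN viewpoint prediction,
# correlation-filter viewpoint tracking, and weighted fusion.

def fuse(seq_rnn, seq_cf, w1, w2):
    """Weighted element-by-element combination of the two predicted sequences."""
    return [w1 * a + w2 * b for a, b in zip(seq_rnn, seq_cf)]

def predict_viewing_sequence(past_viewpoints, future_frames, rnn_predict, cf_track,
                             w1=0.7, w2=0.3):
    first_sequence = rnn_predict(past_viewpoints)    # RNN viewpoint prediction module
    second_sequence = cf_track(future_frames)        # correlation-filter tracking module
    return fuse(first_sequence, second_sequence, w1, w2)   # fusion module

# Toy demo with stand-in modules returning yaw angles in degrees.
print(predict_viewing_sequence(
    past_viewpoints=[28.0, 29.0, 30.0],
    future_frames=["frame_t+1", "frame_t+2", "frame_t+3"],
    rnn_predict=lambda past: [past[-1] + d for d in (1.0, 2.0, 3.0)],
    cf_track=lambda frames: [30.5] * len(frames),
))
```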
Specifically, in the present embodiment, the predicted viewpoint positions are in fact the unit-circle projections of the predicted pitch angle (θ), yaw angle (φ) and roll angle (ψ), which correspond to the rotation of the user's head about the X, Y and Z axes. Fig. 2 is a schematic diagram of a viewpoint area according to an embodiment of the present invention; referring to Fig. 2, all three angles are 0 degrees at the initial position of the user's head, and each varies between -180° and 180°. Together, these three angles determine a unique viewpoint position for a user viewing the video with a head-mounted display. Experiments show that the yaw angle φ changes most markedly when the user turns the head, relative to the other two angles, and is therefore the most difficult to predict.
In this example, the main focus is on predicting the yaw angle φ; the proposed system architecture can be extended directly to the prediction of the other two angles. By the angle definition, -180° and 179° differ by 1° rather than 359°. To avoid this wrap-around problem, the prediction angle is first transformed, and

    V_t = g(φ_t) = [sin φ_t, cos φ_t]^T

is used as the input. Before the prediction result is output, the predicted V_t is inverse-transformed by φ_t = g^{-1}(V_t). Here V_t is the output vector obtained by applying the g transform to the yaw angle φ_t at time t, sin φ_t is the sine of the yaw angle at time t, cos φ_t is the cosine of the yaw angle at time t, and g^{-1} is the inverse of the g transform of the yaw angle at time t.
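The small Python check below illustrates why this sin/cos encoding avoids the wrap-around problem (an illustration only, not part of the patent text; the helper names g and g_inv mirror the transform just described):

```python
import numpy as np

def g(phi_deg):
    """Wrap-around-safe encoding of the yaw angle (degrees) as [sin, cos]."""
    phi = np.deg2rad(phi_deg)
    return np.array([np.sin(phi), np.cos(phi)])

def g_inv(v):
    """Inverse transform: recover the yaw angle (degrees) from its encoding."""
    return np.rad2deg(np.arctan2(v[0], v[1]))

# Raw angles -180 and 179 look 359 degrees apart, but their encodings are close:
print(np.linalg.norm(g(-180.0) - g(179.0)))   # small Euclidean distance (~0.017)
print(g_inv(g(-180.0)), g_inv(g(179.0)))      # round trip recovers the angles
```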
In this embodiment, the recurrent-neural-network viewpoint prediction module takes the viewpoint position at the current time, g(φ_t), as input and predicts the yaw angles φ̂_{t+1}, ..., φ̂_{t+t_w} at a plurality of future times, where t_w is the prediction time window, φ̂_{t+1} is the predicted yaw angle at time t+1 and φ̂_{t+t_w} is the predicted yaw angle at time t+t_w. If the user's head moves slowly, a larger prediction time window t_w can be selected; otherwise the prediction time window needs to be set to a smaller value. During training, for each time step i (with i ranging from t to t+t_w-1), g(φ_i) is encoded into a 128-dimensional vector x_i. The vector x_i is then fed into the recurrent neural network, and the hidden unit h_i and the output unit y_i are computed. From t to t+t_w-1, the following update equations are applied at each time step:

    x_i = σ_1(W_xv g(φ_i) + b_x)                 (1)
    h_i = σ_2(W_hx x_i + W_hh h_{i-1} + b_h)     (2)
    y_i = W_oh h_i + b_o                         (3)
    φ̂_{i+1} = g^{-1}(y_i)                        (4)

where W_xv is the weight matrix that encodes the yaw angle φ_i into the 128-dimensional vector x_i, W_hx is the weight matrix connecting the input unit x_i to the hidden unit h_i, W_hh is the weight matrix connecting the hidden unit h_{i-1} at time i-1 to the hidden unit h_i at time i, W_oh is the weight matrix connecting the hidden unit h_i to the output unit, b_x is the bias vector of the encoding process, b_h is the bias vector for computing the hidden unit h_i, and b_o is the bias vector for computing the output unit. σ_1 and σ_2 are activation functions, where σ_1 is a rectified linear function and σ_2 is a hyperbolic tangent function; φ̂_{i+1} is the predicted value of the user's viewpoint position at time i+1, g(φ_i) is the g transform of the yaw angle φ_i at time i, h_{i-1} is the hidden unit at time i-1, and g^{-1}(y_i) is the inverse g transform of the output result y_i. In the test process, the viewpoint position at the current time, g(φ_t), is used as the input of the first iteration; for the other time steps, the prediction result of the previous iteration is used as the input of the next iteration.
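For concreteness, the following numpy sketch implements equations (1)-(4) for a single recurrent layer. It is a minimal illustration only: the weights are randomly initialised stand-ins for trained parameters, the 128/256 dimensions follow the text, and the exact placement of σ_1 in equation (1) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID = 128, 256

# Randomly initialised parameters stand in for trained weights (assumption).
W_xv = rng.standard_normal((D_IN, 2)) * 0.01    # encodes [sin, cos] of the yaw angle
b_x  = np.zeros(D_IN)
W_hx = rng.standard_normal((D_HID, D_IN)) * 0.01
W_hh = rng.standard_normal((D_HID, D_HID)) * 0.01
b_h  = np.zeros(D_HID)
W_oh = rng.standard_normal((2, D_HID)) * 0.01
b_o  = np.zeros(2)

def g(phi):
    """Wrap-around-safe encoding of the yaw angle (radians)."""
    return np.array([np.sin(phi), np.cos(phi)])

def g_inv(v):
    """Inverse transform: recover the yaw angle from its sin/cos encoding."""
    return np.arctan2(v[0], v[1])

def predict_sequence(phi_t, t_w):
    """Iteratively predict the yaw angle for the next t_w time steps."""
    h = np.zeros(D_HID)
    v = g(phi_t)                      # first iteration uses the current viewpoint
    preds = []
    for _ in range(t_w):
        x = np.maximum(0.0, W_xv @ v + b_x)        # (1) ReLU encoding
        h = np.tanh(W_hx @ x + W_hh @ h + b_h)     # (2) recurrent update
        y = W_oh @ h + b_o                         # (3) output unit
        preds.append(g_inv(y))                     # (4) inverse transform
        v = y                         # prediction feeds the next iteration
    return preds

print(predict_sequence(np.deg2rad(30.0), t_w=5))
```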
In this embodiment, the correlation-filter viewpoint tracking module designs, following a correlation filter algorithm for target tracking, a correlation filter that has its maximum response at the region where the viewpoint is located, and takes the 360-degree video frames at future times, F_{t+1}, ..., F_{t+t_w}, as input to predict the viewpoint positions from the video content; here F_{t+1} is the 360-degree video frame at time t+1 and F_{t+t_w} is the 360-degree video frame at time t+t_w. Since the target-tracking correlation filter algorithm is mainly used to track a specific object in a video, and the tracked viewpoint is more abstract than a specific object, the spherical image of each 360-degree video frame is first projected into a planar image by equirectangular (equidistant cylindrical) projection, and the region corresponding to the viewpoint is relocated on the planar image. In the projected planar image the content near the poles is stretched horizontally, so the region corresponding to the viewpoint is no longer rectangular; a bounding box is therefore placed around the viewpoint, redefining the size and shape of the viewpoint region. In this way, the bounding box of the viewpoint can be predicted from the video content, and from it the viewpoint positions at the future times are obtained.
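The mapping from a viewpoint angle to a location on the equirectangular image can be illustrated with the short sketch below (an assumption for illustration; the 1800 × 900 image size and 10 × 10 bounding box follow the parameter settings given later, and the helper names are hypothetical):

```python
import numpy as np

def viewpoint_to_pixel(yaw_deg, pitch_deg, width=1800, height=900):
    """Yaw in [-180, 180), pitch in [-90, 90] -> (column, row) on the planar image."""
    col = (yaw_deg + 180.0) / 360.0 * width
    row = (90.0 - pitch_deg) / 180.0 * height
    return int(col) % width, int(np.clip(row, 0, height - 1))

def centred_bbox(yaw_deg, pitch_deg, size=10, width=1800, height=900):
    """Square bounding box of `size` pixels centred on the projected viewpoint."""
    c, r = viewpoint_to_pixel(yaw_deg, pitch_deg, width, height)
    return (c - size // 2, r - size // 2, size, size)   # (x, y, w, h)

print(centred_bbox(45.0, 10.0))
```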
In this embodiment, the prediction results of the recurrent-neural-network viewpoint prediction module and of the correlation-filter viewpoint tracking module are combined, with different weights, to obtain the final prediction result:

    φ̂_{t+1:t+t_w} = w1 ⊙ φ̂^RNN_{t+1:t+t_w} + w2 ⊙ φ̂^CF_{t+1:t+t_w}

where φ̂_{t+1:t+t_w} is the final prediction result, φ̂^RNN_{t+1:t+t_w} and φ̂^CF_{t+1:t+t_w} are the prediction results of the recurrent-neural-network viewpoint prediction module and of the correlation-filter viewpoint tracking module respectively, ⊙ denotes element-by-element multiplication, and the weights w1 and w2 satisfy w1 + w2 = 1; the weight values that minimise the error of the final viewpoint position prediction are used. Since the correlation-filter viewpoint tracking module cannot update its filter, the gap between the viewpoint estimate and the true value grows gradually as errors accumulate, so for a large prediction window the weight of the correlation-filter prediction result is gradually reduced. The advantages of the recurrent-neural-network viewpoint sequence prediction module and of the correlation-filter viewpoint tracking module thus complement each other, improving the prediction accuracy.
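A small numeric sketch of this weighted combination follows (the weight schedule and the two input sequences are made-up values for illustration; the per-step decay of w2 reflects the reduced trust in the correlation-filter prediction at later times):

```python
import numpy as np

t_w = 5
rnn_pred = np.deg2rad(np.array([30.0, 31.0, 33.0, 36.0, 40.0]))   # assumed RNN output
cf_pred  = np.deg2rad(np.array([29.0, 30.0, 30.0, 31.0, 31.0]))   # assumed CF output

# Per-step weights: w1 + w2 = 1, with w2 decaying as the prediction horizon grows.
w2 = np.linspace(0.5, 0.1, t_w)
w1 = 1.0 - w2

fused = w1 * rnn_pred + w2 * cf_pred    # element-by-element combination
print(np.rad2deg(fused))
```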
The key parameters in this embodiment are set as follows. The experimental data come from Y. Bao et al., who published the article entitled "Shooting a moving target: Motion-prediction-based transmission for 360-degree video" at the IEEE International Conference on Big Data; the data record the head motion of 153 volunteers watching 16 segments of 360-degree video, with some volunteers watching only part of the videos, for a total of 985 viewing samples. In the data preprocessing of this embodiment, each viewing sample is sampled 10 times per second and 289 motion data points are recorded per viewing sample, giving 285665 motion data points in total. 80% of the motion data are used as the training set and 20% as the test set. For the recurrent neural network module, the hidden unit size is set to 256, the Adam (adaptive moment estimation) optimisation method is used, and its momentum and decay parameters are set to 0.8 and 0.999 respectively. The batch size is 128 and training runs for 500 epochs in total. The learning rate decays linearly from 0.001 to 0.0001 during the first 250 training epochs. For the correlation-filter viewpoint tracking module, the image size is adjusted to 1800 × 900 and the bounding box size is set to 10 × 10. For the fusion module, different values are assigned to w1 and w2, and the final weight values are chosen so that the error of the final viewpoint position prediction is minimised.
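The learning-rate schedule described above can be read as the following simple function (one possible interpretation, written as an assumption rather than the exact training code):

```python
def learning_rate(epoch, total=500, lr_start=1e-3, lr_end=1e-4, decay_epochs=250):
    """Linear decay from lr_start to lr_end over the first decay_epochs, then constant."""
    if epoch >= decay_epochs:
        return lr_end
    return lr_start + (lr_end - lr_start) * epoch / decay_epochs

print([round(learning_rate(e), 5) for e in (0, 125, 250, 499)])
```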
The invention provides a viewpoint sequence prediction system based on the user's past viewpoint positions and on the 360-degree video content, suited to the need to improve bandwidth utilisation in 360-degree video transmission. The proposed viewpoint sequence prediction structure can predict the user's viewpoint positions at a plurality of future times, can change the length of the predicted viewpoint sequence according to the speed of the user's head movement, has good practicality and extensibility, and lays a solid foundation for efficient transmission of 360-degree video.
The foregoing has described specific embodiments of the present invention. It is to be understood that the present invention is not limited to the specific embodiments described above, and various changes or modifications may be made by those skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and the features of the embodiments of the present application may be combined with one another arbitrarily, provided they do not conflict.

Claims (8)

1. A method for predicting a view sequence viewed by a user in 360-degree video transmission, the method comprising:
using the user's viewpoint positions at past times as the input of a viewpoint sequence prediction model and predicting, through the viewpoint sequence prediction model, the viewpoint positions at a plurality of future times, wherein the viewpoint positions at the plurality of future times predicted by the viewpoint sequence prediction model form a first viewpoint sequence; the viewpoint sequence prediction model is constructed based on a recurrent neural network and is used to encode the input viewpoint positions and feed them into the recurrent neural network, compute the values of the hidden units and output units, learn the long-term dependence between the user's viewing viewpoints at different times, and output the viewpoint positions at a plurality of future times; the viewpoint positions include the unit-circle projections of the pitch angle, yaw angle and roll angle, so each component of the viewpoint position varies between -1 and 1; a hyperbolic tangent function is adopted as the activation function of the output unit, and this activation function limits the output range of the viewpoint position;
using video content as the input of a viewpoint tracking model and predicting, through the viewpoint tracking model, the viewpoint positions at a plurality of future times, wherein the viewpoint positions at the plurality of future times predicted by the viewpoint tracking model form a second viewpoint sequence; the viewpoint tracking model is constructed according to a correlation filter algorithm for target tracking, the correlation filter algorithm being: setting a correlation filter that produces its maximum response value for the video region at the viewpoint position;
and determining a future viewing viewpoint sequence of the user by combining the first viewpoint sequence and the second viewpoint sequence.
2. The method for predicting a user viewing viewpoint sequence for 360-degree video transmission according to claim 1, wherein before the user's viewpoint positions at past times are used as the input of the viewpoint sequence prediction model, the method further comprises: constructing the viewpoint sequence prediction model based on the recurrent neural network.
3. The method for predicting a user viewing viewpoint sequence for 360-degree video transmission according to claim 2, wherein using the user's viewpoint positions at past times as the input of the viewpoint sequence prediction model and predicting the viewpoint positions at a plurality of future times through the viewpoint sequence prediction model comprises:
taking the viewpoint position of the user at the current moment as the input of the first iteration of the viewpoint sequence prediction model to obtain the predicted viewpoint position of the first iteration;
and circularly taking the predicted viewpoint position of the previous iteration as the input of the next iteration of the viewpoint sequence prediction model to obtain the predicted viewpoint positions of a plurality of moments in the future.
4. The method for predicting a user viewing viewpoint sequence for 360-degree video transmission according to claim 1, wherein the length of the first viewpoint sequence is related to the speed of the user's head movement while viewing: the slower the user's head movement, the longer the corresponding first viewpoint sequence; the faster the user's head movement, the shorter the corresponding first viewpoint sequence.
5. The method for predicting a user viewing viewpoint sequence for 360-degree video transmission according to claim 1, wherein before the video content is used as the input of the viewpoint tracking model and the viewpoint positions at a plurality of future times are predicted through the viewpoint tracking model, the method further comprises: constructing the viewpoint tracking model according to a correlation filter algorithm for target tracking.
6. The method for predicting a user viewing viewpoint sequence for 360-degree video transmission according to claim 5, wherein using the video content as the input of the viewpoint tracking model and predicting the viewpoint positions at a plurality of future times through the viewpoint tracking model comprises:
projecting a spherical image of a 360-degree video frame at a future moment into a planar image by adopting an equidistant cylindrical projection mode;
determining a bounding box in the planar image through the viewpoint tracking model, wherein the region inside the bounding box is the viewpoint region, and determining the corresponding viewpoint position from the viewpoint region.
7. The method for predicting a user viewing viewpoint sequence for 360-degree video transmission according to any one of claims 1-6, wherein determining the future viewing viewpoint sequence of the user by combining the first viewpoint sequence and the second viewpoint sequence comprises:
setting different weights w1 and w2 for the viewpoint positions in the first viewpoint sequence and the viewpoint positions in the second viewpoint sequence, respectively, the weights w1 and w2 satisfying w1 + w2 = 1; wherein the weights w1 and w2 are set so as to minimise the error between the predicted future viewing viewpoint positions and the user's actual viewing viewpoint positions;
calculating the user's future viewing viewpoint sequence from the weights w1 and w2, the viewpoint positions in the first viewpoint sequence and the viewpoint positions in the second viewpoint sequence, according to the formula

    V̂_{t+1:t+t_w} = w1 ⊙ V̂^(1)_{t+1:t+t_w} + w2 ⊙ V̂^(2)_{t+1:t+t_w}

wherein V̂_{t+1:t+t_w} denotes the user's future viewing viewpoint positions from time t+1 to time t+t_w, w1 is the weight of the first viewpoint sequence, V̂^(1)_{t+1:t+t_w} denotes the viewpoint positions in the first viewpoint sequence from time t+1 to time t+t_w, w2 is the weight of the second viewpoint sequence, V̂^(2)_{t+1:t+t_w} denotes the viewpoint positions in the second viewpoint sequence from time t+1 to time t+t_w, ⊙ denotes element-by-element multiplication, t is the current time and t_w is the prediction time window.
8. The method for predicting a user viewing viewpoint sequence for 360-degree video transmission according to claim 7, wherein the weight w2 of the second viewpoint sequence predicted by the viewpoint tracking model gradually decreases as the prediction time increases.
CN201810886661.7A 2018-08-06 2018-08-06 User watching viewpoint sequence prediction method for 360-degree video transmission Active CN109257584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810886661.7A CN109257584B (en) 2018-08-06 2018-08-06 User watching viewpoint sequence prediction method for 360-degree video transmission

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810886661.7A CN109257584B (en) 2018-08-06 2018-08-06 User watching viewpoint sequence prediction method for 360-degree video transmission

Publications (2)

Publication Number Publication Date
CN109257584A CN109257584A (en) 2019-01-22
CN109257584B true CN109257584B (en) 2020-03-10

Family

ID=65048730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810886661.7A Active CN109257584B (en) 2018-08-06 2018-08-06 User watching viewpoint sequence prediction method for 360-degree video transmission

Country Status (1)

Country Link
CN (1) CN109257584B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109862019B (en) * 2019-02-20 2021-10-22 联想(北京)有限公司 Data processing method, device and system
CN110248212B (en) * 2019-05-27 2020-06-02 上海交通大学 Multi-user 360-degree video stream server-side code rate self-adaptive transmission method and system
CN110166850B (en) * 2019-05-30 2020-11-06 上海交通大学 Method and system for predicting panoramic video watching position by multiple CNN networks
CN110248178B (en) * 2019-06-18 2021-11-23 深圳大学 Viewport prediction method and system using object tracking and historical track panoramic video
CN114040184B (en) * 2021-11-26 2024-07-16 京东方科技集团股份有限公司 Image display method, system, storage medium and computer program product

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104768018A (en) * 2015-02-04 2015-07-08 浙江工商大学 Fast viewpoint predicting method based on depth map
CN106612426A (en) * 2015-10-26 2017-05-03 华为技术有限公司 Method and device for transmitting multi-view video
CN107274472A (en) * 2017-06-16 2017-10-20 福州瑞芯微电子股份有限公司 A kind of method and apparatus of raising VR play frame rate
CN107422844A (en) * 2017-03-27 2017-12-01 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN107533230A (en) * 2015-03-06 2018-01-02 索尼互动娱乐股份有限公司 Head mounted display tracing system
CN107770561A (en) * 2017-10-30 2018-03-06 河海大学 A kind of multiresolution virtual reality device screen content encryption algorithm using eye-tracking data
CN108134941A (en) * 2016-12-01 2018-06-08 联发科技股份有限公司 Adaptive video coding/decoding method and its device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10432988B2 (en) * 2016-04-15 2019-10-01 Ati Technologies Ulc Low latency wireless virtual reality systems and methods
US9681096B1 (en) * 2016-07-18 2017-06-13 Apple Inc. Light field capture

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104768018A (en) * 2015-02-04 2015-07-08 浙江工商大学 Fast viewpoint predicting method based on depth map
CN107533230A (en) * 2015-03-06 2018-01-02 索尼互动娱乐股份有限公司 Head mounted display tracing system
CN106612426A (en) * 2015-10-26 2017-05-03 华为技术有限公司 Method and device for transmitting multi-view video
CN108134941A (en) * 2016-12-01 2018-06-08 联发科技股份有限公司 Adaptive video coding/decoding method and its device
CN107422844A (en) * 2017-03-27 2017-12-01 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN107274472A (en) * 2017-06-16 2017-10-20 福州瑞芯微电子股份有限公司 A kind of method and apparatus of raising VR play frame rate
CN107770561A (en) * 2017-10-30 2018-03-06 河海大学 A kind of multiresolution virtual reality device screen content encryption algorithm using eye-tracking data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Xiaochuan, Liang Xiaohui. "Viewpoint-predicting-based Remote Rendering on Mobile Devices using Multiple". 2015 International Conference on Virtual Reality and Visualization; 2016-05-12; full text *

Also Published As

Publication number Publication date
CN109257584A (en) 2019-01-22

Similar Documents

Publication Publication Date Title
CN109257584B (en) User watching viewpoint sequence prediction method for 360-degree video transmission
Xu et al. Predicting head movement in panoramic video: A deep reinforcement learning approach
WO2023216572A1 (en) Cross-video target tracking method and system, and electronic device and storage medium
CN103795976B (en) A kind of full-time empty 3 d visualization method
CN113811920A (en) Distributed pose estimation
CN111540055A (en) Three-dimensional model driving method, device, electronic device and storage medium
US10545215B2 (en) 4D camera tracking and optical stabilization
CN110246160B (en) Video target detection method, device, equipment and medium
Wang et al. Background modeling and referencing for moving cameras-captured surveillance video coding in HEVC
Yang et al. Single and sequential viewports prediction for 360-degree video streaming
US10200618B2 (en) Automatic device operation and object tracking based on learning of smooth predictors
WO2008054489A2 (en) Wide-area site-based video surveillance system
CN112207821B (en) Target searching method of visual robot and robot
CN110111364B (en) Motion detection method and device, electronic equipment and storage medium
CN113365156B (en) Panoramic video multicast stream view angle prediction method based on limited view field feedback
CN109255351A (en) Bounding box homing method, system, equipment and medium based on Three dimensional convolution neural network
CN112449152A (en) Method, system and equipment for synchronizing multiple paths of videos
CN114266823A (en) Monocular SLAM method combining SuperPoint network characteristic extraction
WO2014205769A1 (en) Local binary pattern-based optical flow
Huang et al. One-shot imitation drone filming of human motion videos
CN117221633B (en) Virtual reality live broadcast system based on meta universe and digital twin technology
Fang et al. Coordinate-aligned multi-camera collaboration for active multi-object tracking
Liu et al. Edge-to-fog computing for color-assisted moving object detection
CN113592875B (en) Data processing method, image processing method, storage medium, and computing device
CN116485974A (en) Picture rendering, data prediction and training method, system, storage and server thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant