CN109257584B - User watching viewpoint sequence prediction method for 360-degree video transmission - Google Patents
User watching viewpoint sequence prediction method for 360-degree video transmission
- Publication number
- CN109257584B CN109257584B CN201810886661.7A CN201810886661A CN109257584B CN 109257584 B CN109257584 B CN 109257584B CN 201810886661 A CN201810886661 A CN 201810886661A CN 109257584 B CN109257584 B CN 109257584B
- Authority
- CN
- China
- Prior art keywords
- viewpoint
- sequence
- user
- future
- view
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Abstract
The invention provides a method for predicting the sequence of viewpoints a user will watch during 360-degree video transmission, which comprises the following steps: taking the user's viewpoint positions at past times as the input of a viewpoint sequence prediction model and predicting the viewpoint positions at a plurality of future times with that model, the predicted positions forming a first viewpoint sequence; taking the video content as the input of a viewpoint tracking model and predicting the viewpoint positions at a plurality of future times with that model, the predicted positions forming a second viewpoint sequence; and combining the first viewpoint sequence and the second viewpoint sequence to determine the user's future viewing viewpoint sequence. The prediction method has good practicability and extensibility, and the length of the predicted viewpoint sequence can be changed according to the user's head movement speed.
Description
Technical Field
The invention relates to the technical field of video communication, in particular to a user watching viewpoint sequence prediction method for 360-degree video transmission.
Background
Compared with traditional video, 360-degree video is captured with an omnidirectional camera that records the scene in every direction of the real world and stitches the views into a panoramic image. When watching a 360-degree video, the user can freely turn the head to adjust the viewing angle and thus obtain an immersive experience. However, 360-degree video has ultra-high resolution, and transmitting a complete 360-degree video consumes up to six times the bandwidth of a conventional video. Where network bandwidth is limited, especially in mobile networks, it is difficult to transmit the full 360-degree video.
Limited by the field of view of the head-mounted display, the user can only see a portion of the 360-degree video at any moment. Bandwidth can therefore be used more effectively by selecting, according to the user's head movement, only the video region the user is interested in for transmission. However, between the moment the user's viewing request is collected and fed back to the server and the moment the requested video content is received, one Round-Trip Time (RTT) between user and server elapses. The user's head may have moved during this period, so the received content may no longer cover the region of interest. To compensate for the RTT delay, it is necessary to predict the user's viewpoint.
A search of the prior art shows that, to predict a user's viewpoint, a common approach is to infer the viewpoint position at a future time from the viewpoint positions at past times. Bao et al. published an article entitled "Shooting a moving target: Motion-prediction-based transmission for 360-degree video" at the IEEE International Conference on Big Data, which proposes a simple model that directly takes the viewpoint position at the current moment as the viewpoint position at a future moment, as well as three regression models that use linear regression and a feedforward neural network to model how the user's viewpoint position changes over time and thereby predict the viewpoint position at a future moment. However, factors such as the user's occupation, age, gender and preferences influence which regions of a 360-degree video the user finds interesting; the relation between future and past viewpoint positions is nonlinear and exhibits long-term dependence; and the three prediction models proposed in that article can only predict a single viewpoint position rather than the viewpoint positions at multiple future moments.
A further search found the article entitled "Predicting head trajectories in 360° virtual reality videos" published by A. D. Aladagli et al. at the International Conference on 3D Immersion, 2018, pp. 1-6, which considers the influence of video content on the user's viewpoint position: it predicts the salient regions of the video with a saliency algorithm and predicts the user's viewpoint position accordingly. However, that article does not consider the influence of past viewpoint positions on the viewing viewpoint.
Disclosure of Invention
In view of the above shortcomings of the prior art, the invention aims to provide a method for predicting the sequence of viewpoints a user will watch during 360-degree video transmission.
The invention provides a method for predicting the viewpoint sequence watched by a user in 360-degree video transmission, which comprises the following steps:
using the user's viewpoint positions at past times as the input of a viewpoint sequence prediction model and predicting the viewpoint positions at a plurality of future times with the viewpoint sequence prediction model, wherein the viewpoint positions at the plurality of future times form a first viewpoint sequence;
using the video content as the input of a viewpoint tracking model and predicting the viewpoint positions at a plurality of future times with the viewpoint tracking model, wherein the viewpoint positions at the plurality of future times form a second viewpoint sequence;
and determining a future viewing viewpoint sequence of the user by combining the first viewpoint sequence and the second viewpoint sequence.
Optionally, before the user's viewpoint positions at past times are used as the input of the viewpoint sequence prediction model and the viewpoint positions at a plurality of future times are predicted by that model, the method further includes:
constructing the viewpoint sequence prediction model based on a recurrent neural network; the viewpoint sequence prediction model encodes the input viewpoint positions, feeds the encoded positions into the recurrent neural network, computes the values of the hidden units and the output units, learns the long-term dependence among the user's viewing viewpoints at different times, and outputs the viewpoint positions at a plurality of future times; the viewpoint positions comprise the unit circle projections of the pitch angle, yaw angle and roll angle, so that each component of a viewpoint position varies between -1 and 1; a hyperbolic tangent function is used as the activation function of the output unit, and this activation function bounds the output range of the viewpoint position.
Optionally, using the user's viewpoint positions at past times as the input of the viewpoint sequence prediction model and predicting the viewpoint positions at a plurality of future times with that model includes:
taking the viewpoint position of the user at the current moment as the input of the first iteration of the viewpoint sequence prediction model to obtain the predicted viewpoint position of the first iteration;
and cyclically taking the predicted viewpoint position of the previous iteration as the input of the next iteration of the viewpoint sequence prediction model, so as to obtain the predicted viewpoint positions at a plurality of future times.
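For illustration only, the following is a minimal sketch of this iterative rollout; `predict_one_step` is a hypothetical stand-in for one forward pass of the viewpoint sequence prediction model and is not part of the patent:

```python
def rollout(predict_one_step, current_viewpoint, t_w):
    """Autoregressively predict t_w future viewpoint positions.

    predict_one_step: hypothetical one-step predictor (one pass of the
                      viewpoint sequence prediction model).
    current_viewpoint: the user's viewpoint position at the current time.
    t_w: the prediction time window (number of future time steps).
    """
    predictions = []
    x = current_viewpoint               # first iteration: current viewpoint as input
    for _ in range(t_w):
        x = predict_one_step(x)         # later iterations reuse the previous prediction
        predictions.append(x)
    return predictions                  # these form the first viewpoint sequence
```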
Optionally, the length of the first viewpoint sequence is related to the user's head movement speed during viewing: the slower the user's head moves, the longer the corresponding first viewpoint sequence; the faster the user's head moves, the shorter the corresponding first viewpoint sequence.
Optionally, before the video content is used as the input of the viewpoint tracking model and the viewpoint positions at a plurality of future times are predicted by that model, the method further includes:
constructing the viewpoint tracking model based on the correlation-filter algorithm used in target tracking, namely: setting a correlation filter whose response reaches its maximum over the video region at the viewpoint position.
Optionally, taking the video content as the input of the viewpoint tracking model and predicting the viewpoint positions at a plurality of future times with that model includes:
projecting a spherical image of a 360-degree video frame at a future moment into a planar image by adopting an equidistant cylindrical projection mode;
and determining a bounding box in the planar image with the viewpoint tracking model, the region inside the bounding box being the viewpoint region, and determining the corresponding viewpoint position from that viewpoint region.
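For illustration, a sketch of the equidistant cylindrical (equirectangular) mapping between viewpoint angles and pixel coordinates in the planar image; the 1800 × 900 image size follows the embodiment described below, while the function names and the use of the bounding-box centre are assumptions rather than details given by the patent:

```python
def angles_to_pixel(yaw_deg, pitch_deg, width=1800, height=900):
    """Map a viewpoint (yaw in [-180, 180], pitch in [-90, 90] degrees)
    to pixel coordinates in the equirectangular (planar) image."""
    x = (yaw_deg / 360.0 + 0.5) * width
    y = (0.5 - pitch_deg / 180.0) * height
    return x, y

def bbox_center_to_angles(x, y, width=1800, height=900):
    """Convert the centre of a tracked bounding box back to viewpoint angles."""
    yaw_deg = (x / width - 0.5) * 360.0
    pitch_deg = (0.5 - y / height) * 180.0
    return yaw_deg, pitch_deg
```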
Optionally, the determining a future viewing viewpoint sequence of the user by combining the first viewpoint sequence and the second viewpoint sequence includes:
setting different weight values w1 and w2 for the viewpoint positions in the first viewpoint sequence and the viewpoint positions in the second viewpoint sequence respectively, where the weights w1 and w2 satisfy w1 + w2 = 1; the weights w1 and w2 are set according to the principle of minimising the error between the predicted future viewing viewpoint positions and the user's actual viewing viewpoint positions;
and calculating the user's future viewing viewpoint sequence from the weights w1 and w2, the viewpoint positions in the first viewpoint sequence and the viewpoint positions in the second viewpoint sequence; the calculation formula is as follows:

V_{t+1:t+tw} = w1 ⊙ V(1)_{t+1:t+tw} + w2 ⊙ V(2)_{t+1:t+tw}

wherein V_{t+1:t+tw} is the user's future viewing viewpoint position from time t+1 to time t+tw, w1 is the weight of the first viewpoint sequence, V(1)_{t+1:t+tw} is the viewpoint position in the first viewpoint sequence from time t+1 to time t+tw, w2 is the weight of the second viewpoint sequence, V(2)_{t+1:t+tw} is the viewpoint position in the second viewpoint sequence from time t+1 to time t+tw, ⊙ denotes element-by-element multiplication, t is the current time, and tw is the prediction time window.
Optionally, the weight w2 of the second viewpoint sequence predicted by the viewpoint tracking model gradually decreases as the prediction time increases.
Compared with the prior art, the invention has the following beneficial effects:
the method for predicting the view point sequence of the user watching in 360-degree video transmission combines a recurrent neural network to learn the long-time dependence relationship among the watching view points of the user at different moments, and predicts the view point positions of a plurality of moments in the future based on the user view point positions at the past moments; meanwhile, the influence of video content on viewing viewpoints is considered, and future viewpoint sequences are predicted based on the video content; and finally, the influence of the cyclic neural network and the video content on the viewing viewpoint is synthesized to obtain a future viewing viewpoint sequence of the user, the length of the predicted viewpoint sequence can be changed according to the head movement speed of the user, and the method has good practicability and expansibility.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a system block diagram of the method for predicting a user's viewing viewpoint sequence for 360-degree video transmission according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the viewpoint region according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
Fig. 1 is a system block diagram of the method for predicting a user's viewing viewpoint sequence for 360-degree video transmission according to an embodiment of the present invention. As shown in Fig. 1, the system includes a viewpoint prediction module based on a recurrent neural network, a viewpoint tracking module based on a correlation filter, and a fusion module, wherein: the recurrent-neural-network viewpoint prediction module predicts the viewpoint positions at a plurality of future times from the user's viewpoint positions at past times, using the long-term dependence among the user's viewing viewpoints at different times learned by the recurrent neural network; the correlation-filter viewpoint tracking module takes into account the influence of the video content on the viewing viewpoint, explores the relation between video content and viewpoint sequences, and predicts a future viewpoint sequence from the video content; and the fusion module combines the prediction results of the recurrent-neural-network viewpoint prediction module and the correlation-filter viewpoint tracking module so that the advantages of the two modules complement each other and the prediction accuracy of the model is improved.
In this embodiment, a recurrent neural network is used to learn the long-term dependence among the user's viewing viewpoints at different times, and the viewpoint positions at a plurality of future times are predicted from the user's viewpoint positions at past times; at the same time, the influence of the video content on the viewing viewpoint is considered, a viewpoint tracking module based on a correlation filter is proposed, the relation between video content and viewpoint sequences is explored, and a future viewpoint sequence is predicted from the video content; finally, the fusion module combines the prediction results of the recurrent-neural-network viewpoint prediction module and the correlation-filter viewpoint tracking module so that the advantages of the two modules complement each other and the prediction accuracy of the model is improved. The proposed viewpoint sequence prediction architecture can change the length of the predicted viewpoint sequence according to the user's head movement speed, has good practicability and extensibility, and lays a solid foundation for efficient transmission of 360-degree video.
Specifically, in this embodiment, the predicted viewpoint positions are the unit circle projections of the pitch angle (θ), the yaw angle (φ) and the roll angle (ψ), which correspond to the rotation of the user's head about the X, Y and Z axes. Fig. 2 is a schematic diagram of the viewpoint region according to an embodiment of the present invention; referring to Fig. 2, at the initial position of the user's head these three angles are all 0 degrees, and each varies between -180° and 180°. The three angles determine a unique viewpoint position for a user watching the video with a head-mounted display, and experiments show that when the user turns the head, the yaw angle φ changes most markedly relative to the other two angles and is therefore the most difficult to predict.
In this example, the main focus is on predicting the yaw angle φ; the proposed system architecture can be extended directly to the prediction of the other two angles. According to the angle definition, -180° and 179° differ by 1° rather than 359°. To avoid this wrap-around problem, the angle is first transformed: V_t = g(φ_t) = [sin(φ_t), cos(φ_t)] is used as the input, where sin(φ_t) is the sine of the yaw angle at time t and cos(φ_t) is its cosine. Before the prediction result is output, the predicted V_t is inverse-transformed as φ_t = g⁻¹(V_t), where V_t is the output vector obtained by applying the g transform to the yaw angle φ_t at time t and g⁻¹ is the inverse of the g transform of the yaw angle.
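A minimal sketch of the g transform and its inverse described above, using NumPy; the function names are illustrative:

```python
import numpy as np

def g(phi_deg):
    """Encode a yaw angle (degrees) as its unit circle projection (sin, cos)."""
    phi = np.deg2rad(phi_deg)
    return np.array([np.sin(phi), np.cos(phi)])

def g_inverse(v):
    """Recover the yaw angle in degrees, in (-180, 180], from a (sin, cos) pair."""
    return np.rad2deg(np.arctan2(v[0], v[1]))
```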
In this embodiment, the recurrent-neural-network viewpoint prediction module takes the viewpoint position at the current time, V_t = g(φ_t), as input and predicts the yaw angles at a plurality of future times φ_{t+1}, ..., φ_{t+tw}, where tw is the prediction time window, φ_{t+1} is the yaw angle at time t+1 and φ_{t+tw} is the yaw angle at time t+tw. If the user's head moves slowly, a larger prediction time window tw may be chosen; otherwise the prediction time window needs to be set to a smaller value. During training, for each time step i (i ranging from t to t+tw-1), V_i is encoded into a 128-dimensional vector x_i. Then x_i is fed into the recurrent neural network, and the hidden unit h_i and the output unit y_i are computed. At each time step from t to t+tw-1, the following update equations are applied:

x_i = σ1(W_xv·V_i + b_x) (1)
h_i = σ2(W_hx·x_i + W_hh·h_{i-1} + b_h) (2)
y_i = W_oh·h_i + b_o (3)
wherein W_xv is the weight matrix of the process that encodes the yaw angle into the 128-dimensional vector x_i, W_hx is the weight matrix connecting the input unit x_i to the hidden unit h_i, W_hh is the weight matrix connecting the hidden unit h_{i-1} at time i-1 to the hidden unit h_i at time i, W_oh is the weight matrix connecting the hidden unit h_i to the output unit y_i, b_x is the bias vector of the encoding process, b_h is the bias vector used to compute the hidden unit h_i, and b_o is the bias vector used to compute the output unit y_i. σ1 and σ2 are activation functions, where σ1 is a linear rectification function and σ2 is a hyperbolic tangent function; h_{i-1} is the hidden unit at time i-1. In the test process, the viewpoint position at the current time, g(φ_t), is used as the input of the first iteration; for the other time steps, the prediction result of the previous iteration is used as the input of the next iteration, i.e. the predicted user viewpoint position at time i+1 is g⁻¹(y_i), the inverse g transform of the output result y_i.
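The following NumPy sketch spells out one step of the recurrence (1)-(3) and the test-time feedback loop; the random, untrained parameters, the example yaw angle and the 5-step window are illustrative, while the 128-dimensional encoding, the 256-dimensional hidden unit and the (sin, cos) input follow the description above:

```python
import numpy as np

rng = np.random.default_rng(0)
enc_dim, hid_dim, out_dim = 128, 256, 2          # x_i, h_i and (sin, cos) output sizes

W_xv = rng.normal(scale=0.01, size=(enc_dim, 2)); b_x = np.zeros(enc_dim)
W_hx = rng.normal(scale=0.01, size=(hid_dim, enc_dim))
W_hh = rng.normal(scale=0.01, size=(hid_dim, hid_dim)); b_h = np.zeros(hid_dim)
W_oh = rng.normal(scale=0.01, size=(out_dim, hid_dim)); b_o = np.zeros(out_dim)

relu = lambda z: np.maximum(z, 0.0)              # sigma_1, linear rectification
tanh = np.tanh                                   # sigma_2, hyperbolic tangent

def step(V_i, h_prev):
    """One recurrence step implementing equations (1)-(3)."""
    x_i = relu(W_xv @ V_i + b_x)                      # (1) encode the viewpoint input
    h_i = tanh(W_hx @ x_i + W_hh @ h_prev + b_h)      # (2) hidden-unit update
    y_i = W_oh @ h_i + b_o                            # (3) output unit
    return y_i, h_i

# Test-time rollout: the output is fed back as the input of the next step.
h = np.zeros(hid_dim)
phi_t = 30.0                                          # current yaw angle, illustrative
V = np.array([np.sin(np.deg2rad(phi_t)), np.cos(np.deg2rad(phi_t))])   # g(phi_t)
for _ in range(5):                                    # t_w = 5 future steps, illustrative
    y, h = step(V, h)
    V = y                                             # prediction becomes the next input
```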
In this embodiment, the correlation-filter viewpoint tracking module designs, following the correlation-filter algorithm for target tracking, a correlation filter whose response is maximal over the region where the viewpoint is located, and takes the future 360-degree video frames F_{t+1}, ..., F_{t+tw} as input to predict the viewpoint position from the video content, where F_{t+1} is the 360-degree video frame at time t+1 and F_{t+tw} is the 360-degree video frame at time t+tw. Since the correlation-filter algorithm for target tracking is mainly used to track a specific object in a video, and the tracked viewpoint in this embodiment is more abstract than a specific object, the spherical image of the 360-degree video frame is first projected into a planar image using the equidistant cylindrical projection, and the region corresponding to the viewpoint is relocated on the planar image. In the projected planar image the content near the poles is horizontally stretched, so the region corresponding to the viewpoint is no longer rectangular; a bounding box is therefore set around the viewpoint, redefining the size and shape of the viewpoint region. In this way the bounding box of the viewpoint can be predicted from the video content, and the viewpoint position is predicted accordingly.
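Purely to illustrate the correlation-filter idea (a filter whose response peaks over the tracked region), here is a MOSSE-style frequency-domain response sketch; it is not the specific filter design used in the patent, and the patch size and the placeholder filter template are assumptions:

```python
import numpy as np

def correlation_response(patch, filter_conj_freq):
    """Correlate a grayscale image patch with a filter template.

    patch            : 2-D array (the candidate viewpoint region).
    filter_conj_freq : conjugate of the filter in the frequency domain,
                       same shape as patch.
    Returns the response map and the location of its peak, which marks
    the region where the filter responds most strongly.
    """
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * filter_conj_freq))
    peak_y, peak_x = np.unravel_index(np.argmax(response), response.shape)
    return response, (peak_x, peak_y)

# Illustrative usage with random data standing in for a trained filter.
patch = np.random.rand(32, 32)
filt = np.conj(np.fft.fft2(np.random.rand(32, 32)))
_, peak = correlation_response(patch, filt)
```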
In this embodiment, the prediction results of the recurrent-neural-network viewpoint prediction module and the correlation-filter viewpoint tracking module are combined with different weights to obtain the final prediction, i.e. V_{t+1:t+tw} = w1 ⊙ V(1)_{t+1:t+tw} + w2 ⊙ V(2)_{t+1:t+tw}, where V_{t+1:t+tw} is the final prediction result, V(1)_{t+1:t+tw} and V(2)_{t+1:t+tw} are the prediction results of the recurrent-neural-network viewpoint prediction module and of the correlation-filter viewpoint tracking module respectively, ⊙ denotes element-by-element multiplication, and the weights w1 and w2 satisfy w1 + w2 = 1; the weight values that minimise the error of the final viewpoint position prediction are used. Because the correlation-filter viewpoint tracking module cannot update its filter, the gap between the viewpoint estimate and the true value gradually grows as errors accumulate, so for a large prediction window the weight of the correlation-filter prediction is gradually reduced. The advantages of the recurrent-neural-network viewpoint sequence prediction module and of the correlation-filter viewpoint tracking module thus complement each other, improving the prediction accuracy.
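A minimal sketch of this element-wise weighted fusion; the linearly decaying w2 schedule is only one possibility consistent with the description (the patent does not give a specific decay formula), and all names are illustrative:

```python
import numpy as np

def fuse(seq_rnn, seq_cf, w1_per_step):
    """Element-wise weighted fusion of the two predicted viewpoint sequences.

    seq_rnn, seq_cf : arrays of shape (t_w, ...) from the recurrent-network
                      module and the correlation-filter module.
    w1_per_step     : weight of the recurrent-network prediction at each step;
                      w2 = 1 - w1, so w1 + w2 = 1 holds at every step.
    """
    seq_rnn, seq_cf = np.asarray(seq_rnn), np.asarray(seq_cf)
    w1 = np.asarray(w1_per_step).reshape(-1, *([1] * (seq_rnn.ndim - 1)))
    return w1 * seq_rnn + (1.0 - w1) * seq_cf

# Example: let w2 decay linearly over the prediction window (assumed schedule).
t_w = 5
w2 = np.linspace(0.5, 0.1, t_w)
fused = fuse(np.zeros((t_w, 2)), np.ones((t_w, 2)), 1.0 - w2)
```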
The key parameters in this embodiment are set as follows. The experimental data come from Y. Bao et al., who published the article entitled "Shooting a moving target: Motion-prediction-based transmission for 360-degree video" at the IEEE International Conference on Big Data; the data record the head motion of 153 volunteers watching 16 360-degree videos, with some volunteers watching only part of the videos, for a total of 985 viewing samples. In the data preprocessing of this embodiment, each viewing sample is sampled 10 times per second and 289 motion records are kept per viewing sample, giving 285665 motion records in total. 80% of the motion data are used as the training set and 20% as the test set. For the recurrent-neural-network module, the hidden unit size is set to 256, the Adam (adaptive moment estimation) optimisation method is used, and the momentum and weight decay are set to 0.8 and 0.999 respectively. The batch size is 128, and training runs for 500 epochs in total. The learning rate decays linearly from 0.001 to 0.0001 during the first 250 epochs. For the correlation-filter viewpoint tracking module, the image is resized to 1800 × 900 and the bounding-box size is set to 10 × 10. For the fusion module, different values are assigned to w1 and w2, and the final weight values are chosen so that the error of the final viewpoint position prediction is minimised.
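One simple way to choose the final weight values as described above is a grid search over a constant w1 on held-out data; a self-contained sketch with assumed names and an assumed mean-absolute-error criterion:

```python
import numpy as np

def select_weight(pred_rnn, pred_cf, ground_truth, candidates=np.linspace(0.0, 1.0, 101)):
    """Pick the w1 (with w2 = 1 - w1) that minimises the mean error between
    the fused prediction and the user's actual viewing viewpoints."""
    best_w1, best_err = None, np.inf
    for w1 in candidates:
        fused = w1 * pred_rnn + (1.0 - w1) * pred_cf   # element-wise fusion, w1 + w2 = 1
        err = np.mean(np.abs(fused - ground_truth))
        if err < best_err:
            best_w1, best_err = w1, err
    return best_w1, best_err
```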
The invention provides a viewpoint sequence prediction system based on the user's past viewpoint positions and the 360-degree video content, suited to the need to improve bandwidth utilisation in 360-degree video transmission. The proposed viewpoint sequence prediction architecture can predict the user's viewpoint positions at a plurality of future times, can change the length of the predicted viewpoint sequence according to the user's head movement speed, has good practicability and extensibility, and lays a solid foundation for efficient transmission of 360-degree video.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (8)
1. A method for predicting a viewpoint sequence viewed by a user in 360-degree video transmission, the method comprising:
using the user's viewpoint positions at past times as the input of a viewpoint sequence prediction model and predicting the viewpoint positions at a plurality of future times with the viewpoint sequence prediction model, wherein the viewpoint positions at the plurality of future times predicted by the viewpoint sequence prediction model form a first viewpoint sequence; the viewpoint sequence prediction model is constructed based on a recurrent neural network and is used for encoding the input viewpoint positions, feeding the encoded positions into the recurrent neural network, computing the values of the hidden units and the output units, learning the long-term dependence among the user's viewing viewpoints at different times, and outputting the viewpoint positions at a plurality of future times; the viewpoint positions comprise the unit circle projections of the pitch angle, the yaw angle and the roll angle, so that each component of a viewpoint position varies between -1 and 1; a hyperbolic tangent function is used as the activation function of the output unit, and the activation function bounds the output range of the viewpoint position;
using the video content as the input of a viewpoint tracking model and predicting the viewpoint positions at a plurality of future times with the viewpoint tracking model, wherein the viewpoint positions at the plurality of future times predicted by the viewpoint tracking model form a second viewpoint sequence; the viewpoint tracking model is constructed based on the correlation-filter algorithm used in target tracking, namely: a correlation filter is set whose response reaches its maximum over the video region at the viewpoint position;
and determining a future viewing viewpoint sequence of the user by combining the first viewpoint sequence and the second viewpoint sequence.
2. The method of claim 1, wherein before the user's viewpoint positions at past times are used as the input of the viewpoint sequence prediction model, the method further comprises: constructing the viewpoint sequence prediction model based on the recurrent neural network.
3. The method of claim 2, wherein using the user's viewpoint positions at past times as the input of the viewpoint sequence prediction model and predicting the viewpoint positions at a plurality of future times with the viewpoint sequence prediction model comprises:
taking the viewpoint position of the user at the current moment as the input of the first iteration of the viewpoint sequence prediction model to obtain the predicted viewpoint position of the first iteration;
and cyclically taking the predicted viewpoint position of the previous iteration as the input of the next iteration of the viewpoint sequence prediction model, so as to obtain the predicted viewpoint positions at a plurality of future times.
4. The method of claim 1, wherein the length of the first view sequence is related to the head movement speed of the user during the viewing, and the slower the head movement speed of the user, the longer the length of the corresponding first view sequence; the faster the head movement speed of the user is, the shorter the length of the corresponding first view sequence is.
5. The method of predicting a user's viewing viewpoint sequence for 360-degree video transmission as claimed in claim 1, further comprising, before the video content is used as the input of the viewpoint tracking model and the viewpoint positions at a plurality of future times are predicted with the viewpoint tracking model: constructing the viewpoint tracking model based on the correlation-filter algorithm used in target tracking.
6. The method of claim 5, wherein using the video content as the input of the viewpoint tracking model and predicting the viewpoint positions at a plurality of future times with the viewpoint tracking model comprises:
projecting a spherical image of a 360-degree video frame at a future moment into a planar image by adopting an equidistant cylindrical projection mode;
and determining a bounding box in the planar image with the viewpoint tracking model, the region inside the bounding box being the viewpoint region, and determining the corresponding viewpoint position from that viewpoint region.
7. The method of predicting a user's viewing viewpoint sequence for 360-degree video transmission of any of claims 1-6, wherein determining the user's future viewing viewpoint sequence by combining the first viewpoint sequence and the second viewpoint sequence comprises:
setting different weight values w1 and w2 for the viewpoint positions in the first viewpoint sequence and the viewpoint positions in the second viewpoint sequence respectively, where the weights w1 and w2 satisfy w1 + w2 = 1; the weights w1 and w2 are set according to the principle of minimising the error between the predicted future viewing viewpoint positions and the user's actual viewing viewpoint positions;
and calculating the user's future viewing viewpoint sequence from the weights w1 and w2, the viewpoint positions in the first viewpoint sequence and the viewpoint positions in the second viewpoint sequence; the calculation formula is as follows:

V_{t+1:t+tw} = w1 ⊙ V(1)_{t+1:t+tw} + w2 ⊙ V(2)_{t+1:t+tw}

wherein V_{t+1:t+tw} is the user's future viewing viewpoint position from time t+1 to time t+tw, w1 is the weight of the first viewpoint sequence, V(1)_{t+1:t+tw} is the viewpoint position in the first viewpoint sequence from time t+1 to time t+tw, w2 is the weight of the second viewpoint sequence, V(2)_{t+1:t+tw} is the viewpoint position in the second viewpoint sequence from time t+1 to time t+tw, ⊙ denotes element-by-element multiplication, t is the current time, and tw is the prediction time window.
8. The method of claim 7, wherein the weight w2 of the second viewpoint sequence predicted by the viewpoint tracking model gradually decreases as the prediction time increases.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810886661.7A CN109257584B (en) | 2018-08-06 | 2018-08-06 | User watching viewpoint sequence prediction method for 360-degree video transmission |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810886661.7A CN109257584B (en) | 2018-08-06 | 2018-08-06 | User watching viewpoint sequence prediction method for 360-degree video transmission |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109257584A CN109257584A (en) | 2019-01-22 |
CN109257584B true CN109257584B (en) | 2020-03-10 |
Family
ID=65048730
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810886661.7A Active CN109257584B (en) | 2018-08-06 | 2018-08-06 | User watching viewpoint sequence prediction method for 360-degree video transmission |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109257584B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109862019B (en) * | 2019-02-20 | 2021-10-22 | 联想(北京)有限公司 | Data processing method, device and system |
CN110248212B (en) * | 2019-05-27 | 2020-06-02 | 上海交通大学 | Multi-user 360-degree video stream server-side code rate self-adaptive transmission method and system |
CN110166850B (en) * | 2019-05-30 | 2020-11-06 | 上海交通大学 | Method and system for predicting panoramic video watching position by multiple CNN networks |
CN110248178B (en) * | 2019-06-18 | 2021-11-23 | 深圳大学 | Viewport prediction method and system using object tracking and historical track panoramic video |
CN114040184B (en) * | 2021-11-26 | 2024-07-16 | 京东方科技集团股份有限公司 | Image display method, system, storage medium and computer program product |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104768018A (en) * | 2015-02-04 | 2015-07-08 | 浙江工商大学 | Fast viewpoint predicting method based on depth map |
CN106612426A (en) * | 2015-10-26 | 2017-05-03 | 华为技术有限公司 | Method and device for transmitting multi-view video |
CN107274472A (en) * | 2017-06-16 | 2017-10-20 | 福州瑞芯微电子股份有限公司 | A kind of method and apparatus of raising VR play frame rate |
CN107422844A (en) * | 2017-03-27 | 2017-12-01 | 联想(北京)有限公司 | A kind of information processing method and electronic equipment |
CN107533230A (en) * | 2015-03-06 | 2018-01-02 | 索尼互动娱乐股份有限公司 | Head mounted display tracing system |
CN107770561A (en) * | 2017-10-30 | 2018-03-06 | 河海大学 | A kind of multiresolution virtual reality device screen content encryption algorithm using eye-tracking data |
CN108134941A (en) * | 2016-12-01 | 2018-06-08 | 联发科技股份有限公司 | Adaptive video coding/decoding method and its device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10432988B2 (en) * | 2016-04-15 | 2019-10-01 | Ati Technologies Ulc | Low latency wireless virtual reality systems and methods |
US9681096B1 (en) * | 2016-07-18 | 2017-06-13 | Apple Inc. | Light field capture |
2018
- 2018-08-06 CN CN201810886661.7A patent/CN109257584B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104768018A (en) * | 2015-02-04 | 2015-07-08 | 浙江工商大学 | Fast viewpoint predicting method based on depth map |
CN107533230A (en) * | 2015-03-06 | 2018-01-02 | 索尼互动娱乐股份有限公司 | Head mounted display tracing system |
CN106612426A (en) * | 2015-10-26 | 2017-05-03 | 华为技术有限公司 | Method and device for transmitting multi-view video |
CN108134941A (en) * | 2016-12-01 | 2018-06-08 | 联发科技股份有限公司 | Adaptive video coding/decoding method and its device |
CN107422844A (en) * | 2017-03-27 | 2017-12-01 | 联想(北京)有限公司 | A kind of information processing method and electronic equipment |
CN107274472A (en) * | 2017-06-16 | 2017-10-20 | 福州瑞芯微电子股份有限公司 | A kind of method and apparatus of raising VR play frame rate |
CN107770561A (en) * | 2017-10-30 | 2018-03-06 | 河海大学 | A kind of multiresolution virtual reality device screen content encryption algorithm using eye-tracking data |
Non-Patent Citations (1)
Title |
---|
Viewpoint-predicting-based Remote Rendering on Mobile Devices using Multiple; Wang Xiaochuan, Liang Xiaohui; 2015 International Conference on Virtual Reality and Visualization; 2016-05-12; full text *
Also Published As
Publication number | Publication date |
---|---|
CN109257584A (en) | 2019-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109257584B (en) | User watching viewpoint sequence prediction method for 360-degree video transmission | |
Xu et al. | Predicting head movement in panoramic video: A deep reinforcement learning approach | |
WO2023216572A1 (en) | Cross-video target tracking method and system, and electronic device and storage medium | |
CN103795976B (en) | A kind of full-time empty 3 d visualization method | |
CN113811920A (en) | Distributed pose estimation | |
CN111540055A (en) | Three-dimensional model driving method, device, electronic device and storage medium | |
US10545215B2 (en) | 4D camera tracking and optical stabilization | |
CN110246160B (en) | Video target detection method, device, equipment and medium | |
Wang et al. | Background modeling and referencing for moving cameras-captured surveillance video coding in HEVC | |
Yang et al. | Single and sequential viewports prediction for 360-degree video streaming | |
US10200618B2 (en) | Automatic device operation and object tracking based on learning of smooth predictors | |
WO2008054489A2 (en) | Wide-area site-based video surveillance system | |
CN112207821B (en) | Target searching method of visual robot and robot | |
CN110111364B (en) | Motion detection method and device, electronic equipment and storage medium | |
CN113365156B (en) | Panoramic video multicast stream view angle prediction method based on limited view field feedback | |
CN109255351A (en) | Bounding box homing method, system, equipment and medium based on Three dimensional convolution neural network | |
CN112449152A (en) | Method, system and equipment for synchronizing multiple paths of videos | |
CN114266823A (en) | Monocular SLAM method combining SuperPoint network characteristic extraction | |
WO2014205769A1 (en) | Local binary pattern-based optical flow | |
Huang et al. | One-shot imitation drone filming of human motion videos | |
CN117221633B (en) | Virtual reality live broadcast system based on meta universe and digital twin technology | |
Fang et al. | Coordinate-aligned multi-camera collaboration for active multi-object tracking | |
Liu et al. | Edge-to-fog computing for color-assisted moving object detection | |
CN113592875B (en) | Data processing method, image processing method, storage medium, and computing device | |
CN116485974A (en) | Picture rendering, data prediction and training method, system, storage and server thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |