CN109257584A - User viewing viewpoint sequence prediction method for 360-degree video transmission - Google Patents

User viewing viewpoint sequence prediction method for 360-degree video transmission

Info

Publication number
CN109257584A
Authority
CN
China
Prior art keywords
viewpoint sequence
viewpoint position
viewpoint
user
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810886661.7A
Other languages
Chinese (zh)
Other versions
CN109257584B (en)
Inventor
邹君妮
杨琴
刘昕
李成林
熊红凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201810886661.7A priority Critical patent/CN109257584B/en
Publication of CN109257584A publication Critical patent/CN109257584A/en
Application granted granted Critical
Publication of CN109257584B publication Critical patent/CN109257584B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Abstract

The present invention provides a user viewing viewpoint sequence prediction method for 360-degree video transmission. The method includes: taking the user's viewpoint position at the most recent moment as the input of a viewpoint sequence prediction model, and predicting the viewpoint positions at multiple future moments with the viewpoint sequence prediction model, the viewpoint positions at the multiple future moments constituting a first viewpoint sequence; taking the video content as the input of a viewpoint tracking model, and predicting the viewpoint positions at multiple future moments with the viewpoint tracking model, the viewpoint positions at the multiple future moments constituting a second viewpoint sequence; and combining the first viewpoint sequence and the second viewpoint sequence to determine the user's future viewing viewpoint sequence. The prediction method of the present invention has good practicality and scalability, since the length of the predicted viewpoint sequence can be changed according to the speed of the user's head movement.

Description

User viewing viewpoint sequence prediction method for 360-degree video transmission
Technical field
The present invention relates to the technical field of video communication, and in particular to a user viewing viewpoint sequence prediction method for 360-degree video transmission.
Background technique
360-degree video is an important application of virtual reality technology. Compared with conventional video, 360-degree video uses an omnidirectional camera to capture the scene in every direction of the real world and stitches these scenes into a panoramic image. When watching a 360-degree video, the user can freely rotate the head to adjust the viewing angle and obtain an immersive experience. However, 360-degree video has ultra-high resolution, and transmitting a complete 360-degree video consumes more than six times the bandwidth of conventional video. With limited network bandwidth resources, and for mobile networks in particular, transmitting complete 360-degree videos is very difficult.
Limited by the field of view of the head-mounted display, the user can only watch a part of the 360-degree video at each moment. Selecting and transmitting the video region the user is interested in according to the user's head movement therefore makes more efficient use of bandwidth. From the time the user's demand information is collected and fed back to the server until the user receives the video content, a round-trip time (RTT) between user and server elapses. The user's head position may have moved within this period, so the content the user receives may no longer be the part of interest. To avoid the transmission lag caused by the RTT delay, the user's viewpoint needs to be predicted.
A search of the prior art shows that, to realize user viewpoint prediction, a common method is to infer the viewpoint position at a future moment from the viewpoint position at the most recent moment. Y. Bao et al. published the article "Shooting a moving target: Motion-prediction-based transmission for 360-degree videos" at the IEEE International Conference on Big Data. The article proposes three kinds of prediction model: a naive model that directly takes the viewpoint position at the current moment as the viewpoint position at the future moment, and two regression models that use linear regression and a feedforward neural network, respectively, to perform regression analysis on how the user's viewpoint position changes over time and thereby predict the viewpoint position at the future moment. However, factors such as the user's occupation, age, gender, and preferences affect the user's region of interest in a 360-degree video, and the relationship between the viewpoint position at the future moment and the viewpoint position at the most recent moment can exhibit nonlinearity and long-term dependence. The three prediction models proposed in that article can only predict a single viewpoint position and cannot predict the viewpoint positions at multiple future moments.
A further search shows that A. D. Aladagli et al. published the article "Predicting head trajectories in 360 virtual reality videos" in International Conference on 3D Immersion, 2018, pp. 1-6. The article considers the influence of the video content on the user's viewpoint position: it predicts the salient regions of the video with a saliency algorithm and uses them to predict the user's viewpoint position. However, the article does not consider the influence of the viewpoint position at the most recent moment on the viewing viewpoint.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a user viewing viewpoint sequence prediction method for 360-degree video transmission.
The present invention provides a user viewing viewpoint sequence prediction method for 360-degree video transmission, comprising:
taking the user's viewpoint position at the most recent moment as the input of a viewpoint sequence prediction model, and predicting the viewpoint positions at multiple future moments with the viewpoint sequence prediction model, the viewpoint positions at the multiple future moments constituting a first viewpoint sequence;
taking the video content as the input of a viewpoint tracking model, and predicting the viewpoint positions at multiple future moments with the viewpoint tracking model, the viewpoint positions at the multiple future moments constituting a second viewpoint sequence;
combining the first viewpoint sequence and the second viewpoint sequence to determine the user's future viewing viewpoint sequence.
Optionally, before taking the user's viewpoint position at the most recent moment as the input of the viewpoint sequence prediction model and predicting the viewpoint positions at multiple future moments with the viewpoint sequence prediction model, the method further includes:
constructing the viewpoint sequence prediction model based on a recurrent neural network, wherein the viewpoint sequence prediction model encodes the input viewpoint position, feeds it into the recurrent neural network, computes the values of the hidden units and output units, learns the long-term dependencies between the user's viewing viewpoints at different moments, and outputs the viewpoint positions at multiple future moments; the viewpoint position comprises the unit-circle projections of the pitch angle, yaw angle, and roll angle, and the range of variation of the viewpoint position is -1 to 1; the hyperbolic tangent function is used as the activation function of the output units, and this activation function limits the output range of the viewpoint position.
Optionally, taking the user's viewpoint position at the most recent moment as the input of the viewpoint sequence prediction model and predicting the viewpoint positions at multiple future moments with the viewpoint sequence prediction model comprises:
taking the user's viewpoint position at the current moment as the input of the first iteration of the viewpoint sequence prediction model to obtain the predicted viewpoint position of the first iteration;
cyclically taking the predicted viewpoint position of the previous iteration as the input of the next iteration of the viewpoint sequence prediction model to obtain the predicted viewpoint positions at multiple future moments.
Optionally, the length of the first viewpoint sequence is related to the speed of the user's head movement while watching: the slower the user's head movement, the longer the corresponding first viewpoint sequence; the faster the user's head movement, the shorter the corresponding first viewpoint sequence.
Optionally, before taking the video content as the input of the viewpoint tracking model and predicting the viewpoint positions at multiple future moments with the viewpoint tracking model, the method further includes:
constructing the viewpoint tracking model according to a correlation filter algorithm for target tracking, wherein the correlation filter algorithm refers to: setting a correlation filter that forms the maximum response on the video region at the viewpoint position.
Optionally, taking the video content as the input of the viewpoint tracking model and predicting the viewpoint positions at multiple future moments with the viewpoint tracking model comprises:
projecting the spherical image of the 360-degree video frame at the future moment into a flat image using equirectangular (equidistant cylindrical) projection;
determining a bounding box in the flat image with the viewpoint tracking model, the region inside the bounding box being the viewpoint region, and determining the corresponding viewpoint position from the viewpoint region (a minimal sketch of this projection follows).
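The equirectangular mapping behind these two steps can be illustrated with a minimal sketch, assuming a 1800 x 900 flat image and a 10 x 10 bounding box as in the embodiment described below, with yaw in [-180°, 180°] and pitch in [-90°, 90°]; the function names are illustrative, not taken from the patent:

```python
def sphere_to_plane(yaw_deg, pitch_deg, width=1800, height=900):
    """Map a spherical viewpoint (yaw, pitch) in degrees to pixel
    coordinates on the equirectangular (equidistant cylindrical) image."""
    x = (yaw_deg / 360.0 + 0.5) * width     # yaw spans the full width
    y = (0.5 - pitch_deg / 180.0) * height  # pitch spans the full height
    return x, y

def viewpoint_bounding_box(yaw_deg, pitch_deg, box=10, width=1800, height=900):
    """Center a box x box pixel bounding box on the projected viewpoint;
    the region inside it is taken as the viewpoint region."""
    cx, cy = sphere_to_plane(yaw_deg, pitch_deg, width, height)
    h = box / 2.0
    return (cx - h, cy - h, cx + h, cy + h)

# Example: the viewpoint region for a viewer looking 30 deg right, 10 deg up.
print(viewpoint_bounding_box(30.0, 10.0))  # (1045.0, 395.0, 1055.0, 405.0)
```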
Optionally, combining the first viewpoint sequence and the second viewpoint sequence to determine the user's future viewing viewpoint sequence comprises:
setting different weights $w_1$ and $w_2$ for the viewpoint positions in the first viewpoint sequence and the viewpoint positions in the second viewpoint sequence, respectively, the weights $w_1$ and $w_2$ satisfying $w_1 + w_2 = 1$, wherein $w_1$ and $w_2$ are set to minimize the error between the predicted future viewing viewpoint positions of the user and the viewpoint positions actually watched by the user;
computing the user's future viewing viewpoint sequence from the weights $w_1$ and $w_2$ and the viewpoint positions in the first viewpoint sequence and the second viewpoint sequence, with the calculation formula:

$$\hat{V}_{t+1:t+t_w} = w_1 \odot \hat{V}^{(1)}_{t+1:t+t_w} + w_2 \odot \hat{V}^{(2)}_{t+1:t+t_w}$$

where $\hat{V}_{t+1:t+t_w}$ is the user's future viewing viewpoint position from moment t+1 to moment $t+t_w$, $w_1$ is the weight of the first viewpoint sequence, $\hat{V}^{(1)}_{t+1:t+t_w}$ is the viewpoint position in the first viewpoint sequence from moment t+1 to moment $t+t_w$, $w_2$ is the weight of the second viewpoint sequence, $\hat{V}^{(2)}_{t+1:t+t_w}$ is the viewpoint position in the second viewpoint sequence from moment t+1 to moment $t+t_w$, ⊙ denotes element-wise multiplication, t is the current moment, and $t_w$ is the prediction time window.
Optionally, as the prediction time increases, the weight $w_2$ of the second viewpoint sequence predicted by the viewpoint tracking model is gradually reduced.
Compared with the prior art, the present invention has the following beneficial effects:
The user viewing viewpoint sequence prediction method for 360-degree video transmission provided by the present invention uses a recurrent neural network to learn the long-term dependencies between the user's viewing viewpoints at different moments and predicts the viewpoint positions at multiple future moments from the user's viewpoint position at the most recent moment. At the same time, it considers the influence of the video content on the viewing viewpoint and predicts the future viewpoint sequence from the video content. Finally, it integrates the influence of the recurrent neural network and of the video content on the viewing viewpoint to obtain the user's future viewing viewpoint sequence. The length of the predicted viewpoint sequence can be changed according to the speed of the user's head movement, so the method has good practicality and scalability.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the detailed description of the non-limiting embodiments with reference to the following drawings:
Fig. 1 is a system block diagram of a user viewing viewpoint sequence prediction method for 360-degree video transmission provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the viewpoint region provided by an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept; these all fall within the protection scope of the present invention.
Fig. 1 is a system block diagram of a user viewing viewpoint sequence prediction method for 360-degree video transmission provided by an embodiment of the present invention. As shown in Fig. 1, the system comprises a viewpoint prediction module based on a recurrent neural network, a viewpoint tracking module based on a correlation filter, and a fusion module. The recurrent neural network viewpoint prediction module learns the long-term dependencies between the user's viewing viewpoints at different moments and predicts the viewpoint positions at multiple future moments from the user's viewpoint position at the most recent moment. The correlation filter viewpoint tracking module takes the influence of the video content on the viewing viewpoint into account: it explores the relationship between the video content and the viewpoint sequence and predicts the future viewpoint sequence from the video content. The fusion module combines the prediction results of the recurrent neural network viewpoint prediction module and the correlation filter viewpoint tracking module, so that the two modules complement each other and the prediction accuracy of the model is improved.
In this embodiment, the recurrent neural network thus learns the long-term dependencies between the user's viewing viewpoints at different moments, so that the viewpoint positions at multiple future moments are predicted from the user's viewpoint position at the most recent moment, while the viewpoint tracking module based on the correlation filter predicts the future viewpoint sequence from the video content. The viewpoint sequence prediction structure proposed by the present invention can change the length of the predicted viewpoint sequence according to the speed of the user's head movement, has good practicality and scalability, and lays a solid foundation for the efficient transmission of 360-degree video.
Specifically, in this embodiment, the predicted viewpoint position is in fact the unit-circle projection of the predicted pitch angle (θ), yaw angle (φ), and roll angle (ψ), where the pitch, yaw, and roll angles correspond to the rotation of the user's head around the X, Y, and Z axes, respectively. Fig. 2 is a schematic diagram of the viewpoint region provided by an embodiment of the present invention; see Fig. 2. The three angles at the initial position of the user's head are defined as 0 degrees, and each angle varies in the range of -180° to 180°. For a user watching the video with a head-mounted display, these three angles determine a unique viewpoint position. Experiments show that when the user rotates the head, the yaw angle (φ) changes most noticeably relative to the other two angles and is therefore the most difficult to predict.
This embodiment focuses on the prediction of the yaw angle φ; the proposed system structure extends directly to the prediction of the other two angles. By the definition of the angles, -180° and 179° differ by 1° rather than 359°. To avoid this discontinuity, the angle to be predicted is first transformed: the unit-circle projection $g(\phi_t) = (\sin\phi_t, \cos\phi_t)$ is used as input, and before the prediction result is output, the inverse transform $g^{-1}$ is applied to the predicted $V_t$, where $V_t$ is the output vector obtained from the yaw angle at moment t after the transform g, $\sin\phi_t$ is the sine of the yaw angle at moment t, $\cos\phi_t$ is the cosine of the yaw angle at moment t, and $g(\phi_t)$ is the transform of the yaw angle at moment t.
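As a minimal illustration of this transform (using atan2 for the inverse is our assumption; the patent only specifies g and its inverse):

```python
import numpy as np

def g(phi_deg):
    """Unit-circle projection of a yaw angle in degrees, so that
    -180 deg and 179 deg map to nearby points instead of far-apart values."""
    phi = np.deg2rad(phi_deg)
    return np.array([np.sin(phi), np.cos(phi)])

def g_inv(v):
    """Recover the yaw angle in degrees from a (sin, cos) pair; using
    atan2 here is our assumption, since it inverts the projection."""
    return float(np.rad2deg(np.arctan2(v[0], v[1])))

# The wrap-around case the transform is designed for:
print(np.linalg.norm(g(-180.0) - g(179.0)))  # small: close on the circle
print(g_inv(g(-170.0)))                      # approximately -170.0
```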
In this embodiment, the recurrent neural network viewpoint prediction module takes the viewpoint position at the current moment, $g(\phi_t)$, as input and predicts the yaw angles at multiple future moments, $\hat{\phi}_{t+1}, \ldots, \hat{\phi}_{t+t_w}$, where $t_w$ is the prediction time window, $\hat{\phi}_{t+1}$ is the value of the yaw angle at moment t+1, and $\hat{\phi}_{t+t_w}$ is the value of the yaw angle at moment $t+t_w$. If the user's head moves slowly, a larger prediction time window $t_w$ can be chosen; otherwise the prediction time window needs to be set to a smaller value. During training, for each time step i (with i ranging from t to $t+t_w-1$), $g(\phi_i)$ is encoded into a 128-dimensional vector $x_i$. Then $x_i$ is fed into the recurrent neural network, and the hidden unit $h_i$ and output unit $y_i$ are computed. For each time step from t to $t+t_w-1$, the following update equations are applied:
$$x_i = \sigma_1(W_{xv}\, g(\phi_i) + b_x) \qquad (1)$$
$$h_i = \sigma_2(W_{hx} x_i + W_{hh} h_{i-1} + b_h) \qquad (2)$$
$$y_i = W_{oh} h_i + b_o \qquad (3)$$
where $W_{xv}$ is the weight matrix that encodes the yaw angle $g(\phi_i)$ into the 128-dimensional vector $x_i$, $W_{hx}$ is the weight matrix connecting the input unit $x_i$ to the hidden unit $h_i$, $W_{hh}$ is the weight matrix connecting the hidden unit $h_{i-1}$ at moment i-1 to the hidden unit $h_i$ at moment i, $W_{oh}$ is the weight matrix connecting the hidden unit $h_i$ to the output unit $y_i$, $b_x$ is the bias vector of the encoding step, $b_h$ is the bias vector for computing the hidden unit $h_i$, and $b_o$ is the bias vector for computing the output unit $y_i$. $\sigma_1$ and $\sigma_2$ are activation functions: $\sigma_1$ is the linear rectification (ReLU) function and $\sigma_2$ is the hyperbolic tangent function. During testing, the viewpoint position at the current moment, $g(\phi_t)$, is used as the input of the first iteration; for every later time step, the prediction result of the previous iteration is used as the input of the next iteration, i.e. $\hat{\phi}_{i+1} = g^{-1}(y_i)$, where $\hat{\phi}_{i+1}$ is the predicted value of the user's viewpoint position at moment i+1, $g(\phi_i)$ is the transform of the yaw angle at moment i, $h_{i-1}$ is the hidden unit at moment i-1, and $g^{-1}(y_i)$ is the inverse of the transform g applied to the output result $y_i$.
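A minimal NumPy sketch of update equations (1)-(3) and the autoregressive test-time rollout; the random initialization is an illustrative assumption (a trained model would load learned weights), while the 128-dimensional encoding and the 256 hidden units follow this embodiment:

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_ENC, D_HID = 2, 128, 256  # (sin, cos) in/out; sizes per the embodiment

W_xv = rng.normal(scale=0.01, size=(D_ENC, D_IN));  b_x = np.zeros(D_ENC)
W_hx = rng.normal(scale=0.01, size=(D_HID, D_ENC)); b_h = np.zeros(D_HID)
W_hh = rng.normal(scale=0.01, size=(D_HID, D_HID))
W_oh = rng.normal(scale=0.01, size=(D_IN, D_HID));  b_o = np.zeros(D_IN)

relu = lambda z: np.maximum(z, 0.0)  # sigma_1
tanh = np.tanh                       # sigma_2

def step(v, h_prev):
    """One step of equations (1)-(3): encode the (sin, cos) viewpoint,
    update the hidden state, and emit the next (sin, cos) prediction."""
    x = relu(W_xv @ v + b_x)                  # equation (1)
    h = tanh(W_hx @ x + W_hh @ h_prev + b_h)  # equation (2)
    y = W_oh @ h + b_o                        # equation (3)
    return y, h

def rollout(phi_t_deg, t_w):
    """Autoregressive test-time prediction: the output of each iteration
    is fed back as the input of the next, for t_w future moments."""
    phi = np.deg2rad(phi_t_deg)
    v = np.array([np.sin(phi), np.cos(phi)])  # g(phi_t)
    h = np.zeros(D_HID)
    preds = []
    for _ in range(t_w):
        v, h = step(v, h)
        preds.append(np.rad2deg(np.arctan2(v[0], v[1])))  # g^{-1}(y_i)
    return preds

print(rollout(30.0, t_w=5))  # predicted yaw angles for t+1 .. t+t_w
```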
In this embodiment, the correlation filter viewpoint tracking module follows a correlation filter algorithm for target tracking: a correlation filter is designed that has the maximum response on the region around the viewpoint. It takes the 360-degree video frames at the future moments, $F_{t+1}, \ldots, F_{t+t_w}$, as input and predicts the viewpoint position from the video content, where $F_{t+1}$ is the 360-degree video frame at moment t+1 and $F_{t+t_w}$ is the 360-degree video frame at moment $t+t_w$. Correlation filter algorithms for target tracking are mainly used to track concrete objects in a video, whereas the viewpoint tracked in this embodiment is more abstract than a concrete object. Therefore, the spherical image of the 360-degree video frame is first projected into a flat image using equirectangular projection, and the region corresponding to the viewpoint is relocated on the flat image. In the resulting flat image, the content near the poles is stretched horizontally, so the region corresponding to the viewpoint is no longer a rectangle; a bounding box is therefore set around the viewpoint to redefine the size and shape of the viewpoint region. In this way, the bounding box of the viewpoint, and from it the viewpoint position, can be predicted from the video content.
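In the spirit of this module, a MOSSE-style correlation filter can be sketched as follows; the patent does not name a specific correlation filter algorithm, so MOSSE is our assumed choice, and the Gaussian target shape, regularization constant, and patch size are assumptions:

```python
import numpy as np

def gaussian_target(h, w, cy, cx, sigma=2.0):
    """Desired filter response: a Gaussian peak at the viewpoint center."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def train_filter(patch, target, eps=1e-3):
    """Closed-form MOSSE filter in the Fourier domain: the filter whose
    correlation with the training patch best reproduces the target."""
    P, T = np.fft.fft2(patch), np.fft.fft2(target)
    return (T * np.conj(P)) / (P * np.conj(P) + eps)

def track(patch, H):
    """Correlate a new projected-frame patch with the filter; the peak of
    the response map is the predicted viewpoint location."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * H))
    return np.unravel_index(int(np.argmax(response)), response.shape)

# Usage sketch: train on the region around the known viewpoint in the
# current frame, then locate the peak response in the next frame.
h, w = 64, 64
patch_t = np.random.rand(h, w)  # stand-in for a projected frame region
H = train_filter(patch_t, gaussian_target(h, w, h // 2, w // 2))
print(track(np.roll(patch_t, (3, 5), axis=(0, 1)), H))  # peak shifts with content
```

Since the filter is computed once and not updated afterwards, its response drifts over long horizons, which motivates the decreasing weight described next.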
In this embodiment, the prediction results of the recurrent neural network viewpoint prediction module and the correlation filter viewpoint tracking module are combined with different weights to obtain the final prediction result, i.e.

$$\hat{V}_{t+1:t+t_w} = w_1 \odot \hat{V}^{RNN}_{t+1:t+t_w} + w_2 \odot \hat{V}^{CF}_{t+1:t+t_w}$$

where $\hat{V}_{t+1:t+t_w}$ is the final prediction result, $\hat{V}^{RNN}_{t+1:t+t_w}$ and $\hat{V}^{CF}_{t+1:t+t_w}$ are the prediction results of the recurrent neural network viewpoint prediction module and the correlation filter viewpoint tracking module respectively, and ⊙ denotes element-wise multiplication. The weights $w_1$ and $w_2$ satisfy $w_1 + w_2 = 1$ and are set to the values that minimize the error of the final predicted viewpoint position. The filter in the correlation filter viewpoint tracking module is not updated, so the gap between the viewpoint estimate and the true value grows gradually as errors accumulate; for large prediction windows, the weight of the correlation filter viewpoint tracking prediction is therefore gradually reduced. Combining the viewpoint sequence prediction module based on the recurrent neural network with the viewpoint tracking module based on the correlation filter lets the two complement each other and improves the prediction accuracy.
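A short sketch of this weighted fusion, including a simple grid search for the weight that minimizes the prediction error; the grid resolution and the mean-squared-error criterion are our assumptions, as the embodiment only requires that $w_1 + w_2 = 1$ and that the final viewpoint-position error be minimal:

```python
import numpy as np

def fuse(v_rnn, v_cf, w1):
    """Element-wise weighted combination with w2 = 1 - w1."""
    return w1 * v_rnn + (1.0 - w1) * v_cf

def pick_weight(v_rnn, v_cf, v_true, grid=np.linspace(0.0, 1.0, 101)):
    """Choose w1 minimizing the mean squared viewpoint-position error."""
    errors = [np.mean((fuse(v_rnn, v_cf, w) - v_true) ** 2) for w in grid]
    return float(grid[int(np.argmin(errors))])

# Usage sketch with made-up yaw sequences over a window t_w = 5.
v_true = np.array([10.0, 12.0, 15.0, 19.0, 24.0])  # ground truth
v_rnn  = np.array([11.0, 13.5, 15.5, 18.0, 22.0])  # RNN module output
v_cf   = np.array([10.5, 12.5, 16.5, 21.0, 28.0])  # tracker output (drifts late)
w1 = pick_weight(v_rnn, v_cf, v_true)
print(w1, fuse(v_rnn, v_cf, w1))
```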
The key parameters in this embodiment are set as follows. The experimental data come from the article "Shooting a moving target: Motion-prediction-based transmission for 360-degree videos" published by Y. Bao et al. at the IEEE International Conference on Big Data; the data record the head movements collected while 153 volunteers watched 16 360-degree videos, with some volunteers watching only some of the videos, for a total of 985 viewing samples. In the data preprocessing of this embodiment, each viewing sample is sampled 10 times per second and records 289 motion data points, giving 285665 motion data points in total. 80% of the motion data are used as the training set and 20% as the test set. For the recurrent neural network module, the hidden unit size is set to 256, the Adam (adaptive moment estimation) optimization method is used, and the momentum and weight decay are set to 0.8 and 0.999, respectively. The batch size is 128, and 500 epochs are trained in total; the learning rate decays linearly from 0.001 to 0.0001 during the first 250 epochs of training. For the correlation filter viewpoint tracking module, the image is resized to 1800 × 900 and the bounding box size is set to 10 × 10. For the fusion module, different values are assigned to $w_1$ and $w_2$, and the values that minimize the final viewpoint position prediction error are selected as the final weights.
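This training configuration can be sketched in PyTorch as follows; reading the "momentum and weight decay of 0.8 and 0.999" as Adam's beta coefficients is our assumption, and the model here is a placeholder, while batch size 128, 500 epochs, and the linear decay of the learning rate from 0.001 to 0.0001 over the first 250 epochs follow the stated parameters:

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=128, hidden_size=256)  # placeholder for the module

# Adam with the embodiment's settings; betas=(0.8, 0.999) is our reading
# of "momentum and weight decay set to 0.8 and 0.999".
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.8, 0.999))

# Learning rate decays linearly from 0.001 to 0.0001 during the first
# 250 of the 500 training epochs, then stays at 0.0001.
def lr_factor(epoch):
    return 1.0 - 0.9 * min(epoch, 250) / 250.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

# Training loop skeleton: batches of 128 viewing samples per step.
# for epoch in range(500):
#     for batch in loader:
#         optimizer.zero_grad(); loss = ...; loss.backward(); optimizer.step()
#     scheduler.step()
```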
To meet the need to improve bandwidth utilization in 360-degree video transmission, the present invention proposes a viewpoint sequence prediction system based on the user's viewpoint positions at past moments and on the 360-degree video content. The viewpoint sequence prediction structure proposed by the present invention can predict the viewpoint positions of the user at multiple future moments, and the length of the predicted viewpoint sequence can be changed according to the speed of the user's head movement. It has good practicality and scalability and lays a solid foundation for the efficient transmission of 360-degree video.
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the above particular embodiments, and those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the present invention. In the absence of conflict, the embodiments of the present application and the features in the embodiments can be combined with one another arbitrarily.

Claims (8)

1. A user viewing viewpoint sequence prediction method for 360-degree video transmission, characterized by comprising:
taking the user's viewpoint position at the most recent moment as the input of a viewpoint sequence prediction model, and predicting the viewpoint positions at multiple future moments with the viewpoint sequence prediction model, the viewpoint positions at the multiple future moments constituting a first viewpoint sequence;
taking the video content as the input of a viewpoint tracking model, and predicting the viewpoint positions at multiple future moments with the viewpoint tracking model, the viewpoint positions at the multiple future moments constituting a second viewpoint sequence;
combining the first viewpoint sequence and the second viewpoint sequence to determine the user's future viewing viewpoint sequence.
2. The user viewing viewpoint sequence prediction method for 360-degree video transmission according to claim 1, characterized in that before taking the user's viewpoint position at the most recent moment as the input of the viewpoint sequence prediction model and predicting the viewpoint positions at multiple future moments with the viewpoint sequence prediction model, the method further comprises:
constructing the viewpoint sequence prediction model based on a recurrent neural network, wherein the viewpoint sequence prediction model encodes the input viewpoint position, feeds it into the recurrent neural network, computes the values of the hidden units and output units, learns the long-term dependencies between the user's viewing viewpoints at different moments, and outputs the viewpoint positions at multiple future moments; the viewpoint position comprises the unit-circle projections of the pitch angle, yaw angle, and roll angle, and the range of variation of the viewpoint position is -1 to 1; the hyperbolic tangent function is used as the activation function of the output units, and this activation function limits the output range of the viewpoint position.
3. The user viewing viewpoint sequence prediction method for 360-degree video transmission according to claim 2, characterized in that taking the user's viewpoint position at the most recent moment as the input of the viewpoint sequence prediction model and predicting the viewpoint positions at multiple future moments with the viewpoint sequence prediction model comprises:
taking the user's viewpoint position at the current moment as the input of the first iteration of the viewpoint sequence prediction model to obtain the predicted viewpoint position of the first iteration;
cyclically taking the predicted viewpoint position of the previous iteration as the input of the next iteration of the viewpoint sequence prediction model to obtain the predicted viewpoint positions at multiple future moments.
4. The user viewing viewpoint sequence prediction method for 360-degree video transmission according to claim 1, characterized in that the length of the first viewpoint sequence is related to the speed of the user's head movement while watching: the slower the user's head movement, the longer the corresponding first viewpoint sequence; the faster the user's head movement, the shorter the corresponding first viewpoint sequence.
5. The user viewing viewpoint sequence prediction method for 360-degree video transmission according to claim 1, characterized in that before taking the video content as the input of the viewpoint tracking model and predicting the viewpoint positions at multiple future moments with the viewpoint tracking model, the method further comprises:
constructing the viewpoint tracking model according to a correlation filter algorithm for target tracking, wherein the correlation filter algorithm refers to: setting a correlation filter that forms the maximum response on the video region at the viewpoint position.
6. The user viewing viewpoint sequence prediction method for 360-degree video transmission according to claim 5, characterized in that taking the video content as the input of the viewpoint tracking model and predicting the viewpoint positions at multiple future moments with the viewpoint tracking model comprises:
projecting the spherical image of the 360-degree video frame at the future moment into a flat image using equirectangular (equidistant cylindrical) projection;
determining a bounding box in the flat image with the viewpoint tracking model, the region inside the bounding box being the viewpoint region, and determining the corresponding viewpoint position from the viewpoint region.
7. The user viewing viewpoint sequence prediction method for 360-degree video transmission according to any one of claims 1 to 6, characterized in that combining the first viewpoint sequence and the second viewpoint sequence to determine the user's future viewing viewpoint sequence comprises:
setting different weights $w_1$ and $w_2$ for the viewpoint positions in the first viewpoint sequence and the viewpoint positions in the second viewpoint sequence, respectively, the weights $w_1$ and $w_2$ satisfying $w_1 + w_2 = 1$, wherein $w_1$ and $w_2$ are set to minimize the error between the predicted future viewing viewpoint positions of the user and the viewpoint positions actually watched by the user;
computing the user's future viewing viewpoint sequence from the weights $w_1$ and $w_2$ and the viewpoint positions in the first viewpoint sequence and the second viewpoint sequence, with the calculation formula:

$$\hat{V}_{t+1:t+t_w} = w_1 \odot \hat{V}^{(1)}_{t+1:t+t_w} + w_2 \odot \hat{V}^{(2)}_{t+1:t+t_w}$$

where $\hat{V}_{t+1:t+t_w}$ is the user's future viewing viewpoint position from moment t+1 to moment $t+t_w$, $w_1$ is the weight of the first viewpoint sequence, $\hat{V}^{(1)}_{t+1:t+t_w}$ is the viewpoint position in the first viewpoint sequence from moment t+1 to moment $t+t_w$, $w_2$ is the weight of the second viewpoint sequence, $\hat{V}^{(2)}_{t+1:t+t_w}$ is the viewpoint position in the second viewpoint sequence from moment t+1 to moment $t+t_w$, ⊙ denotes element-wise multiplication, t is the current moment, and $t_w$ is the prediction time window.
8. The user viewing viewpoint sequence prediction method for 360-degree video transmission according to claim 7, characterized in that as the prediction time increases, the weight $w_2$ of the second viewpoint sequence predicted by the viewpoint tracking model is gradually reduced.
CN201810886661.7A 2018-08-06 2018-08-06 User watching viewpoint sequence prediction method for 360-degree video transmission Active CN109257584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810886661.7A CN109257584B (en) 2018-08-06 2018-08-06 User watching viewpoint sequence prediction method for 360-degree video transmission

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810886661.7A CN109257584B (en) 2018-08-06 2018-08-06 User watching viewpoint sequence prediction method for 360-degree video transmission

Publications (2)

Publication Number Publication Date
CN109257584A true CN109257584A (en) 2019-01-22
CN109257584B CN109257584B (en) 2020-03-10

Family

ID=65048730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810886661.7A Active CN109257584B (en) 2018-08-06 2018-08-06 User watching viewpoint sequence prediction method for 360-degree video transmission

Country Status (1)

Country Link
CN (1) CN109257584B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109862019A (en) * 2019-02-20 2019-06-07 联想(北京)有限公司 Data processing method, device and system
CN110166850A (*) 2019-05-30 2019-08-23 上海交通大学 Method and system for predicting panoramic video viewing positions with multiple CNN networks
CN110248212A (*) 2019-05-27 2019-09-17 上海交通大学 Multi-user 360-degree video stream server-side adaptive bitrate transmission method and system
CN110248178A (*) 2019-06-18 2019-09-17 深圳大学 Viewport prediction method and system for panoramic video using object tracking and historical trajectories
CN114040184A (en) * 2021-11-26 2022-02-11 京东方科技集团股份有限公司 Image display method, system, storage medium and computer program product

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104768018A (en) * 2015-02-04 2015-07-08 浙江工商大学 Fast viewpoint predicting method based on depth map
CN106612426A (en) * 2015-10-26 2017-05-03 华为技术有限公司 Method and device for transmitting multi-view video
US20170302972A1 (en) * 2016-04-15 2017-10-19 Advanced Micro Devices, Inc. Low latency wireless virtual reality systems and methods
CN107274472A (*) 2017-06-16 2017-10-20 福州瑞芯微电子股份有限公司 Method and apparatus for increasing VR playback frame rate
CN107422844A (*) 2017-03-27 2017-12-01 联想(北京)有限公司 Information processing method and electronic device
CN107533230A (*) 2015-03-06 2018-01-02 索尼互动娱乐股份有限公司 Head-mounted display tracking system
US20180020201A1 (en) * 2016-07-18 2018-01-18 Apple Inc. Light Field Capture
CN107770561A (*) 2017-10-30 2018-03-06 河海大学 Multiresolution virtual reality device screen content encryption algorithm using eye-tracking data
CN108134941A (*) 2016-12-01 2018-06-08 联发科技股份有限公司 Adaptive video encoding/decoding method and device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104768018A (en) * 2015-02-04 2015-07-08 浙江工商大学 Fast viewpoint predicting method based on depth map
CN107533230A (*) 2015-03-06 2018-01-02 索尼互动娱乐股份有限公司 Head-mounted display tracking system
CN106612426A (en) * 2015-10-26 2017-05-03 华为技术有限公司 Method and device for transmitting multi-view video
US20170302972A1 (en) * 2016-04-15 2017-10-19 Advanced Micro Devices, Inc. Low latency wireless virtual reality systems and methods
US20180020201A1 (en) * 2016-07-18 2018-01-18 Apple Inc. Light Field Capture
CN108134941A (*) 2016-12-01 2018-06-08 联发科技股份有限公司 Adaptive video encoding/decoding method and device
CN107422844A (*) 2017-03-27 2017-12-01 联想(北京)有限公司 Information processing method and electronic device
CN107274472A (*) 2017-06-16 2017-10-20 福州瑞芯微电子股份有限公司 Method and apparatus for increasing VR playback frame rate
CN107770561A (*) 2017-10-30 2018-03-06 河海大学 Multiresolution virtual reality device screen content encryption algorithm using eye-tracking data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG XIAOCHUAN,LIANG XIAOHUI: "Viewpoint-predicting-based Remote Rendering on Mobile Devices using Multiple", 《2015 INTERNATIONAL CONFERENCE ON VIRTUAL REALITY AND VISUALIZATION》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109862019A (en) * 2019-02-20 2019-06-07 联想(北京)有限公司 Data processing method, device and system
CN110248212A (*) 2019-05-27 2019-09-17 上海交通大学 Multi-user 360-degree video stream server-side adaptive bitrate transmission method and system
CN110248212B (*) 2019-05-27 2020-06-02 上海交通大学 Multi-user 360-degree video stream server-side adaptive bitrate transmission method and system
CN110166850A (*) 2019-05-30 2019-08-23 上海交通大学 Method and system for predicting panoramic video viewing positions with multiple CNN networks
CN110248178A (*) 2019-06-18 2019-09-17 深圳大学 Viewport prediction method and system for panoramic video using object tracking and historical trajectories
CN110248178B (*) 2019-06-18 2021-11-23 深圳大学 Viewport prediction method and system for panoramic video using object tracking and historical trajectories
CN114040184A (en) * 2021-11-26 2022-02-11 京东方科技集团股份有限公司 Image display method, system, storage medium and computer program product

Also Published As

Publication number Publication date
CN109257584B (en) 2020-03-10

Similar Documents

Publication Publication Date Title
CN109257584A (en) The user of 360 degree of transmission of video watches view sequence prediction technique
Li et al. Very long term field of view prediction for 360-degree video streaming
CN103795976B Full spatio-temporal 3D visualization method
CN110580500A (en) Character interaction-oriented network weight generation few-sample image classification method
Liu et al. Learning to factorize and relight a city
Zhou et al. Optimization of wireless video surveillance system for smart campus based on internet of things
Yao et al. Neilf: Neural incident light field for physically-based material estimation
Yang et al. Single and sequential viewports prediction for 360-degree video streaming
CN111462324A (en) Online spatiotemporal semantic fusion method and system
CN115841534A (en) Method and device for controlling motion of virtual object
Desai et al. Next frame prediction using ConvLSTM
Xiong et al. Contextual sa-attention convolutional LSTM for precipitation nowcasting: A spatiotemporal sequence forecasting view
Huang et al. Toward holographic video communications: A promising AI-driven solution
CN111950404A (en) Single-image three-dimensional reconstruction method based on deep learning video surveillance
Jin et al. Sun-sky model estimation from outdoor images
CN115103161A (en) Patrol method, system, equipment and medium based on video propulsion
Huang et al. One-shot imitation drone filming of human motion videos
Wang et al. JAWS: just a wild shot for cinematic transfer in neural radiance fields
CN110070023A (en) A kind of self-supervisory learning method and device based on sequence of motion recurrence
Shi et al. Garf: Geometry-aware generalized neural radiance field
Sun et al. Research on cloud computing modeling based on fusion difference method and self-adaptive threshold segmentation
Fang et al. Stunner: Radar echo extrapolation model based on spatio-temporal fusion neural network
Chen et al. Lightweight neural network-based viewport prediction for live VR streaming in wireless video sensor network
Henderson et al. Unsupervised video prediction from a single frame by estimating 3d dynamic scene structure
Li et al. Sat2vid: Street-view panoramic video synthesis from a single satellite image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant