CN110166850B - Method and system for predicting panoramic video watching position by multiple CNN networks

Info

Publication number
CN110166850B
CN110166850B (application CN201910465138.1A)
Authority
CN
China
Prior art keywords: video frame, panoramic video, saliency map, network, CNN
Prior art date
Legal status
Active
Application number
CN201910465138.1A
Other languages
Chinese (zh)
Other versions
CN110166850A
Inventor
宋利 (Song Li)
李逍 (Li Xiao)
解蓉 (Xie Rong)
张文军 (Zhang Wenjun)
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201910465138.1A
Publication of CN110166850A
Application granted
Publication of CN110166850B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components involving special video data, e.g. 3D video

Abstract

The invention provides a method and a system for predicting the viewing position in a panoramic video using multiple CNN networks. The method comprises the following steps: predicting the viewing point at the next moment from the viewing trajectory of the preceding period using a neural network; mapping the panoramic video frame at the moment to be predicted into small video frames in multiple directions, passing each small video frame through a first convolutional neural network (CNN) to obtain a corresponding saliency map, merging the saliency maps into a saliency map of the whole video frame, and refining that map through a second CNN to obtain the panoramic video frame saliency map; and inputting the predicted viewing point and the panoramic video frame saliency map into a fully connected network to obtain the final predicted point, namely the panoramic video viewing position. The invention jointly considers the temporal continuity of video viewing and the mapping distortion of panoramic video, and combines the two to obtain the final optimal predicted point, thereby achieving higher prediction accuracy.

Description

Method and system for predicting panoramic video watching position by multiple CNN networks
Technical Field
The invention relates to prediction of the viewing position in a panoramic video, and in particular to a method and a system for predicting the panoramic video viewing position based on multiple convolutional neural networks.
Background
In recent years video traffic has continued to account for a large share of total network traffic, and panoramic video has developed rapidly thanks to its unique immersive experience. However, because the data volume of panoramic video is large, its requirements on the network environment are very high, and current network infrastructure cannot transmit such a huge amount of information without some preprocessing. The problem to be solved is therefore to reduce the amount of transmitted data while preserving video quality as far as possible. By predicting the viewing position in the panoramic video, the content the viewer wants to watch can be transmitted in advance under existing spatially tiled panoramic-video transmission protocols such as MPEG-DASH; higher prediction accuracy thus improves the viewing experience and makes fuller, more reasonable use of limited network resources.
Predicting the view angle while watching a panoramic video involves several difficult problems: different viewers may be interested in very different regions of the same video; the same viewer watches different video content with great randomness; and since a panoramic video is several times larger than a normal video, even when the same viewer watches the same video, the viewing point at a given moment carries large uncertainty.
Many view-angle prediction methods for panoramic video have been proposed in recent years, but most are not comprehensive enough. For example, some methods predict the view angle at the next moment from the previous viewing trajectory, using linear regression, neural networks and the like, and can reach a certain accuracy. Adding the salient region of the video frame at the corresponding moment to the prediction can further narrow the predictable range and should correspondingly improve accuracy. Moreover, scene switches and similar events inevitably occur during playback, and prediction based on the previous trajectory alone then carries a large error, so correcting the predicted region according to the salient region of the current video frame is very important for improving prediction accuracy.
Disclosure of Invention
Aiming at the problem that prior-art prediction of the panoramic video view-angle region is not accurate enough, the invention provides a method, a system and a terminal for predicting the panoramic video viewing position based on multiple convolutional neural networks (CNN).
To achieve this purpose, the invention adopts the following technical scheme:
according to a first aspect of the present invention, there is provided a method for predicting a panoramic video viewing position by multiple CNN networks, the method comprising:
based on the viewing trajectory over the preceding period, predicting the viewing point at the next moment with a neural network method;
mapping the panoramic video frame at the moment to be predicted into small video frames in multiple directions, passing each small video frame through a first convolutional neural network (CNN) to obtain a corresponding saliency map, merging the saliency maps into a saliency map of the whole video frame, and refining the saliency map of the whole video frame through a second CNN to obtain the panoramic video frame saliency map;
and inputting the predicted viewing point and the panoramic video frame saliency map into a fully connected network to obtain the final predicted point, namely the panoramic video viewing position.
Optionally, the neural network method adopts an LSTM model: the viewing trajectory of the previous second is read and input into the LSTM model to predict the viewing point at the next moment.
Optionally, the panoramic video frame at the moment to be predicted is mapped into small video frames in multiple directions as follows: cube mapping is applied to the panoramic video frame to be predicted, yielding small video frames in six directions, namely up, down, front, back, left and right.
Optionally, each small video frame obtains a corresponding saliency map through the first convolutional neural network (CNN) as follows: each small video frame is passed through the trained VGG16 network to obtain its corresponding saliency map.
Optionally, the fully connected network is a two-layer fully connected network.
The invention designs a method for predicting the panoramic video viewing position based on multiple CNN networks. First, an LSTM network produces the predicted point for the corresponding moment; when analysing the saliency map of the panoramic video frame, people's viewing habits and the distortion introduced by cube mapping are taken into account. Because the merged saliency map still carries this distortion, it is passed through the second CNN network to obtain the final saliency map.
According to a second aspect of the present invention, there is provided a system for predicting a panoramic video viewing position by multiple CNN networks, comprising:
a neural network module, which predicts the viewing point at the next moment with a neural network method, according to the viewing trajectory over the preceding period;
a mapping module, which maps the panoramic video frame into small video frames in multiple directions;
a saliency map construction module, which passes each small video frame through a first convolutional neural network (CNN) to obtain a corresponding saliency map, merges the saliency maps into a saliency map of the whole video frame, and refines the saliency map of the whole video frame through a second convolutional neural network (CNN) to obtain the panoramic video frame saliency map;
and a prediction module, which inputs the viewing point predicted by the neural network module and the panoramic video frame saliency map obtained by the saliency map construction module into a fully connected network to obtain the final predicted point, namely the panoramic video viewing position.
Optionally, in the mapping module: cube mapping is applied to the panoramic video frame to be predicted, yielding small video frames in six directions, namely up, down, front, back, left and right.
Optionally, in the prediction module: the fully connected network is a two-layer fully connected network.
According to a third aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor is operable to execute the method for predicting a panoramic video viewing position by multiple CNN networks.
According to a fourth aspect of the present invention, there is provided a computer readable medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the method for predicting a panoramic video viewing position by multiple CNN networks.
Compared with the prior art, the invention has the following beneficial effects:
according to the method, the system and the terminal, the distortion problem during panoramic video mapping is considered by watching the track, the saliency map of the panoramic video frame and the mapping distortion problem of the panoramic video, how to process the distortion problem is also considered, the prediction point obtained according to the track and the saliency map of the video frame are combined to obtain the final prediction point, and the prediction accuracy is effectively improved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of a method for panoramic video view prediction according to an embodiment of the present invention;
FIG. 2 shows an original video frame, the mapped small images, the saliency maps of the small images, and the merged saliency map according to an embodiment of the present invention;
FIG. 3 compares the merged saliency map with the saliency map obtained from it by second-CNN-network learning according to an embodiment of the present invention;
FIGS. 4a and 4b compare the prediction accuracy of the LSTM with and without the saliency map for a 1-second prediction interval in an embodiment of the present invention;
fig. 5 is a block diagram of a system for panoramic video view prediction according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but do not limit it in any way. It should be noted that persons skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention jointly considers the viewing trajectory and the video frame saliency map when watching panoramic video, and handles the distortion that arises when the mapped saliency maps are merged; by applying the multiple CNN network it achieves higher prediction accuracy than conventional methods.
Specifically, referring to fig. 1, a method for predicting a panoramic video viewing position based on a multiple CNN network in an embodiment of the present invention includes the following steps:
and S1, inputting the watching track of the previous period into an LSTM (Long Short-Time Memory) network, wherein the LSTM network has Memory capacity and better learning capacity for Time sequences, so that the predicted point of the next moment is obtained through the LSTM network.
S2, mapping the panoramic video frame into small video frames in multiple directions, obtaining a corresponding saliency map from each small video frame through a first Convolutional Neural Network (CNN), and combining the saliency maps into a saliency map of the whole video frame;
When watching a panoramic video, viewers pay less attention to the upper and lower regions of the picture and more to the middle region, and each region has its own saliency map. The panoramic video frame is therefore mapped into maps in 6 directions, namely up, down, front, back, left and right; the 6 maps are passed through the first CNN network to obtain 6 corresponding saliency maps, which are then inverse-mapped into a saliency map of the whole video frame. The saliency map is a grey-scale map.
S3, the obtained saliency map of the whole video frame is passed through a second CNN network to obtain the refined panoramic video frame saliency map;
the second CNN network corrects the distortion and the overlap at the seams that the whole-frame saliency map suffers during inverse mapping.
S4, the predicted point for the next moment obtained in S1 and the panoramic video frame saliency map obtained in S3 are input into a two-layer fully connected network to obtain the final predicted point.
Referring to fig. 5, in correspondence to the method described above, an embodiment of the present invention further provides a system for predicting a panoramic video viewing position by using multiple CNN networks, where the system includes:
a neural network module, which predicts the viewing point at the next moment with a neural network method, according to the viewing trajectory over the preceding period;
a mapping module, which maps the panoramic video frame at the moment to be predicted into small video frames in multiple directions;
a saliency map construction module, which passes each small video frame through a first convolutional neural network (CNN) to obtain a corresponding saliency map, merges the saliency maps into a saliency map of the whole video frame, and refines the saliency map of the whole video frame through a second convolutional neural network (CNN) to obtain the panoramic video frame saliency map;
and a prediction module, which inputs the viewing point predicted by the neural network module and the panoramic video frame saliency map obtained by the saliency map construction module into a fully connected network to obtain the final predicted point, namely the panoramic video viewing position.
Corresponding to the above method, an embodiment of the invention further provides a terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, can perform the method for predicting a panoramic video viewing position by multiple CNN networks.
Corresponding to the above method, an embodiment of the present invention further provides a computer readable medium having a computer program stored thereon, which when executed by a processor, implements the method for predicting a panoramic video viewing position by multiple CNN networks.
A specific embodiment illustrates how the above method of the invention is implemented; the operation flow is shown in fig. 1:
First, the viewing trajectory of the previous second is input into an LSTM network to predict the viewing point at the next moment.
Second, the video frame is mapped into small images in 6 directions by cube mapping; a first CNN network (VGG-16) learns saliency maps for the 6 small images, and the obtained maps are merged into a saliency map of the whole video frame.
Third, because the saliency map obtained above suffers distortion and overlap during merging, it is passed through a second CNN network to obtain the final effective video frame saliency map.
Fourth, the obtained viewing point for the next moment and the final effective video frame saliency map are input as features into a two-layer fully connected network, which outputs the final predicted point.
The method first adopts an LSTM network to predict the viewing point, then obtains the corresponding saliency map from the characteristics of the panoramic video frame, and finally combines the two to obtain the final predicted point. The following describes viewing-point prediction with the LSTM method and the corresponding data set, then the acquisition of the saliency map, and finally how the final predicted point is obtained.
1. The LSTM method predicts the viewing point
assume that the next 0.1s viewing position is predicted with the previous second's viewing trajectory. The LSTM model is trained by using various types of panoramic videos in advance, and then a predicted position point P using the model is obtainedLSTM-0.1s
2. Obtaining a saliency map using a first CNN network
First, video frames at the corresponding times are extracted from the data set, i.e. one video frame per second, and mapped into maps in 6 directions by cube mapping. The first CNN model replaces the last pooling layer of the VGG-16 network with 4 convolutional layers so as to better learn the key information of the whole image; this CNN model is also trained in advance on a data set. The 6 maps are then passed through the network to obtain the corresponding saliency maps, which are finally merged into a saliency map of the whole video frame. A specific example is shown in fig. 2.
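A sketch of how such a first CNN could be assembled in PyTorch is given below: the final max-pooling stage of torchvision's VGG-16 feature extractor is dropped and four convolutional layers are appended, ending in a single-channel (grey-scale) saliency head. The channel widths, the sigmoid output and the use of ImageNet weights (with a recent torchvision) are illustrative assumptions, not the patented configuration.

```python
# Sketch of the first CNN: VGG-16 features with the last max-pooling
# stage removed and four extra convolutions appended (channel widths
# are assumptions for illustration).
import torch.nn as nn
from torchvision.models import vgg16

class FaceSaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = vgg16(weights="IMAGENET1K_V1").features
        self.encoder = backbone[:-1]             # drop the final MaxPool2d
        self.refine = nn.Sequential(             # the four added conv layers
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 3, padding=1), nn.Sigmoid(),  # grey-scale map
        )

    def forward(self, face):
        return self.refine(self.encoder(face))  # one saliency map per face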
Cube mapping is adopted because, when viewing a panoramic video, viewers concentrate mostly on the middle area of the whole picture and pay relatively little attention to the areas above and below; cube mapping can take multiple directions into account.
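The geometry of the cube-mapping step can be sketched as follows: each pixel of a cube face defines a ray whose (longitude, latitude) selects a pixel of the equirectangular panorama. Only the front face is shown; the other five faces follow by permuting and negating the ray axes, and the merge step inverts the same relation to paste the six saliency maps back into a whole-frame map. The face size (224) and nearest-neighbour sampling are illustrative assumptions.

```python
# Sketch: sample the "front" cube face from an equirectangular panorama
# by ray-casting each face pixel back to (longitude, latitude).
import numpy as np

def front_face(equirect: np.ndarray, size: int = 224) -> np.ndarray:
    h, w = equirect.shape[:2]
    # pixel grid on the front face of a unit cube (the z = 1 plane)
    u, v = np.meshgrid(np.linspace(-1, 1, size), np.linspace(-1, 1, size))
    x, y, z = u, -v, np.ones_like(u)
    lon = np.arctan2(x, z)                               # in [-pi, pi]
    lat = np.arcsin(y / np.sqrt(x**2 + y**2 + z**2))     # in [-pi/2, pi/2]
    # map (lon, lat) to equirectangular pixel coordinates (row 0 = top)
    px = ((lon / np.pi + 1) / 2 * (w - 1)).astype(int)
    py = ((1 - (lat / (np.pi / 2) + 1) / 2) * (h - 1)).astype(int)
    return equirect[py, px]                              # nearest-neighbour
```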
3. Obtaining an improved saliency map using a second CNN network and combining it with the LSTM predicted point to obtain the final predicted point
The saliency map obtained in the previous step suffers from distortion, occlusion and similar problems during merging, so it is further passed through a second CNN network to obtain the refined saliency map; this network model is likewise trained in advance. After the refined saliency map is obtained, P_LSTM-0.1s is combined with the saliency map and input into a two-layer fully connected network, which then yields the final predicted point.
The two-layer fully connected network can be implemented directly with an existing fully connected network.
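A compact sketch of this final stage follows, under assumed layer sizes (a 32x64 merged saliency map, hidden width 256); the patent itself fixes only that the second network is a CNN and that the fusion network has two fully connected layers.

```python
# Sketch of the refinement-and-fusion stage: a small second CNN refines
# the merged saliency map, which is flattened, concatenated with the
# LSTM prediction, and fed to the two-layer fully connected network.
# All layer sizes here are illustrative assumptions.
import torch
import torch.nn as nn

refiner = nn.Sequential(                       # second CNN (illustrative)
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
)
fusion = nn.Sequential(                        # two-layer FC network
    nn.Linear(32 * 64 + 2, 256), nn.ReLU(inplace=True),
    nn.Linear(256, 2),                         # final predicted (lon, lat)
)

merged = torch.rand(1, 1, 32, 64)              # merged panoramic saliency map
p_lstm = torch.randn(1, 2)                     # P_LSTM-0.1s from stage 1
refined = refiner(merged)
final_point = fusion(torch.cat([refined.flatten(1), p_lstm], dim=1))
```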
The following table summarizes the prediction accuracy of the LSTM versus conventional methods; the values are the average angular error relative to the actual viewing position.
Table 1. Prediction accuracy of the present method and the conventional method
(The table is reproduced as an image in the original publication; its values are not recoverable here.)
Figs. 4a and 4b compare prediction accuracy for the 1-second prediction interval, including results predicted directly from the saliency map, LSTM-only results, results with one CNN network added, and results with two CNN networks added. The comparison shows that the present method achieves better prediction accuracy than the original method.
Fig. 3 shows the result of encoding frame 7 of DrivingInCity under the two methods in one embodiment of the present invention; compared with the "original method" above, it shows better visual quality and less blocking in some regions of interest, such as the shape of the car.
Fig. 4 shows the result of encoding frame 15 of AerialCity in the two ways according to an embodiment of the present invention; the wall edge is distorted under the "original method", while under the "current method" it is continuous and of higher visual quality.
In summary, the above embodiments of the present invention jointly consider the temporal continuity of video viewing and the mapping distortion of panoramic video, and combine the two to obtain the final optimal predicted point, thereby achieving higher prediction accuracy.
It should be noted that the steps in the method provided by the invention can be implemented with the corresponding modules, devices and units of the system; those skilled in the art can refer to the technical solution of the system to implement the step flow of the method, i.e. the embodiment of the system can be understood as a preferred example for implementing the method, which is not detailed here.
Those skilled in the art will appreciate that, besides implementing the system and its devices purely as computer-readable program code, the method steps can equally be realized by implementing the system and its devices in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. The system and its devices provided by the invention can therefore be regarded as a hardware component, and the devices included in them for realizing the various functions can be regarded as structures within that component; means for performing the functions can also be regarded as both software modules implementing the method and structures within the hardware component.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (10)

1. A method for predicting a panoramic video viewing position by multiple CNN networks, characterized in that the method comprises:
based on the viewing trajectory over the preceding period, predicting the viewing point at the next moment with a neural network method;
mapping the panoramic video frame into small video frames in multiple directions, passing each small video frame through a first convolutional neural network (CNN) to obtain a corresponding saliency map, merging the saliency maps into a saliency map of the whole video frame, and refining the saliency map of the whole video frame through a second convolutional neural network (CNN) to obtain the panoramic video frame saliency map; wherein, when a panoramic video is watched, viewers pay less attention to the upper and lower regions of the picture and more to the middle region, and each region has its own saliency map; the panoramic video frame is therefore mapped into maps in 6 directions, namely up, down, front, back, left and right, the 6 maps are passed through the first CNN network to obtain 6 corresponding saliency maps, and the 6 saliency maps are then inverse-mapped into a saliency map of the whole video frame, the saliency map being a grey-scale map;
and inputting the predicted viewing point and the panoramic video frame saliency map into a fully connected network to obtain the final predicted point, namely the panoramic video viewing position.
2. The method for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 1, characterized in that: the neural network method adopts an LSTM model; the viewing trajectory of the previous second is read and input into the LSTM model to predict the viewing point at the next moment.
3. The method for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 1, characterized in that the mapping of the panoramic video frame into small video frames in multiple directions comprises:
applying cube mapping to the panoramic video frame to be predicted to obtain small video frames in six directions, namely up, down, front, back, left and right.
4. The method for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 1, characterized in that the obtaining of a corresponding saliency map for each small video frame through the first convolutional neural network (CNN) comprises:
passing each small video frame through the trained VGG16 network to obtain the corresponding saliency map.
5. The method for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 1, characterized in that: the fully connected network is a two-layer fully connected network.
6. A system for predicting a panoramic video viewing position by multiple CNN networks, characterized in that the system comprises:
a neural network module, which predicts the viewing point at the next moment with a neural network method, according to the viewing trajectory over the preceding period;
a mapping module, which maps the panoramic video frame at the moment to be predicted into small video frames in multiple directions;
a saliency map construction module, which passes each small video frame through a first convolutional neural network (CNN) to obtain a corresponding saliency map, merges the saliency maps into a saliency map of the whole video frame, and refines the saliency map of the whole video frame through a second convolutional neural network (CNN) to obtain the panoramic video frame saliency map; wherein, when a panoramic video is watched, viewers pay less attention to the upper and lower regions of the picture and more to the middle region, and each region has its own saliency map; the panoramic video frame is therefore mapped into maps in 6 directions, namely up, down, front, back, left and right, the 6 maps are passed through the first CNN network to obtain 6 corresponding saliency maps, and the 6 saliency maps are then inverse-mapped into a saliency map of the whole video frame, the saliency map being a grey-scale map;
and a prediction module, which inputs the viewing point predicted by the neural network module and the panoramic video frame saliency map obtained by the saliency map construction module into a fully connected network to obtain the final predicted point, namely the panoramic video viewing position.
7. The system for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 6, characterized in that, in the mapping module:
cube mapping is applied to the panoramic video frame to be predicted to obtain small video frames in six directions, namely up, down, front, back, left and right.
8. The system for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 6, characterized in that, in the prediction module:
the fully connected network is a two-layer fully connected network.
9. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operable to perform the method of any of claims 1 to 5 when executing the program.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201910465138.1A 2019-05-30 2019-05-30 Method and system for predicting panoramic video watching position by multiple CNN networks Active CN110166850B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910465138.1A CN110166850B (en) 2019-05-30 2019-05-30 Method and system for predicting panoramic video watching position by multiple CNN networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910465138.1A CN110166850B (en) 2019-05-30 2019-05-30 Method and system for predicting panoramic video watching position by multiple CNN networks

Publications (2)

Publication Number Publication Date
CN110166850A (en) 2019-08-23
CN110166850B (en) 2020-11-06

Family

ID=67630671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910465138.1A Active CN110166850B (en) 2019-05-30 2019-05-30 Method and system for predicting panoramic video watching position by multiple CNN networks

Country Status (1)

Country Link
CN (1) CN110166850B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112468828B (en) * 2020-11-25 2022-06-17 深圳大学 Code rate distribution method and device for panoramic video, mobile terminal and storage medium
CN113329266B (en) * 2021-06-08 2022-07-05 合肥工业大学 Panoramic video self-adaptive transmission method based on limited user visual angle feedback
CN113949893A (en) * 2021-10-15 2022-01-18 中国联合网络通信集团有限公司 Live broadcast processing method and device, electronic equipment and readable storage medium
CN115022546B (en) * 2022-05-31 2023-11-14 咪咕视讯科技有限公司 Panoramic video transmission method, device, terminal equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020933A (en) * 2012-12-06 2013-04-03 天津师范大学 Multi-source image fusion method based on bionic visual mechanism
CN106157319A (en) * 2016-07-28 2016-11-23 哈尔滨工业大学 The significance detection method that region based on convolutional neural networks and Pixel-level merge
CN108462868A (en) * 2018-02-12 2018-08-28 叠境数字科技(上海)有限公司 The prediction technique of user's fixation point in 360 degree of panorama VR videos
CN108694471A (en) * 2018-06-11 2018-10-23 深圳市唯特视科技有限公司 A kind of user preference prediction technique based on personalized attention network
CN108765383A (en) * 2018-03-22 2018-11-06 山西大学 Video presentation method based on depth migration study
CN109784150A (en) * 2018-12-06 2019-05-21 东南大学 Video driving behavior recognition methods based on multitask space-time convolutional neural networks

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8502860B2 (en) * 2009-09-29 2013-08-06 Toyota Motor Engineering & Manufacturing North America (Tema) Electronic control system, electronic control unit and associated methodology of adapting 3D panoramic views of vehicle surroundings by predicting driver intent
CN105323552B (en) * 2015-10-26 2019-03-12 北京时代拓灵科技有限公司 A kind of panoramic video playback method and system
CN105915937B (en) * 2016-05-10 2019-12-13 上海乐相科技有限公司 Panoramic video playing method and device
US10547704B2 (en) * 2017-04-06 2020-01-28 Sony Interactive Entertainment Inc. Predictive bitrate selection for 360 video streaming
US10062414B1 (en) * 2017-08-22 2018-08-28 Futurewei Technologies, Inc. Determining a future field of view (FOV) for a particular user viewing a 360 degree video stream in a network
CN109257584B (en) * 2018-08-06 2020-03-10 上海交通大学 User watching viewpoint sequence prediction method for 360-degree video transmission

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020933A (en) * 2012-12-06 2013-04-03 天津师范大学 Multi-source image fusion method based on bionic visual mechanism
CN106157319A (en) * 2016-07-28 2016-11-23 哈尔滨工业大学 The significance detection method that region based on convolutional neural networks and Pixel-level merge
CN108462868A (en) * 2018-02-12 2018-08-28 叠境数字科技(上海)有限公司 The prediction technique of user's fixation point in 360 degree of panorama VR videos
CN108765383A (en) * 2018-03-22 2018-11-06 山西大学 Video presentation method based on depth migration study
CN108694471A (en) * 2018-06-11 2018-10-23 深圳市唯特视科技有限公司 A kind of user preference prediction technique based on personalized attention network
CN109784150A (en) * 2018-12-06 2019-05-21 东南大学 Video driving behavior recognition methods based on multitask space-time convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM; Lai Jiang et al.; https://arxiv.org/abs/1709.06316; 2019-01-14; pages 2 and 6-12, figure 9 *

Also Published As

Publication number Publication date
CN110166850A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110166850B (en) Method and system for predicting panoramic video watching position by multiple CNN networks
CN109379550B (en) Convolutional neural network-based video frame rate up-conversion method and system
US10560725B2 (en) Aggregated region-based reduced bandwidth video streaming
CN108370416A (en) Output video is generated from video flowing
CN110049336B (en) Video encoding method and video decoding method
CN105144728B (en) By the restoring force that the partitioning lost is faced in the dynamic self-adapting stream transmission of HTTP
Yuan et al. Spatial and temporal consistency-aware dynamic adaptive streaming for 360-degree videos
Zhang et al. Adaptive streaming in interactive multiview video systems
CN109688407B (en) Reference block selection method and device for coding unit, electronic equipment and storage medium
CN113365156B (en) Panoramic video multicast stream view angle prediction method based on limited view field feedback
WO2021227704A1 (en) Image recognition method, video playback method, related device, and medium
CN104902279A (en) Video processing method and device
CN111402399A (en) Face driving and live broadcasting method and device, electronic equipment and storage medium
CN113965751B (en) Screen content coding method, device, equipment and storage medium
US11223662B2 (en) Method, system, and non-transitory computer readable record medium for enhancing video quality of video call
US20140082208A1 (en) Method and apparatus for multi-user content rendering
Li et al. A super-resolution flexible video coding solution for improving live streaming quality
CN105578110A (en) Video call method, device and system
CN105407313A (en) Video calling method, equipment and system
CA3182110A1 (en) Reinforcement learning based rate control
CN114157868B (en) Video frame coding mode screening method and device and electronic equipment
US20220327663A1 (en) Video Super-Resolution using Deep Neural Networks
CN111988520B (en) Picture switching method and device, electronic equipment and storage medium
CN113996056A (en) Data sending and receiving method of cloud game and related equipment
CN114363710A (en) Live broadcast watching method and device based on time shifting acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant