CN110166850B - Method and system for predicting panoramic video watching position by multiple CNN networks - Google Patents
- Publication number: CN110166850B (application CN201910465138.1A)
- Authority: CN (China)
- Prior art keywords: video frame, panoramic video, saliency map, network, CNN
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G06—COMPUTING; G06N—Computing arrangements based on specific computational models; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- H—ELECTRICITY; H04—Electric communication technique; H04N—Pictorial communication, e.g. television; H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/44016—Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
- H04N21/816—Monomedia components involving special video data, e.g. 3D video
Abstract
The invention provides a method and a system for predicting the viewing position of a panoramic video using multiple CNN networks. The method comprises the following steps: based on the viewing trajectory over the preceding period, predicting the viewing point at the next moment with a neural network; mapping the panoramic video frame at the moment to be predicted into small video frames in multiple directions, obtaining a saliency map for each small video frame through a first convolutional neural network (CNN), merging these into a saliency map of the whole video frame, and refining that map through a second CNN to obtain the panoramic video frame saliency map; and inputting the predicted viewing point and the panoramic video frame saliency map into a fully connected network to obtain the final predicted point, i.e., the panoramic video viewing position. The invention jointly considers the temporal continuity of viewing behavior and the mapping distortion of panoramic video, and combines the two to obtain the final optimal predicted point, thereby achieving higher prediction accuracy.
Description
Technical Field
The invention relates to a method for predicting the viewing position of a panoramic video, and in particular to a method and a system for predicting the viewing position of a panoramic video based on multiple convolutional neural networks.
Background
In recent years, video traffic has continued to account for a large share of total network traffic, and panoramic video has developed rapidly thanks to its uniquely immersive experience. However, because the data volume of panoramic video is large, its demands on the network environment are high, and current network infrastructure cannot transmit this volume of information without some preprocessing. The problem to be solved is therefore to reduce the amount of transmitted data while preserving video quality as far as possible. By predicting the viewing position within the panoramic video, the content the viewer is about to watch can be transmitted in advance under existing spatially tiled transmission protocols such as MPEG-DASH; higher prediction accuracy thus improves the viewing experience and makes fuller, more reasonable use of limited network resources.
Predicting the viewing angle of a panoramic video poses several difficulties: different viewers may differ greatly in which regions of the same video interest them; the same viewer watches different video content with considerable randomness; and because the data volume of a panoramic video is several times that of an ordinary video, even the same viewer watching the same video exhibits large uncertainty in the viewing point at any given moment.
In recent years, many viewing-angle prediction methods for panoramic video have been proposed, but most do not consider the problem comprehensively. For example, some methods predict the viewing angle at the next moment from the preceding viewing trajectory, using techniques such as linear regression or neural networks, and achieve a certain accuracy. Incorporating the salient region of the video frame at the corresponding moment can further narrow the predictable range, and prediction accuracy should improve correspondingly. Moreover, scene switches during playback inevitably occur, in which case prediction from the preceding trajectory alone incurs large errors; correcting the predicted region with the salient region of the current video frame is therefore important for improving prediction accuracy.
Disclosure of Invention
Aiming at the insufficient prediction accuracy of panoramic video viewing-angle regions in the prior art, the invention provides a method, a system and a terminal for predicting the panoramic video viewing position based on multiple convolutional neural networks (CNNs).
To achieve the above purpose, the invention adopts the following technical scheme:
according to a first aspect of the present invention, there is provided a method for predicting a panoramic video viewing position by multiple CNN networks, the method comprising:
based on the viewing trajectory over the preceding period, predicting the viewing point at the next moment with a neural network;
mapping the panoramic video frame into small video frames in multiple directions, obtaining a corresponding saliency map for each small video frame through a first convolutional neural network (CNN), merging the saliency maps into a saliency map of the whole video frame, and refining that map through a second convolutional neural network (CNN) to obtain the panoramic video frame saliency map;
and inputting the predicted viewing point and the panoramic video frame saliency map into a fully connected network to obtain the final predicted point, i.e., the panoramic video viewing position point.
Optionally, the neural network method adopts an LSTM model: the viewing trajectory of the previous second is read and input into the LSTM model to predict the viewing point at the next moment.
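As a concrete illustration of this step, the sketch below predicts the next viewing point from the previous second's trajectory of (yaw, pitch) samples. The patent specifies a trained LSTM model; here a constant-velocity extrapolation over the last two samples stands in for it (an assumption for illustration only), with the yaw seam at 0°/360° handled explicitly.

```python
def predict_next_viewpoint(trajectory):
    """Extrapolate the next viewing point from the previous second's trajectory.

    `trajectory` is a list of (yaw, pitch) samples in degrees, oldest first,
    assumed uniformly spaced. A constant-velocity step over the last two
    samples stands in for the trained LSTM described in the patent.
    """
    (y0, p0), (y1, p1) = trajectory[-2], trajectory[-1]
    # Wrap the yaw step into (-180, 180] so crossing the 0/360 seam works.
    dyaw = (y1 - y0 + 180.0) % 360.0 - 180.0
    yaw = (y1 + dyaw) % 360.0
    pitch = max(-90.0, min(90.0, p1 + (p1 - p0)))  # clamp to valid latitude
    return yaw, pitch
```

For example, a trajectory ending at (359°, 0°), (1°, 0°) correctly extrapolates across the seam to yaw 3° rather than jumping backwards.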
Optionally, the panoramic video frame at the moment to be predicted is mapped into small video frames in multiple directions, wherein: cube mapping is applied to the panoramic video frame to be predicted, yielding small video frames in six directions, namely up, down, front, back, left and right.
Optionally, each small video frame obtains a corresponding saliency map through the first convolutional neural network (CNN), wherein: each small video frame is passed through a trained VGG16 network to obtain its corresponding saliency map.
Optionally, the fully connected network is a two-layer fully connected network.
The invention thus designs a method for predicting the panoramic video viewing position based on multiple CNN networks. First, an LSTM network is used to obtain the predicted point for the corresponding moment; when analyzing the saliency map of a panoramic video frame, viewers' viewing habits and the distortion introduced by cube mapping are taken into account; and because the merged saliency map carries corresponding distortion, it is passed through a second CNN network to obtain the final saliency map.
According to a second aspect of the present invention, there is provided a system for predicting a panoramic video viewing position by multiple CNN networks, comprising:
the neural network module is used for predicting a viewing point at the next moment by using a neural network method according to the viewing track of the previous period of time;
a mapping module that maps the panoramic video frame into small video frames in a plurality of directions;
a saliency map construction module, which obtains a corresponding saliency map from each small video frame through a first Convolutional Neural Network (CNN), merges the saliency maps into a saliency map of the whole video frame, and refines the saliency map of the whole video frame through a second Convolutional Neural Network (CNN) to obtain a saliency map of the panoramic video frame;
and the prediction module, which inputs the viewing point predicted by the neural network module and the panoramic video frame saliency map obtained by the saliency map construction module into a fully connected network to obtain the final predicted point, i.e., the panoramic video viewing position point.
Optionally, the mapping module, wherein:
and performing cube mapping on the panoramic video frame to be predicted to obtain small video frames in six directions, namely up, down, front, rear, left and right.
Optionally, the prediction module, wherein:
the fully connected network is a two-layer fully connected network.
According to a third aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor is operable to execute the method for predicting a panoramic video viewing position by multiple CNN networks.
According to a fourth aspect of the present invention, there is provided a computer readable medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the method for predicting a panoramic video viewing position by multiple CNN networks.
Compared with the prior art, the invention has the following beneficial effects:
according to the method, the system and the terminal, the distortion problem during panoramic video mapping is considered by watching the track, the saliency map of the panoramic video frame and the mapping distortion problem of the panoramic video, how to process the distortion problem is also considered, the prediction point obtained according to the track and the saliency map of the video frame are combined to obtain the final prediction point, and the prediction accuracy is effectively improved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of a method for panoramic video view prediction according to an embodiment of the present invention;
FIG. 2 is a representation of an original video frame, a mapped thumbnail, a saliency map of a thumbnail, and a merged saliency map in accordance with an embodiment of the present invention;
fig. 3 compares the merged saliency map with the saliency map obtained from it through second-CNN-network learning, according to an embodiment of the present invention;
FIGS. 4a and 4b compare the prediction accuracy of LSTM with and without the saliency map at a 1-second prediction interval, in an embodiment of the present invention;
fig. 5 is a block diagram of a system for panoramic video view prediction according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention; all such variations fall within the scope of the present invention.
The invention comprehensively considers the viewing trajectory and the video frame saliency map when watching panoramic video, accounts for the distortion that arises when merging the mapped saliency maps, and achieves higher prediction accuracy than conventional methods by applying multiple CNN networks.
Specifically, referring to fig. 1, a method for predicting a panoramic video viewing position based on a multiple CNN network in an embodiment of the present invention includes the following steps:
and S1, inputting the watching track of the previous period into an LSTM (Long Short-Time Memory) network, wherein the LSTM network has Memory capacity and better learning capacity for Time sequences, so that the predicted point of the next moment is obtained through the LSTM network.
S2, mapping the panoramic video frame into small video frames in multiple directions, obtaining a corresponding saliency map from each small video frame through a first Convolutional Neural Network (CNN), and combining the saliency maps into a saliency map of the whole video frame;
When watching a panoramic video, viewers pay less attention to the upper and lower regions of the frame and more to the middle region, and each region has its own saliency characteristics. A panoramic video frame is therefore mapped into views in 6 directions, namely up, down, front, back, left and right; the 6 views are respectively passed through the first CNN network to obtain 6 corresponding saliency maps, which are then inverse-mapped into a saliency map of the whole video frame. The saliency map is a grayscale image.
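The cube mapping underlying this step can be sketched as follows: a unit viewing direction is assigned to one of the six faces by its dominant axis, together with in-face coordinates. The axis convention (+x right, +y up, +z front) and the face orientations are assumptions for illustration, since the patent does not fix them.

```python
def cube_face(x, y, z):
    """Map a (non-zero) viewing-direction vector to one of the six cube faces.

    Returns (face, u, v) with u, v in [-1, 1]. The axis convention
    (+x right, +y up, +z front) is an assumption; the patent does not
    specify one.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:          # dominant x: left or right face
        if x > 0:
            return 'right', -z / ax, y / ax
        return 'left', z / ax, y / ax
    if ay >= az:                       # dominant y: up or down face
        if y > 0:
            return 'up', x / ay, -z / ay
        return 'down', x / ay, z / ay
    if z > 0:                          # dominant z: front or back face
        return 'front', x / az, y / az
    return 'back', -x / az, y / az
```

Looking straight ahead (0, 0, 1) lands at the center of the front face; a direction such as (0.2, -0.9, 0.1) falls on the down face, matching the observation that viewers rarely look there.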
S3, the obtained saliency map of the whole video frame is passed through a second CNN network to obtain a refined saliency map of the panoramic video frame;
the problem that the saliency map of the whole video frame has certain distortion and coverage at the splicing position during reflection is solved through the second CNN network.
S4, the predicted point at the next moment obtained in S1 and the panoramic video frame saliency map obtained in S3 are input into a two-layer fully connected network to obtain the final predicted point.
Referring to fig. 5, in correspondence to the method described above, an embodiment of the present invention further provides a system for predicting a panoramic video viewing position by using multiple CNN networks, where the system includes:
the neural network module is used for predicting a viewing point at the next moment by using a neural network method according to the viewing track of the previous period of time;
a mapping module that maps a panoramic video frame at a time to be predicted into small video frames in a plurality of directions;
the salient map building module is used for obtaining a corresponding salient map from each small video frame through a first convolutional neural network CNN, combining the salient maps into a salient map of the whole video frame, and refining the salient map of the whole video frame through a second convolutional neural network CNN to obtain a panoramic video frame salient map;
and the prediction module, which inputs the viewing point predicted by the neural network module and the panoramic video frame saliency map obtained by the saliency map construction module into a fully connected network to obtain the final predicted point, i.e., the panoramic video viewing position point.
Corresponding to the method, the embodiment of the invention also provides a terminal, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor can be used for executing the method for predicting the panoramic video viewing position by the multiple CNN networks when executing the program.
Corresponding to the above method, an embodiment of the present invention further provides a computer readable medium having a computer program stored thereon, which when executed by a processor, implements the method for predicting a panoramic video viewing position by multiple CNN networks.
How the above method is implemented is illustrated by a specific embodiment; the operation flow is shown in fig. 1:
Firstly, the viewing trajectory of the previous second is input into an LSTM network to predict the viewing point at the next moment.
Secondly, the video frame is mapped into small images in 6 directions by cube mapping, a first CNN network (VGG-16) learns a saliency map for each of the 6 small images, and the resulting saliency maps are merged into a saliency map of the whole video frame.
Thirdly, because the merged saliency map suffers distortion and overlap, it is passed through a second CNN network to obtain the final effective video frame saliency map.
Fourthly, the predicted viewing point at the next moment and the final effective video frame saliency map are input as features into a two-layer fully connected network, which outputs the final predicted point.
In this embodiment, an LSTM network is first adopted to predict the viewing point, a corresponding saliency map is then obtained from the characteristics of the panoramic video frame, and the two are finally combined to obtain the final predicted point. The following describes viewing-point prediction with the LSTM method and the corresponding data set, then the acquisition of the saliency map, and finally how the final predicted point is obtained.
1. Predicting the viewpoint with the LSTM method:
Assume the viewing position 0.1 s ahead is predicted from the previous second's viewing trajectory. The LSTM model is trained in advance on various types of panoramic video; the trained model then yields the predicted position point P_LSTM-0.1s.
2. Obtaining a saliency map using a first CNN network
First, video frames at the corresponding moments in the data set are extracted, i.e., one video frame per second, and each is mapped into 6 directional views using cube mapping. The first CNN model replaces the last pooling layer of the VGG-16 network with 4 convolutional layers so as to better learn the key information of the whole image; this model is also trained in advance on the data set. The 6 views are then passed through the network to obtain corresponding saliency maps, which are finally merged into a saliency map of the whole video frame. A specific example is shown in fig. 2.
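The merging (inverse mapping) described above can be sketched in a minimal form: for every pixel of the equirectangular output, compute its viewing direction from longitude/latitude, pick the covering cube face by the dominant axis, and sample that face's saliency map nearest-neighbour. The face orientations and the sampling scheme are assumptions for illustration; a real implementation would interpolate.

```python
import math

def merge_face_maps(face_maps, width, height):
    """Inverse-map six per-face saliency maps into one equirectangular map.

    `face_maps` is {face_name: NxN list of lists of grayscale values}.
    For every output pixel, the viewing direction is computed from its
    longitude/latitude, the covering cube face is chosen by the dominant
    axis, and the face map is sampled nearest-neighbour.
    """
    n = len(next(iter(face_maps.values())))
    out = [[0.0] * width for _ in range(height)]
    for row in range(height):
        lat = math.pi * (0.5 - (row + 0.5) / height)        # +pi/2 top ... -pi/2 bottom
        for col in range(width):
            lon = 2.0 * math.pi * ((col + 0.5) / width) - math.pi
            x = math.cos(lat) * math.sin(lon)
            y = math.sin(lat)
            z = math.cos(lat) * math.cos(lon)
            ax, ay, az = abs(x), abs(y), abs(z)
            if ax >= ay and ax >= az:                       # left/right faces
                face, u, v = ('right', -z / ax, y / ax) if x > 0 else ('left', z / ax, y / ax)
            elif ay >= az:                                  # up/down faces
                face, u, v = ('up', x / ay, -z / ay) if y > 0 else ('down', x / ay, z / ay)
            else:                                           # front/back faces
                face, u, v = ('front', x / az, y / az) if z > 0 else ('back', -x / az, y / az)
            i = min(n - 1, int((1.0 - (v + 1.0) / 2.0) * n))   # v = +1 -> top row
            j = min(n - 1, int((u + 1.0) / 2.0 * n))
            out[row][col] = face_maps[face][i][j]
    return out
```

Feeding six constant-valued dummy face maps shows each equirectangular region picking up the value of its covering face: the top rows sample the up face, the bottom rows the down face, and the horizontal band alternates through back/left/front/right as longitude sweeps around.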
Cube mapping is adopted because, when viewing a panoramic video, viewers mostly concentrate on the middle region of the picture and pay relatively little attention to the regions above and below; cube mapping accounts for multiple directions, so this approach is used.
3. Obtaining an improved saliency map using a second CNN network, and obtaining a final predicted point by combining with the LSTM predicted point
The saliency maps obtained in the previous step suffer distortion and occlusion during merging, so the merged map is passed through a further CNN network to obtain the refined saliency map; this network model is likewise trained in advance. After the refined saliency map is obtained, P_LSTM-0.1s is combined with it and input into a two-layer fully connected network, which then yields the final predicted point.
The two-layer fully connected network can be implemented directly with existing fully connected layers.
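A two-layer fully connected network of the kind referred to here can be sketched as a plain forward pass. The input would be the LSTM-predicted point concatenated with (a downsampled form of) the saliency map; the hidden-layer size, the ReLU activation and the output dimensionality are assumptions, as the patent does not specify them.

```python
def fc_forward(features, w1, b1, w2, b2):
    """Forward pass of a two-layer fully connected network (sketch).

    `features` would be the LSTM-predicted point concatenated with the
    flattened panoramic-frame saliency map; the output is the final
    predicted viewing position (e.g. longitude, latitude). Weight shapes
    and the ReLU choice are assumptions, not taken from the patent.
    """
    # Hidden layer: affine transform followed by ReLU.
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    # Output layer: affine transform, no activation (regression output).
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]
```

In practice the weights would be learned jointly with the rest of the pipeline; the sketch only fixes the data flow from fused features to the final predicted point.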
The following table summarizes the prediction accuracy of the LSTM-based method versus conventional methods, where the values represent the average angular error from the actual viewing position.
TABLE 1 prediction accuracy of the present method and the conventional method
Figs. 4a and 4b compare the prediction accuracy with and without the saliency map at a 1-second prediction interval, including results predicted directly from the saliency map, LSTM-only results, results with one CNN network added, and results with two CNN networks added. The comparison shows that the present method achieves better prediction accuracy than the original method.
Fig. 3 shows frame 7 of the DrivingInCity sequence processed by the two methods in one embodiment of the present invention; compared with the "original method" above, it shows better visual quality and less blocking in some regions of interest, such as the shape of the car.
Fig. 4 shows frame 15 of the AerialCity sequence processed in the two ways in an embodiment of the present invention; the wall edge under the "original method" is distorted, while under the "current method" it is continuous, with higher visual quality.
In summary, the above embodiments of the present invention comprehensively consider the time continuity when viewing the video and the mapping distortion problem of the panoramic video, and combine the two to obtain the final optimal prediction point, thereby achieving a higher prediction accuracy.
It should be noted that the steps of the method provided by the invention can be implemented with the corresponding modules, devices and units of the system; those skilled in the art may refer to the technical solution of the system to implement the method's step flow, i.e., the embodiments of the system may be understood as preferred examples for implementing the method, and details are not repeated here.
Those skilled in the art will appreciate that, besides implementing the system and its various devices purely as computer-readable program code, the method steps can equally be implemented by embodying the system and its devices in logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. The system and its various devices may therefore be regarded as hardware components, and the devices within them for realizing the various functions as structures within those components; means for performing the functions may likewise be regarded simultaneously as software modules and as structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.
Claims (10)
1. A method for predicting the viewing position of a panoramic video with multiple CNN networks, characterized in that the method comprises the following steps:
based on the watching track of the previous period of time, a neural network method is used for predicting the watching point of the next moment;
mapping the panoramic video frame into small video frames in multiple directions, obtaining a corresponding saliency map for each small video frame through a first convolutional neural network (CNN), merging the saliency maps into a saliency map of the whole video frame, and refining the saliency map of the whole video frame through a second convolutional neural network (CNN) to obtain the panoramic video frame saliency map; wherein, when a panoramic video is watched, viewers pay less attention to the upper and lower regions of the frame and more to the middle region, and each region has its own saliency characteristics; the panoramic video frame is therefore mapped to obtain views in 6 directions, namely up, down, front, back, left and right, the 6 views are respectively passed through the first CNN network to obtain 6 corresponding saliency maps, and the 6 saliency maps are then inverse-mapped into a saliency map of the whole video frame, the saliency map being a grayscale image;
and inputting the predicted viewing point and the panoramic video frame saliency map into a full-connection network to obtain a final predicted point, namely a panoramic video viewing position point.
2. The method for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 1, wherein: the neural network method adopts an LSTM model, which reads the viewing trajectory of the previous second and takes it as input to predict the viewing point at the next moment.
3. The method for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 1, wherein: the mapping of the panoramic video frame into small video frames in multiple directions, wherein:
and performing cube mapping on the panoramic video frame to be predicted to obtain small video frames in six directions, namely up, down, front, rear, left and right.
4. The method for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 1, wherein: and obtaining a corresponding saliency map of each small video frame through a first Convolutional Neural Network (CNN), wherein:
and (3) passing each small video frame through the trained VGG16 network to obtain a corresponding saliency map.
5. The method for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 1, wherein: the fully connected network is a two-layer fully connected network.
6. A system for predicting the viewing position of a panoramic video with multiple CNN networks, characterized by comprising:
the neural network module is used for predicting a viewing point at the next moment by using a neural network method according to the viewing track of the previous period of time;
a mapping module that maps a panoramic video frame at a time to be predicted into small video frames in a plurality of directions;
a saliency map construction module, which obtains a corresponding saliency map for each small video frame through a first convolutional neural network (CNN), merges the saliency maps into a saliency map of the whole video frame, and refines the saliency map of the whole video frame through a second convolutional neural network (CNN) to obtain the panoramic video frame saliency map; wherein, when a panoramic video is watched, viewers pay less attention to the upper and lower regions of the frame and more to the middle region, and each region has its own saliency characteristics; the panoramic video frame is therefore mapped to obtain views in 6 directions, namely up, down, front, back, left and right, the 6 views are respectively passed through the first CNN network to obtain 6 corresponding saliency maps, and the 6 saliency maps are then inverse-mapped into a saliency map of the whole video frame, the saliency map being a grayscale image;
and a prediction module for inputting the viewing point predicted by the neural network module and the panoramic video frame saliency map obtained by the saliency map construction module into a fully connected network to obtain the final predicted point, namely the panoramic video viewing position point.
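The inverse-mapping step of the saliency map construction module above can be sketched as follows: for every pixel of the equirectangular frame, its viewing direction selects one of the six face saliency maps, which is then sampled. The face names, axis conventions (x right, y up, z forward), and nearest-neighbor sampling are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def merge_face_saliency(faces: dict, H: int, W: int) -> np.ndarray:
    """Inverse cube mapping: merge six per-face gray saliency maps into one
    equirectangular saliency map of the whole panoramic frame (a sketch;
    face naming/orientation are illustrative assumptions).

    faces: {'up','down','front','back','left','right'} -> (S, S) uint8 maps.
    Returns an (H, W) uint8 equirectangular gray map (nearest-neighbor).
    """
    S = next(iter(faces.values())).shape[0]
    out = np.empty((H, W), dtype=np.uint8)
    for v in range(H):
        lat = (0.5 - (v + 0.5) / H) * np.pi          # +pi/2 .. -pi/2
        for u in range(W):
            lon = ((u + 0.5) / W - 0.5) * 2 * np.pi  # -pi .. +pi
            # Unit viewing direction: x right, y up, z forward.
            x = np.cos(lat) * np.sin(lon)
            y = np.sin(lat)
            z = np.cos(lat) * np.cos(lon)
            ax, ay, az = abs(x), abs(y), abs(z)
            # Dominant axis picks the cube face; the other two coordinates,
            # divided by it, give in-face coordinates in [-1, 1].
            if ax >= ay and ax >= az:
                face, a, b = ('right', -z, -y) if x > 0 else ('left', z, -y)
                m = ax
            elif ay >= az:
                face, a, b = ('up', x, z) if y > 0 else ('down', x, -z)
                m = ay
            else:
                face, a, b = ('front', x, -y) if z > 0 else ('back', -x, -y)
                m = az
            i = min(int((b / m * 0.5 + 0.5) * S), S - 1)  # row on the face
            j = min(int((a / m * 0.5 + 0.5) * S), S - 1)  # column on the face
            out[v, u] = faces[face][i, j]
    return out
```

In the claimed pipeline, the resulting whole-frame gray map would then be refined by the second CNN before being fed to the prediction module.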
7. The system for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 6, wherein, in the mapping module:
cube mapping is performed on the panoramic video frame at the time to be predicted to obtain small video frames in six directions, namely up, down, front, back, left, and right.
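The cube mapping named in this claim can be sketched as a gnomonic projection: each of the six face images is filled by casting a ray through each face pixel and sampling the equirectangular frame. The axis conventions (x right, y up, z forward) and nearest-neighbor sampling are illustrative assumptions, not details from the patent.

```python
import numpy as np

def cube_faces(equi: np.ndarray, S: int) -> dict:
    """Cube-map a panoramic (equirectangular) frame into six SxS face
    images, one per direction: up, down, front, back, left, right.
    A sketch of claim 7's mapping module under assumed axis conventions.
    """
    H, W = equi.shape[:2]
    # Each face: (forward axis, in-face right axis, in-face down axis).
    axes = {
        'front': (( 0, 0,  1), ( 1, 0,  0), (0, -1,  0)),
        'back':  (( 0, 0, -1), (-1, 0,  0), (0, -1,  0)),
        'right': (( 1, 0,  0), ( 0, 0, -1), (0, -1,  0)),
        'left':  ((-1, 0,  0), ( 0, 0,  1), (0, -1,  0)),
        'up':    (( 0, 1,  0), ( 1, 0,  0), (0,  0,  1)),
        'down':  (( 0, -1, 0), ( 1, 0,  0), (0,  0, -1)),
    }
    faces = {}
    for name, (f, r, d) in axes.items():
        f, r, d = map(np.array, (f, r, d))
        face = np.empty((S, S) + equi.shape[2:], dtype=equi.dtype)
        for i in range(S):
            for j in range(S):
                a = (j + 0.5) / S * 2 - 1            # in-face x in [-1, 1]
                b = (i + 0.5) / S * 2 - 1            # in-face y in [-1, 1]
                x, y, z = f + a * r + b * d          # ray through face pixel
                lon = np.arctan2(x, z)               # -pi .. pi
                lat = np.arctan2(y, np.hypot(x, z))  # -pi/2 .. pi/2
                u = min(int((lon / (2 * np.pi) + 0.5) * W), W - 1)
                v = min(int((0.5 - lat / np.pi) * H), H - 1)
                face[i, j] = equi[v, u]              # nearest-neighbor sample
        faces[name] = face
    return faces
```

Each of the six face images would then be fed to the first CNN to produce its saliency map.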
8. The system for predicting a panoramic video viewing position by multiple CNN networks as claimed in claim 6, wherein, in the prediction module:
the fully connected network is a two-layer fully connected network.
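A minimal forward pass for the prediction module's two-layer fully connected network might look as follows. The patent fixes only the two-layer FC shape; the input layout (predicted viewing point concatenated with the flattened saliency map), the ReLU hidden layer, the hidden size, and the random untrained weights are all illustrative assumptions.

```python
import numpy as np

def two_layer_fc_predict(viewpoint, saliency, params):
    """Two-layer fully connected network of the prediction module (sketch).

    viewpoint: (2,) viewing point predicted from the trajectory.
    saliency:  (H, W) gray saliency map of the panoramic frame.
    params:    dict with weights W1 (D, K), b1 (K,), W2 (K, 2), b2 (2,).
    Returns the final predicted (x, y) viewing position point.
    """
    x = np.concatenate([np.asarray(viewpoint, dtype=float),
                        saliency.astype(float).ravel() / 255.0])
    h = np.maximum(0.0, x @ params['W1'] + params['b1'])  # hidden layer, ReLU
    return h @ params['W2'] + params['b2']                # final (x, y) point

def init_params(H, W, K=32, seed=0):
    """Random (untrained) parameters sized for an HxW saliency map."""
    rng = np.random.default_rng(seed)
    D = 2 + H * W
    return {'W1': rng.normal(0, 0.1, (D, K)), 'b1': np.zeros(K),
            'W2': rng.normal(0, 0.1, (K, 2)), 'b2': np.zeros(2)}
```

In practice the weights would be trained on recorded viewing trajectories so that the network learns to pull the trajectory-based prediction toward salient regions.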
9. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operable to perform the method of any of claims 1 to 5 when executing the program.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910465138.1A CN110166850B (en) | 2019-05-30 | 2019-05-30 | Method and system for predicting panoramic video watching position by multiple CNN networks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110166850A CN110166850A (en) | 2019-08-23 |
CN110166850B true CN110166850B (en) | 2020-11-06 |
Family
ID=67630671
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910465138.1A Active CN110166850B (en) | 2019-05-30 | 2019-05-30 | Method and system for predicting panoramic video watching position by multiple CNN networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110166850B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112468828B (en) * | 2020-11-25 | 2022-06-17 | 深圳大学 | Code rate distribution method and device for panoramic video, mobile terminal and storage medium |
CN113329266B (en) * | 2021-06-08 | 2022-07-05 | 合肥工业大学 | Panoramic video self-adaptive transmission method based on limited user visual angle feedback |
CN113949893A (en) * | 2021-10-15 | 2022-01-18 | 中国联合网络通信集团有限公司 | Live broadcast processing method and device, electronic equipment and readable storage medium |
CN114979652A (en) * | 2022-05-20 | 2022-08-30 | 北京字节跳动网络技术有限公司 | Video processing method and device, electronic equipment and storage medium |
CN115022546B (en) * | 2022-05-31 | 2023-11-14 | 咪咕视讯科技有限公司 | Panoramic video transmission method, device, terminal equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020933A (en) * | 2012-12-06 | 2013-04-03 | 天津师范大学 | Multi-source image fusion method based on bionic visual mechanism |
CN106157319A (en) * | 2016-07-28 | 2016-11-23 | 哈尔滨工业大学 | The significance detection method that region based on convolutional neural networks and Pixel-level merge |
CN108462868A (en) * | 2018-02-12 | 2018-08-28 | 叠境数字科技(上海)有限公司 | The prediction technique of user's fixation point in 360 degree of panorama VR videos |
CN108694471A (en) * | 2018-06-11 | 2018-10-23 | 深圳市唯特视科技有限公司 | A kind of user preference prediction technique based on personalized attention network |
CN108765383A (en) * | 2018-03-22 | 2018-11-06 | 山西大学 | Video presentation method based on depth migration study |
CN109784150A (en) * | 2018-12-06 | 2019-05-21 | 东南大学 | Video driving behavior recognition methods based on multitask space-time convolutional neural networks |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8502860B2 (en) * | 2009-09-29 | 2013-08-06 | Toyota Motor Engineering & Manufacturing North America (Tema) | Electronic control system, electronic control unit and associated methodology of adapting 3D panoramic views of vehicle surroundings by predicting driver intent |
CN105323552B (en) * | 2015-10-26 | 2019-03-12 | 北京时代拓灵科技有限公司 | A kind of panoramic video playback method and system |
CN105915937B (en) * | 2016-05-10 | 2019-12-13 | 上海乐相科技有限公司 | Panoramic video playing method and device |
US10547704B2 (en) * | 2017-04-06 | 2020-01-28 | Sony Interactive Entertainment Inc. | Predictive bitrate selection for 360 video streaming |
US10062414B1 (en) * | 2017-08-22 | 2018-08-28 | Futurewei Technologies, Inc. | Determining a future field of view (FOV) for a particular user viewing a 360 degree video stream in a network |
CN109257584B (en) * | 2018-08-06 | 2020-03-10 | 上海交通大学 | User watching viewpoint sequence prediction method for 360-degree video transmission |
Non-Patent Citations (1)
Title |
---|
Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM; Lai Jiang et al.; https://arxiv.org/abs/1709.06316; 2019-01-14; pages 2, 6-12, Figure 9 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110166850B (en) | Method and system for predicting panoramic video watching position by multiple CNN networks | |
US12015787B2 (en) | Predicting and verifying regions of interest selections | |
US10560725B2 (en) | Aggregated region-based reduced bandwidth video streaming | |
CN108370416A (en) | Generating output video from a video stream |
CN110049336B (en) | Video encoding method and video decoding method | |
WO2021227704A1 (en) | Image recognition method, video playback method, related device, and medium | |
CN111402399A (en) | Face driving and live broadcasting method and device, electronic equipment and storage medium | |
CN109688407B (en) | Reference block selection method and device for coding unit, electronic equipment and storage medium | |
CN105144728A (en) | Resilience in the presence of missing media segments in dynamic adaptive streaming over http | |
US11159823B2 (en) | Multi-viewport transcoding for volumetric video streaming | |
CN113365156A (en) | Panoramic video multicast stream view angle prediction method based on limited view field feedback | |
US20140082208A1 (en) | Method and apparatus for multi-user content rendering | |
Li et al. | A super-resolution flexible video coding solution for improving live streaming quality | |
CA3182110A1 (en) | Reinforcement learning based rate control | |
CN105578110A (en) | Video call method, device and system | |
CN105407313A (en) | Video calling method, equipment and system | |
CN113747242A (en) | Image processing method, image processing device, electronic equipment and storage medium | |
Hu et al. | Mobile edge assisted live streaming system for omnidirectional video | |
CN114157868B (en) | Video frame coding mode screening method and device and electronic equipment | |
CN112399231A (en) | Playing method | |
CN111988520B (en) | Picture switching method and device, electronic equipment and storage medium | |
CN113996056A (en) | Data sending and receiving method of cloud game and related equipment | |
CN114363710A (en) | Live broadcast watching method and device based on time shifting acceleration | |
Li et al. | Perceptual quality assessment of face video compression: A benchmark and an effective method | |
CN115086665A (en) | Error code masking method, device, system, storage medium and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||