CN114173206A - Low-complexity viewpoint prediction method fusing user interest and behavior characteristics - Google Patents


Info

Publication number
CN114173206A
CN114173206A (Application CN202111510706.9A)
Authority
CN
China
Prior art keywords
prediction
user
frame
viewpoint
block
Prior art date
Legal status
Granted
Application number
CN202111510706.9A
Other languages
Chinese (zh)
Other versions
CN114173206B (en)
Inventor
Deng Rui (邓瑞)
Current Assignee
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date
Filing date
Publication date
Application filed by Shaanxi Normal University filed Critical Shaanxi Normal University
Priority to CN202111510706.9A priority Critical patent/CN114173206B/en
Publication of CN114173206A publication Critical patent/CN114173206A/en
Application granted granted Critical
Publication of CN114173206B publication Critical patent/CN114173206B/en
Legal status: Active

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83 - Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456 - Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/625 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/44204 - Monitoring of content usage, e.g. the number of times a movie has been viewed, copied or the amount which has been watched
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/4667 - Processing of monitored end-user data, e.g. trend analysis based on the log file of viewer selections
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/60 - Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/65 - Transmission of management data between client and server
    • H04N 21/658 - Transmission by the client directed to the server
    • H04N 21/6587 - Control parameters, e.g. trick play commands, viewpoint selection
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 - Monomedia components thereof
    • H04N 21/816 - Monomedia components thereof involving special video data, e.g. 3D video
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a low-complexity viewpoint prediction method that fuses user interest and behavior characteristics. The method divides the video for which viewpoint prediction is to be performed into several video segments, and marks the sequence numbers of the salient objects in each segment using the video-frame saliency maps of that video. It then obtains the viewpoint dwell times of users who have already watched the video on those salient objects, classifies the users according to these dwell times, builds an interest model for each user category from the dwell times of users of the same category, and combines the user interest model with the video-frame saliency maps to obtain an interest distribution map. A user behavior model is constructed from the random motion of the user viewpoint and the viewpoint feedback information of videos the user has watched historically. On the basis of analyzing the saliency characteristics of the video, and by jointly considering user interest and user behavior characteristics, a low-complexity viewpoint prediction model capable of accurately predicting the future viewpoint position of the user over a long time span is established, and this viewpoint prediction model is used to predict the user viewpoint position.

Description

Low-complexity viewpoint prediction method fusing user interest and behavior characteristics
Technical Field
The invention belongs to the technical field of video processing, and particularly relates to a low-complexity viewpoint prediction method fusing user interest and behavior characteristics.
Background
Virtual Reality (VR) video has attracted wide attention because of its immersive viewing experience and its low-cost, convenient viewing mode, and is currently one of the fastest-growing online VR applications. According to a VR/AR industry report released in 2016, VR video services were expected to have 52 million users in 2020, accounting for 40% of all expected users of VR applications, and the VR video user base was expected to reach 174 million by 2025. However, the "high bit rate, low latency" characteristics of VR video pose a great challenge to network transmission. Especially in mobile networks, limited bandwidth resources and time-varying network transmission capability seriously hinder improvements in the VR video viewing experience.
A VR video covers a 360-degree field of view, whereas the horizontal field of view of the human eye generally does not exceed 180 degrees, and the field of view supported by VR terminal devices (such as VR headsets) is only about 90-110 degrees. For this reason, VR video adaptive transmission schemes based on video tiling have in recent years become a hot topic and a common consensus in academia and industry. In such schemes, the VR video is spatially divided into several video tiles, and the tiles within the current viewing angle are dynamically selected for transmission according to the user viewpoint, which reduces the network bandwidth required by VR video while preserving a good visual experience. To avoid picture delay, stalling, or quality degradation caused by transmission latency when the user viewpoint switches, viewpoint prediction is used to predict the user's new viewpoint at the next moment, and the video tiles in the new viewing range are pre-downloaded and pre-cached. Accurate prediction of the user viewpoint is therefore of great significance for improving the user viewing experience.
The viewpoint prediction methods that are currently most mature and most widely applied can be roughly divided into two types: prediction methods based on motion estimation and prediction methods based on content analysis.
Prediction methods based on motion estimation mainly predict the future viewing-angle position from the user's historical browsing behavior, but they ignore the guiding effect of the video content on the user viewpoint, and the user features they exploit are limited to the recent motion of the user's head, which makes long-term viewpoint prediction difficult.
Although prediction methods based on content analysis improve the accuracy of long-term viewpoint prediction to a certain extent through video saliency features or analysis of the correlation of browsed content, they do not deeply explore the influence of different user characteristics (such as interests, habits and behaviors) on viewpoint prediction, and they can hardly reflect the internal rules governing the viewpoint changes of different users. At the same time, such methods have extremely high implementation complexity and consume time, labor and money, which makes them difficult to use in real-time VR video communication. They can be further divided into two approaches: one determines the region where the user's future viewpoint will be located by exploiting the strong correlation between the contents the user browses, and the other performs prediction based on the saliency features of the video. Saliency features reflect the degree to which users are interested in the video content of the various regions of a video; generally, the stronger the saliency, the more interested the user and the higher the viewing probability. At present, salient-region extraction methods for static images are relatively mature, so many video saliency detection methods start from existing image saliency detection models and extend them to video saliency detection by introducing motion features.
Since most videos transmitted and stored on the Internet are compression-coded, many scholars have recently begun to explore video saliency detection algorithms in the compressed domain so as to avoid the complex operations caused by full decoding. Xu et al. calculate a motion saliency map from the sum of the absolute values of the motion vectors and adaptively fuse it with a static saliency map to obtain the final saliency map. Muthuswamy et al. consider that motion plays a decisive role in video saliency detection, and therefore perform video saliency detection by modifying a still-image saliency map with an accumulated temporal motion map and combining it with a spatiotemporal similarity that represents camera motion. To make better use of motion vectors when computing the motion saliency map, Fang et al. calculate the static saliency map of I frames using fixed Gaussian-weighted DCT coefficients and the motion saliency map of P frames using a motion-vector weighting method, and introduce a fusion rule of normalization, summation and parameter multiplication to fuse the static and motion saliency maps of the P frames; applying Gaussian weights to the motion vectors further improves the performance of the algorithm. However, although video saliency analysis in the compressed domain greatly reduces the computational complexity, its accuracy is difficult to guarantee because the intra prediction coding mode is not taken into account.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a low-complexity viewpoint prediction method fusing user interest and behavior characteristics, which, on the basis of analyzing the video saliency characteristics, comprehensively considers personalized characteristics such as user interest and user behavior to establish a low-complexity viewpoint prediction model capable of accurately predicting the future viewpoint position of the user over a long time span.
To achieve this purpose, the invention provides the following technical scheme: a low-complexity viewpoint prediction method fusing user interest and behavior characteristics, comprising the following specific steps:
S1, acquiring video-frame saliency maps of the video for which viewpoint prediction is to be performed, the video-frame saliency maps comprising I-frame saliency maps and P-frame saliency maps;
S2, dividing the video into several video segments, and marking the sequence numbers of the π most salient objects in each video segment using the video-frame saliency maps;
S3, obtaining the viewpoint dwell times of users who have watched the video on the π most salient objects, classifying the users according to these dwell times, obtaining the interest model of each user category from the dwell times of users of that category on the π most salient objects of the video, and combining the user interest model with the obtained video-frame saliency maps to obtain the interest distribution map of each frame of the video;
S4, constructing a user behavior model from the random motion of the user viewpoint and the viewpoint feedback information of videos the user has watched historically, and obtaining from the user behavior model a user behavior distribution map that reflects the probability of occurrence of the user viewpoint;
S5, combining the interest distribution map of the user's category with the user behavior distribution map to obtain a viewpoint prediction model, and predicting the user viewpoint position with this viewpoint prediction model (a high-level sketch of these steps is given below).
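For illustration only, the following minimal Python/NumPy sketch outlines how the outputs of steps S1-S5 could be combined into a single viewpoint prediction. The input names (saliency_f, object_mask_f, interest_per_object) and the simple Gaussian behavior map are hypothetical stand-ins, and the product fusion with an argmax readout is only one plausible instantiation of step S5, not the claimed method itself.

```python
import numpy as np

def interest_map(saliency_f, object_mask_f, interest_per_object):
    """Step S3 (sketch): weight each salient object's region by the interest of
    the user category in that object (interest_per_object is hypothetical)."""
    out = np.zeros_like(saliency_f)
    for obj_id, interest in interest_per_object.items():
        region = (object_mask_f == obj_id)
        out[region] = interest * saliency_f[region]
    return out

def behavior_map(shape, last_xy, velocity, delta, sigma=20.0):
    """Step S4 (simplified): Gaussian occurrence probability around the position
    extrapolated from the current viewpoint motion."""
    h, w = shape
    cx = last_xy[0] + velocity[0] * delta
    cy = last_xy[1] + velocity[1] * delta
    y, x = np.mgrid[0:h, 0:w]
    p = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
    return p / p.sum()

def predict_viewpoint(saliency_f, object_mask_f, interest_per_object,
                      last_xy, velocity, delta):
    """Step S5 (sketch): fuse interest and behavior maps (simple product,
    one possible fusion choice) and take the argmax as the predicted viewpoint."""
    im = interest_map(saliency_f, object_mask_f, interest_per_object)
    bm = behavior_map(saliency_f.shape, last_xy, velocity, delta)
    fused = im * bm
    y, x = np.unravel_index(np.argmax(fused), fused.shape)
    return x, y

# toy usage with random data
sal = np.random.rand(90, 160)
mask = np.zeros((90, 160), dtype=int)
mask[30:60, 60:100] = 1                     # one labeled salient object
print(predict_viewpoint(sal, mask, {1: 0.8},
                        last_xy=(70, 40), velocity=(2.0, 0.5), delta=10))
```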
Further, in step S1, the specific steps for generating the I-frame saliency map are as follows:
S1.1, obtaining the intra prediction coding mode and the residual DCT (discrete cosine transform) coefficients, and estimating from them the DC coefficient the image block would have if it were DCT-transformed directly without prediction; this DC coefficient is used to represent the luminance and color characteristics of the image block;
S1.2, obtaining the prediction direction corresponding to the intra prediction coding mode, taking this direction as the texture direction of the intra-prediction-coded image block, and taking the texture intensity of a neighboring block whose texture direction is similar to that of the intra-prediction-coded image block as the texture intensity of that image block;
S1.3, obtaining the original pixel values of I_PCM-coded image blocks recovered from the compressed domain and calculating their DCT coefficients from these pixel values, where the DC coefficient expresses the luminance and color of the I_PCM-coded image block and the AC coefficients express its texture direction and intensity characteristics;
S1.4, constructing a motion vector set for the I-frame image from the coding modes and motion vectors of the inter-prediction-coded image blocks of the P frames before and after the I frame, using the temporal continuity of the content of the video for which viewpoint prediction is to be performed;
S1.5, performing saliency detection separately on the luminance, color, texture intensity, texture direction and motion features of the I-frame image, and adaptively fusing the saliency detection results into the I-frame saliency map;
the specific steps for generating the P-frame saliency map are as follows:
S1.6, obtaining the motion vectors of the inter-prediction-coded image blocks from the compressed domain, sorting and filling them, and establishing a complete motion vector set for each P frame;
S1.7, using the temporal reference relationship between P-frame image blocks and I-frame image blocks in inter prediction coding, translating the saliency features of the I-frame image blocks according to the motion vectors to obtain the P-frame saliency map.
Further, in step S1.1, for an N×N image block i of the video to be viewpoint-predicted that is coded with intra prediction, the DCT transform coefficients C_i(u,v) of image block i can be calculated from equation (1-1):

C_i(u,v) = C_i^pred(u,v) + C_i^res(u,v)    (1-1)

where C_i^pred(u,v) denotes the DCT coefficients of the intra prediction block corresponding to image block i, and C_i^res(u,v) denotes the DCT coefficients of the intra prediction residual block corresponding to image block i; the residual coefficients C_i^res(u,v) are extracted directly from the compressed domain of the video to be viewpoint-predicted.

By the definition of the DCT, C_i^pred(u,v) can be expressed as in equation (1-2):

C_i^pred(u,v) = α(u) α(v) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} p̂_i(x,y) cos[(2x+1)uπ/(2N)] cos[(2y+1)vπ/(2N)]    (1-2)

where p̂_i(x,y) denotes the intra prediction value of the pixel of image block i at position (x,y).

If {s_{i,q}, q = 0,1,…,Q-1} is defined as the set of coded and reconstructed neighboring pixels used to predict image block i, the intra prediction value of each pixel of image block i is calculated from equation (1-3):

p̂_i(x,y) = Σ_{q=0}^{Q-1} w_{i,q}(x,y) · p(s_{i,q})    (1-3)

where p(s_{i,q}) is the pixel value of pixel s_{i,q} and w_{i,q}(x,y) is the corresponding prediction weight.

Define C_j^DC, j = 0,1,2,…,J as the DC coefficients of the blocks containing the coded and reconstructed neighboring pixels s_{i,q}, q = 0,1,…,Q-1, used to predict the image block. Assuming that the pixel values within the same 4×4 block are equal and equal to the average of all pixels in the whole 4×4 block, the DC coefficient C_i^{pred,DC} of the prediction block can be obtained from equation (1-4):

C_i^{pred,DC} = Σ_j w_j C_j^DC,  u = 0 and v = 0    (1-4)

where w_j is a weight whose specific value is determined by the prediction mode used.

Substituting equation (1-4) into equation (1-1), the DC coefficient C_i^DC that represents the luminance and color characteristics of image block i can be calculated from equation (1-5):

C_i^DC = Σ_j w_j C_j^DC + C_i^{res,DC}    (1-5)

where C_i^{res,DC} is the DC coefficient of the intra prediction residual block.
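Equations (1-1) to (1-5) reduce, for the DC term, to adding the residual DC coefficient extracted from the bitstream to a weighted sum of the DC coefficients of the neighboring blocks. The short sketch below illustrates that arithmetic only; the uniform weights are an assumption made for illustration, since the actual mode-dependent weights w_j are specified case by case in the following paragraphs, and SciPy is used merely to obtain DC coefficients from raw 4×4 blocks.

```python
import numpy as np
from scipy.fft import dctn

def block_dc(block):
    """DC coefficient of a 4x4 block via a 2-D DCT (type II, orthonormal)."""
    return dctn(block.astype(float), norm="ortho")[0, 0]

def estimate_intra_dc(residual_dc, neighbor_dcs, weights=None):
    """Equation (1-5): C_i^DC = sum_j w_j * C_j^DC + C_i^{res,DC}.
    Uniform weights are assumed here purely for illustration; in the patent the
    weights depend on the intra prediction mode."""
    neighbor_dcs = np.asarray(neighbor_dcs, dtype=float)
    if weights is None:
        weights = np.full(len(neighbor_dcs), 1.0 / len(neighbor_dcs))
    return float(np.dot(weights, neighbor_dcs) + residual_dc)

# toy check: a flat block predicted from an equally flat neighbor has zero residual DC
flat = np.full((4, 4), 128)
print(block_dc(flat))                             # DC of the flat block
print(estimate_intra_dc(0.0, [block_dc(flat)]))   # reconstructed DC equals the neighbor's
```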
further, in step S1.1, the prediction pixels of the 4 x 4 partition are selected from the pixels S of the 4 neighboring blocks located at the upper left, upper right and left sides thereofi,0~si,12In selecting, order
Figure BDA00034052341800000513
i is 0,1, …, and 3 respectively represent DCT coefficients of the 4 neighboring blocks, then
Figure BDA00034052341800000514
Comprises the following steps:
1) when the prediction mode of the 4 x 4 block is 0,
Figure BDA00034052341800000515
from the pixel s of the adjacent block above iti,1~si,4Prediction is obtained, i.e.
Figure BDA00034052341800000516
Therefore, the temperature of the molten metal is controlled,
Figure BDA00034052341800000517
2) when the prediction mode of the 4 x 4 block is 1,
Figure BDA00034052341800000518
by the pixel s of the adjacent block to its lefti,9~si,12Prediction is obtained, i.e.
Figure BDA00034052341800000519
Therefore, the temperature of the molten metal is controlled,
Figure BDA0003405234180000061
3) when the prediction mode of the 4 x 4 block is 2,
Figure BDA0003405234180000062
by pixels s of adjacent blocks above and to the left of iti,1~si,4,si,9~si,12Prediction is obtained, i.e.
Figure BDA0003405234180000063
Wherein round (α) denotes rounding the value α;
therefore, the temperature of the molten metal is controlled,
Figure BDA0003405234180000064
if s isi,1~si,4Is absent, then
Figure BDA0003405234180000065
If s is9~s12Is absent, then
Figure BDA0003405234180000066
4) When the prediction mode of the 4 x 4 block is 3,
Figure BDA0003405234180000067
by the pixel s of the adjacent block above and to the upper righti,1~si,8Prediction is obtained, i.e.
Figure BDA0003405234180000068
Therefore, the temperature of the molten metal is controlled,
Figure BDA0003405234180000069
5) when the prediction mode of the 4 x 4 block is 4,
Figure BDA00034052341800000610
by pixels s of adjacent blocks located at the upper left, upper and left sides thereofi,0~si,4,si,9~si,12Prediction is obtained, i.e.
Figure BDA00034052341800000611
Therefore, the temperature of the molten metal is controlled,
Figure BDA0003405234180000071
6) when the prediction mode of the 4 x 4 block is 5,
Figure BDA0003405234180000072
by pixels s of adjacent blocks located at the upper left, upper and left sides thereofi,0~si,4,si,9~si,10Prediction is obtained, i.e.
Figure BDA0003405234180000073
Therefore, the temperature of the molten metal is controlled,
Figure BDA0003405234180000074
7) when the prediction mode of the 4 x 4 block is 6,
Figure BDA0003405234180000075
by pixels s of adjacent blocks located at the upper left, upper and left sides thereofi,0~si,3,si,9~si,12Predicted to obtain, i.e.
Figure BDA0003405234180000076
Therefore, the temperature of the molten metal is controlled,
Figure BDA0003405234180000077
8) when the prediction mode of the 4 x 4 block is 7,
Figure BDA0003405234180000078
by the pixel s of the adjacent block above and to the upper righti,1~si,7Predicted to obtain, i.e.
Figure BDA0003405234180000079
Therefore, the temperature of the molten metal is controlled,
Figure BDA0003405234180000081
9) when the prediction mode of the 4 x 4 block is 8,
Figure BDA0003405234180000082
by the pixel s of the adjacent block located at the upper right thereofi,9~si,12Predicted to obtain, i.e.
Figure BDA0003405234180000083
Therefore, the temperature of the molten metal is controlled,
Figure BDA0003405234180000084
further, in step S1.1, when intra prediction is performed based on 16 × 16 partitions, the DC coefficient of each 4 × 4 block in the 16 × 16 partitions is determined
Figure BDA0003405234180000085
Comprises the following steps:
1) when the prediction mode of 16 × 16 partition m is 0, the prediction mode of each 4 × 4 block
Figure BDA0003405234180000086
By pixels s of adjacent partitions above partition mm,1~sm,16Predicted to obtain, i.e.
Figure BDA0003405234180000087
Where mod (·,) represents a complementation operation, mod (i,4) returns the remainder of i divided by 4;
therefore, if sm,1~sm,16The DC coefficients of the 4 × 4 neighboring blocks are sequentially represented from left to right as
Figure BDA0003405234180000088
p=1,2,3,4;
Figure BDA0003405234180000089
2) When the prediction mode of 16 × 16 partition m is 1, for each 4 × 4 block i thereof
Figure BDA00034052341800000810
By pixels s of adjacent partitions to the left of partition mm,17~sm,32Predicted to obtain, i.e.
Figure BDA00034052341800000811
Wherein
Figure BDA00034052341800000812
Representing a rounding-down operation;
therefore, if sm,17~sm,32The DC coefficients of the 4 × 4 neighboring blocks are sequentially represented from left to right as
Figure BDA00034052341800000813
p=5,6,7,8,
Figure BDA0003405234180000091
3) When the prediction mode of 16 × 16 partition m is 2, for each 4 × 4 block i thereof
Figure BDA0003405234180000092
By pixels s of adjacent partitions above and to the left of partition mm,1~sm,32Predicted to obtain, i.e.
Figure BDA0003405234180000093
Therefore, the temperature of the molten metal is controlled,
Figure BDA0003405234180000094
if s ism,1~sm,16Is absent, then
Figure BDA0003405234180000095
If s isM,17~sM,32Is absent, then
Figure BDA0003405234180000096
4) When the prediction mode of 16 × 16 partition m is 3, for each 4 × 4 block i, it is determined
Figure BDA0003405234180000097
By pixels s of adjacent partitions above and to the left of partition mm,1~sm,32Predicted to obtain, i.e.
Figure BDA0003405234180000098
Wherein the content of the first and second substances,
i=0,1,…,15;x,y=0,1,2,3
Clip1(x)=min(255,max(0,x))
Figure BDA0003405234180000099
Figure BDA00034052341800000910
Figure BDA00034052341800000911
therefore, the temperature of the molten metal is controlled,
Figure BDA00034052341800000912
wherein
Figure BDA00034052341800000913
Figure BDA0003405234180000101
Weighting coefficient matrix
Figure BDA00034052341800001013
Is composed of
Figure BDA0003405234180000102
Further, in step S1.2, the texture intensity T_i of the intra-prediction-coded image block i is predicted from the neighboring block j whose texture direction is closest to that of image block i and whose prediction weight is highest; the expression (given as a formula image in the original document) is written in terms of the partition size N_i×N_i of image block i, the partition size N_j×N_j of that neighboring block j, and the texture intensity T_j of that neighboring block.
Further, in step S1.3, the texture direction θ_{i'} and texture intensity T_{i'} of an I_PCM-coded image block i' are expressed by the AC coefficients among its DCT coefficients, as shown in equations (1-24) and (1-25) (given as formula images in the original document), where N_{i'}×N_{i'} denotes the partition size of the I_PCM-coded image block i', and the DCT coefficients with u,v = 0,1,2,3 are obtained by applying a 4×4 DCT to the original pixel values of the I_PCM-coded image block i' recovered from the compressed domain.
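Since equations (1-24) and (1-25) are available only as images, the sketch below shows one common way such AC-based texture measures can be computed from a 4×4 DCT of the recovered I_PCM pixels. The specific direction and intensity formulas used here (horizontal versus vertical AC energy, total AC energy) are assumptions standing in for the patent's exact expressions.

```python
import numpy as np
from scipy.fft import dctn

def ipcm_features(block4x4):
    """4x4 DCT of an I_PCM block's recovered pixels.
    DC -> luminance/color proxy; AC -> texture direction and intensity.
    The direction/intensity measures below are illustrative assumptions,
    not the patent's equations (1-24)/(1-25)."""
    c = dctn(block4x4.astype(float), norm="ortho")
    dc = c[0, 0]
    horiz = np.sum(np.abs(c[0, 1:]))            # horizontal-frequency AC energy
    vert = np.sum(np.abs(c[1:, 0]))             # vertical-frequency AC energy
    theta = np.arctan2(vert, horiz)             # assumed texture-direction measure
    strength = np.sum(np.abs(c)) - np.abs(dc)   # total AC energy as intensity
    return dc, theta, strength

# block whose values vary horizontally (strong vertical edges)
print(ipcm_features(np.tile(np.arange(4) * 40, (4, 1))))
```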
Further, in step S3, the interest model Int_l of the users of category l is given by equation (1-26) (shown as a formula image in the original document), where l is the user category, the users at the cluster centers of the categories are denoted m_1, m_2, …, m_L in turn, and t_{m_l}(p,o) denotes the viewpoint dwell time of user m_l, who has watched the video under analysis, on the π most salient objects of video segment p. The dwell time t_{m_l}(p,o) is itself computed by a further formula (shown as an image in the original document), in which one quantity denotes the set of segments into which the video is divided, another denotes the set of positions of the region occupied by salient object o in video segment p, and t_{m_l}(p,o) denotes the viewpoint dwell time of user m_l on salient object o of video segment p.

From the interest distribution map, the user interest degree Int_f(x,y) at position (x,y) of the f-th frame is obtained as follows (formula shown as an image in the original document), where S_f(x,y) denotes the saliency at position (x,y) of the f-th frame.
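The interest model assigns each user category a per-object interest derived from dwell times and then weights the frame saliency map by the interest of the object occupying each position. The sketch below is an assumed, simplified reading of this step, since equations (1-26) onward are given only as images: the dwell-share normalization and the per-object masking are illustrative choices, not the patent's exact formulas.

```python
import numpy as np

def interest_model(dwell_times):
    """dwell_times: array (num_segments, pi) of a category's dwell time on the
    pi most salient objects per segment. Returns a per-object interest vector
    (dwell share per object, an assumed normalization)."""
    totals = dwell_times.sum(axis=0)
    return totals / max(totals.sum(), 1e-9)

def interest_distribution(saliency_f, object_mask_f, interest):
    """Interest-degree map: saliency at (x, y) weighted by the interest in the
    salient object that occupies (x, y); background keeps zero interest."""
    out = np.zeros_like(saliency_f, dtype=float)
    for obj_id, w in enumerate(interest, start=1):
        region = (object_mask_f == obj_id)
        out[region] = w * saliency_f[region]
    return out

dwell = np.array([[4.0, 1.0], [6.0, 2.0]])          # 2 segments, pi = 2 objects
sal = np.random.rand(6, 8)
mask = np.zeros((6, 8), dtype=int)
mask[:3, :4] = 1
mask[3:, 4:] = 2
print(interest_distribution(sal, mask, interest_model(dwell)))
```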
Further, in step S4, the "current" statistical model is used to describe the random motion of the user viewpoint; the specific motion prediction equation is equation (1-29) (shown as a formula image in the original document), in which x_f and y_f, together with their velocities and accelerations, denote the position, velocity and acceleration of the viewpoint in the x-axis and y-axis directions when the user watches the f-th frame; the remaining terms denote the average acceleration of the user viewpoint in the x-axis and y-axis directions; and α is the reciprocal of the maneuvering-acceleration time constant, i.e. the maneuvering frequency.
The probability that the viewpoint is located at (x,y) when the user watches the (f+δ)-th frame can be calculated from equation (1-30) (shown as a formula image in the original document).
The user behavior model Act_k is given by the formula shown as an image in the original document, and the user behavior distribution map reflecting the probability of occurrence of the user viewpoint is calculated according to equations (1-29) and (1-30).
Further, in step S5, the user viewpoint position is predicted by equation (1-32) (given as a formula image in the original document), in which the two quantities appearing in the formula respectively denote the values of the user interest distribution map and the user behavior distribution map at position (x,y) of the (f+δ)-th frame, and the function Φ is the fusion function of the user interest distribution map and the user behavior distribution map.
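The fusion function Φ in equation (1-32) is not spelled out in the text available here. The sketch below shows two plausible fusion choices (product and weighted sum) together with an argmax readout of the fused map as the predicted viewpoint position; both choices are assumptions made for illustration, not the patent's specified Φ.

```python
import numpy as np

def fuse(interest_map, behavior_map, mode="product", w=0.5):
    """Two plausible fusion functions Phi; neither is claimed to be the patent's
    specific choice."""
    if mode == "product":
        return interest_map * behavior_map
    return w * interest_map + (1.0 - w) * behavior_map   # weighted sum

def predict_position(interest_map, behavior_map, **kw):
    """Read out the predicted viewpoint as the location maximizing the fused map."""
    fused = fuse(interest_map, behavior_map, **kw)
    y, x = np.unravel_index(np.argmax(fused), fused.shape)
    return x, y

im = np.random.rand(90, 160)
bm = np.random.rand(90, 160)
print(predict_position(im, bm, mode="product"))
```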
Compared with the prior art, the invention has at least the following beneficial effects:
the invention provides a low-complexity viewpoint prediction method fusing user interest and behavior characteristics, which starts from video compression code streams, comprehensively considers the significance characteristics of video contents and the user interest and behavior characteristics, and designs a viewpoint prediction model capable of accurately predicting the future viewpoint position of a user with lower complexity. Specifically, an interest model of a user is constructed, and is combined with the video saliency characteristics to generate an interest distribution map capable of reflecting the interest degree of the user on each salient object of the VR video; estimating the probability of the future viewpoint appearing at different positions for the user according to the current viewpoint position of the user and the behavior characteristics (such as speed, acceleration, maneuvering frequency and the like) of the user; and integrating the user interest distribution and establishing a viewpoint prediction model capable of accurately predicting the long-term viewpoint change of the user. In addition, the significance analysis of the invention is based on video compression domain information, and the spatial correlation between adjacent blocks of video content and the time continuity between adjacent frames are utilized to carry out comprehensive significance analysis on the intra-frame prediction coding block on the basis of the prior art.
The video significance analysis based on the compression domain adopted by the invention can effectively reduce the calculation and implementation complexity, but is different from the problem that only the video significance analysis under the intra-frame non-prediction coding mode and the inter-frame prediction coding mode is concerned in the prior work, the method provided by the invention analyzes the significance of the brightness, the color and the texture of the intra-frame prediction block by utilizing the extracted prediction residual DCT coefficient, the spatial correlation of the video content and the prediction directionality of the intra-frame prediction mode, estimates the motion vector missing from the intra-frame coding block by combining the time continuity of the video content, and finally obtains the video significance result by self-adaptive fusion with other significance characteristics. Because the intra-frame prediction mode is considered, the method provided by the invention can effectively improve the video significance analysis accuracy based on the H.264\ AVC compressed code stream without increasing the calculation and implementation complexity.
The user characteristics are one of key factors influencing the viewpoint change of the user, but different from the prior work that only the influence of recent viewpoint motion of the user on the viewpoint is focused, the invention deeply explores the action mechanism between the user interest and behavior characteristics and the viewpoint of the user, and combines the user characteristics and the saliency characteristics of the video based on the compressed domain to establish a low-complexity viewpoint prediction model which takes the user as the core and the video content as the guide, thereby remarkably improving the accuracy of viewpoint prediction.
Drawings
FIG. 1 is a diagram of the technical route of the viewpoint prediction algorithm integrating user interest and behavior characteristics;
FIG. 2 is a schematic diagram of 4×4 block intra prediction;
FIG. 3 is a flow chart of the compressed-domain video saliency detection algorithm.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
As shown in FIG. 1, the present invention provides a low-complexity viewpoint prediction method fusing user interest and behavior characteristics; the specific implementation steps are as follows:
(1) video saliency detection based on compressed domain
Saliency detection is performed on the video for which viewpoint prediction is to be carried out, obtaining an I-frame saliency map and a P-frame saliency map; specifically:
1) Extracting compressed-domain information: the coding modes and residual DCT coefficients of intra-prediction-coded image blocks, the coding modes and motion vectors of inter-prediction-coded image blocks, and the pixel values of I_PCM-coded image blocks are extracted from the video compressed domain.
2) I-frame (intra-coded frame) saliency map generation:
According to the coding mode and residual DCT coefficients of the intra-prediction-coded image block obtained in step 1), the DC coefficient that the image block would have if it were DCT-transformed directly without prediction is estimated, so as to represent the luminance and color characteristics of the image block;
the prediction direction corresponding to the intra prediction mode is selected as the texture direction of the intra-prediction-coded image block, and the texture intensity of a neighboring block whose texture direction is similar to that of the image block is used to predict the texture intensity of the image block, so as to represent its texture direction and intensity characteristics; the DCT coefficients of each I_PCM-coded image block are calculated from its original pixel values recovered from the compressed domain, and the luminance, color, texture direction and intensity characteristics of the block are estimated from these DCT coefficients;
according to the coding modes and motion vectors of the inter-prediction-coded image blocks in the P frames before and after the I frame, extracted from the compressed domain, a motion vector set based on 4×4 blocks is constructed for the I-frame image using the temporal continuity of the video content, so as to represent the motion characteristics of the I-frame image.
Saliency detection is then performed separately on the luminance, color, texture intensity, texture direction and motion features of the I-frame image, and the results are adaptively fused into an I-frame saliency map that comprehensively represents the saliency characteristics of each object in the I frame.
3) P-frame (inter-coded frame) saliency map generation:
The motion vectors of the inter-prediction-coded image blocks extracted from the compressed domain are sorted and filled, and a complete motion vector set based on 4×4 blocks is established for each P frame;
using the temporal reference relationship provided by the P-frame motion vector set, the saliency features of the corresponding matched blocks in the preceding I frame are translated according to the motion vectors, thereby obtaining the P-frame saliency map (a sketch of this translation step is given below).
(2) Construction of user interest distribution map
5) Salient object segmentation and labeling: the whole video for which viewpoint prediction is to be performed is divided into several video segments. For each video segment, the I-frame image is divided into several salient objects according to the I-frame saliency map information generated in step (1), and the sequence numbers of the π most salient objects in the I-frame image are marked in descending order of saliency value.
At the same time, according to the temporal reference relationship between P-frame and I-frame image blocks in inter prediction coding, the salient objects of the P-frame image are labeled with the salient-object label values of the referenced blocks in the preceding I frame.
Preferably, if a salient object of the I frame has the same or similar salient features as a salient object of the preceding P frame, the salient object of the I frame is preferentially labeled according to the label value of that salient object in the preceding P frame.
6) Viewpoint dwell time statistics: using the real viewpoint feedback information of historical users, the viewpoint dwell times of users who have watched the video on the π most salient objects in the video are counted;
7) User classification based on interest similarity: according to the dwell times obtained from these statistics, the users are classified with the K-means clustering algorithm from machine learning, so that users in the same category have higher interest similarity than users in different groups (a clustering sketch is given after this list);
8) Obtaining the interest distribution map of each user category: an interest model is generated for the users of each category from the dwell times of users of that category on the π most salient objects of the video, and, combining it with the I-frame and P-frame saliency maps obtained in step (1), an interest distribution map of every frame of the video is generated for each user category.
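Step 7) specifies K-means clustering of the dwell-time statistics only in general terms; the sketch below uses scikit-learn's KMeans on a hypothetical dwell-time matrix to show how user categories and their mean dwell times (the input to the per-category interest model of step 8)) could be obtained.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical dwell-time matrix: one row per user who has watched the video,
# one column per salient object, values in seconds.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[8, 1, 1, 6], scale=0.5, size=(10, 4))   # users drawn to objects 1 and 4
group_b = rng.normal(loc=[1, 7, 6, 1], scale=0.5, size=(10, 4))   # users drawn to objects 2 and 3
dwell = np.vstack([group_a, group_b]).clip(min=0)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(dwell)
labels = kmeans.labels_

# Each cluster's mean dwell times then feed the per-category interest model.
for k in range(2):
    category_dwell = dwell[labels == k].mean(axis=0)
    print(f"category {k}: mean dwell times {np.round(category_dwell, 1)}")
```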
(3) User behavior profile prediction
9) Constructing the user behavior model: drawing on modeling methods for moving targets, the random motion of the user viewpoint is described with the existing "current" statistical model, and the user behavior model is constructed using the viewpoint feedback information of videos the user has watched historically.
10) Generating the user behavior distribution map: the user behavior distribution map reflecting the probability of occurrence of the user viewpoint is calculated from the user behavior model.
11) Viewpoint prediction: the viewpoint is predicted by combining the interest distribution map of the category to which the user belongs with the user behavior distribution map.
Example 1
(1) Video saliency detection based on compressed domain
Most videos transmitted and stored on the Internet are compression-coded, so performing saliency detection directly in the compressed domain avoids the complex operations caused by decoding. Obtaining the intensity, color and texture features of a static image from DCT coefficients, estimating the motion intensity from motion vectors, and performing saliency detection and fusion on these data is currently the most effective approach to compressed-domain video saliency detection. However, none of these methods considers the intra prediction coding mode, so it is difficult for them to perform accurate saliency detection on compressed bitstreams that use the H.264/AVC coding standard, which has become one of the mainstream compression coding standards.
1) Extraction of luminance and color features of intra-prediction-coded image blocks in I frames
Intra prediction is a coding mode that exploits spatial correlation: the current block to be coded is predicted from already-coded neighboring pixels in the same frame, and the DCT is applied to the prediction residual. For an intra-prediction-coded block, the DCT coefficients extracted directly from the compressed domain therefore no longer directly represent the luminance, color and texture features of the original image block and must first be preprocessed, as follows.

For an N×N image block i in the video, if image block i is coded with intra prediction, its DCT transform coefficients C_i(u,v) can be calculated from equation (1-1):

C_i(u,v) = C_i^pred(u,v) + C_i^res(u,v)    (1-1)

where C_i^pred(u,v) denotes the DCT coefficients of the intra prediction block corresponding to image block i, and C_i^res(u,v) denotes the DCT coefficients of the intra prediction residual block corresponding to image block i.

For intra-prediction-coded blocks, the encoder applies the DCT only to the intra prediction residual block, so the DCT coefficients extracted directly from the video compression domain are the residual coefficients C_i^res(u,v). As equation (1-1) shows, to calculate the DCT transform coefficients C_i(u,v) of image block i it therefore suffices to estimate the DCT coefficients C_i^pred(u,v) of the intra prediction block corresponding to image block i.

According to the principle of the DCT, C_i^pred(u,v) can be expressed in the form of equation (1-2):

C_i^pred(u,v) = α(u) α(v) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} p̂_i(x,y) cos[(2x+1)uπ/(2N)] cos[(2y+1)vπ/(2N)]    (1-2)

where p̂_i(x,y) denotes the intra prediction value of the pixel of image block i at position (x,y).

In general, an image block i is coded in intra prediction mode if and only if the current image block has strong spatial correlation with its neighboring blocks, so the predicted pixels of image block i are weighted combinations of their neighboring pixels. If {s_{i,q}, q = 0,1,…,Q-1} is defined as the set of coded and reconstructed neighboring pixels used to predict image block i, the intra prediction value of each pixel of image block i is calculated from equation (1-3):

p̂_i(x,y) = Σ_{q=0}^{Q-1} w_{i,q}(x,y) · p(s_{i,q})    (1-3)

where p(s_{i,q}) is the pixel value of pixel s_{i,q} and w_{i,q}(x,y) is the corresponding prediction weight.

Define C_j^DC, j = 0,1,2,…,J as the DC coefficients of the blocks containing the coded and reconstructed neighboring pixels s_{i,q}, q = 0,1,…,Q-1, used to predict image block i. Assuming that the pixel values within the same 4×4 block are equal and equal to the average of all pixels in the whole 4×4 block, the DC coefficient C_i^{pred,DC} of the prediction block can be obtained by weighted summation of the C_j^DC, i.e.

C_i^{pred,DC} = Σ_j w_j C_j^DC    (1-4)

where the weight w_j depends only on the prediction mode used.

Substituting equation (1-4) into equation (1-1), the DC coefficient C_i^DC that represents the luminance and color characteristics of image block i can be calculated from equation (1-5):

C_i^DC = Σ_j w_j C_j^DC + C_i^{res,DC}    (1-5)
H.264/AVC supports intra prediction based on two partition sizes, 4×4 and 16×16. A 4×4 partition has 9 optional prediction modes and each 4×4 block in a macroblock is predicted independently (as shown in FIG. 2); a 16×16 partition has 4 intra prediction modes and the whole macroblock is predicted as one unit, which is suitable for coding flat image regions.
Specifically, for prediction based on 4×4 partitions, the prediction pixels are selected from the pixels s_{i,0}~s_{i,12} of the 4 neighboring blocks located at the block's upper left, above, upper right and to its left. If C_k, k = 0,1,2,3 denote the DCT coefficients of these 4 neighboring blocks, the DC coefficient C_i^{pred,DC} of the prediction block can be calculated as follows (the corresponding expressions appear as formula images in the original document):
1) when the prediction mode of the 4×4 block is 0, the prediction block is obtained from the pixels s_{i,1}~s_{i,4} of the neighboring block above it, so C_i^{pred,DC} is expressed in terms of the DC coefficient of that block;
2) when the prediction mode of the 4×4 block is 1, the prediction block is obtained from the pixels s_{i,9}~s_{i,12} of the neighboring block to its left, so C_i^{pred,DC} is expressed in terms of the DC coefficient of that block;
3) when the prediction mode of the 4×4 block is 2, the prediction block is obtained from the pixels s_{i,1}~s_{i,4} and s_{i,9}~s_{i,12} of the neighboring blocks above and to the left of it, where round(α) denotes rounding of the value α; if s_{i,1}~s_{i,4} are absent, only the left neighboring block is used, and if s_{i,9}~s_{i,12} are absent, only the upper neighboring block is used;
4) when the prediction mode of the 4×4 block is 3, the prediction block is obtained from the pixels s_{i,1}~s_{i,8} of the neighboring blocks above and to the upper right of it;
5) when the prediction mode of the 4×4 block is 4, the prediction block is obtained from the pixels s_{i,0}~s_{i,4} and s_{i,9}~s_{i,12} of the neighboring blocks at its upper left, above and to its left;
6) when the prediction mode of the 4×4 block is 5, the prediction block is obtained from the pixels s_{i,0}~s_{i,4} and s_{i,9}~s_{i,10} of the neighboring blocks at its upper left, above and to its left;
7) when the prediction mode of the 4×4 block is 6, the prediction block is obtained from the pixels s_{i,0}~s_{i,3} and s_{i,9}~s_{i,12} of the neighboring blocks at its upper left, above and to its left;
8) when the prediction mode of the 4×4 block is 7, the prediction block is obtained from the pixels s_{i,1}~s_{i,7} of the neighboring blocks above and to the upper right of it;
9) when the prediction mode of the 4×4 block is 8, the prediction block is obtained from the pixels s_{i,9}~s_{i,12} of the neighboring block to its left.
In each case, C_i^{pred,DC} is the corresponding weighted combination of the DC coefficients of the neighboring blocks whose pixels participate in the prediction.
For intra prediction based on 16×16 partitions, the invention uses a similar derivation to estimate the DC coefficient C_i^{pred,DC} of each 4×4 block i in a 16×16 partition m, so as to represent the luminance and color characteristics of the original image. The specific cases are as follows (the corresponding expressions appear as formula images in the original document):
1) when the prediction mode of the 16×16 partition m is 0, the prediction of each 4×4 block i is obtained from the pixels s_{m,1}~s_{m,16} of the neighboring partition above partition m, where mod(·,·) denotes the modulo operation and mod(i,4) returns the remainder of i divided by 4; accordingly, if the DC coefficients of the corresponding 4×4 neighboring blocks are denoted in order as C_{m,p}^DC, p = 1,2,3,4, then C_i^{pred,DC} is expressed in terms of these coefficients;
2) when the prediction mode of the 16×16 partition m is 1, the prediction of each 4×4 block i is obtained from the pixels s_{m,17}~s_{m,32} of the neighboring partition to the left of partition m, where ⌊·⌋ denotes the floor (rounding-down) operation; accordingly, if the DC coefficients of the corresponding 4×4 neighboring blocks are denoted in order as C_{m,p}^DC, p = 5,6,7,8, then C_i^{pred,DC} is expressed in terms of these coefficients;
3) when the prediction mode of the 16×16 partition m is 2, the prediction of each 4×4 block i is obtained from the pixels s_{m,1}~s_{m,32} of the neighboring partitions above and to the left of partition m; if s_{m,1}~s_{m,16} are absent, only the left neighboring partition is used, and if s_{m,17}~s_{m,32} are absent, only the upper neighboring partition is used;
4) when the prediction mode of the 16×16 partition m is 3, the prediction of each 4×4 block i is obtained from the pixels s_{m,1}~s_{m,32} of the neighboring partitions above and to the left of partition m, where
i = 0,1,…,15;  x,y = 0,1,2,3
Clip1(x) = min(255, max(0, x))
and the remaining plane-prediction parameters and the weighting coefficient matrix are defined by the formula images in the original document; C_i^{pred,DC} is then expressed in terms of the DC coefficients of the neighboring partitions and this weighting coefficient matrix.
2) Texture direction and intensity feature extraction of intra-prediction-coded image blocks in I frames
Using the property that the intra prediction coding mode is closely related to the image texture information, the prediction direction corresponding to the intra prediction coding mode is selected as the texture direction of image block i. If the texture direction of neighboring block j is closest to that of image block i and its prediction weight is the highest, the texture intensity of neighboring block j is used to predict the texture intensity of image block i; the corresponding expression (given as a formula image in the original document) is written in terms of the partition size N_i×N_i of image block i, the partition size N_j×N_j of that neighboring block j, and the texture intensity T_j of that neighboring block.
3) Extraction of brightness, color, texture direction and intensity characteristics of I_PCM coded image blocks in the I frame
The original pixel values of an I_PCM coded image block are recovered from the compressed domain and a 4 × 4 DCT (discrete cosine transform) is applied to the block; the brightness, color, texture direction and intensity characteristics of the block are then estimated from the resulting DCT coefficients. The brightness and color characteristics of the I_PCM coded image block i' are described by the DC coefficient, while the texture direction θ_{i'} and intensity T_{i'} are calculated from the AC coefficients, as shown in equations (1-24) and (1-25), where N_{i'} × N_{i'} denotes the partition size of the I_PCM coded image block i'. Since smaller partition sizes are typically used for texture-rich regions, a scale factor is introduced in equation (1-25) to ensure that smaller partitions receive greater texture strength.
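The sketch below shows one common way to derive such features from a 4 × 4 DCT: brightness from the DC coefficient, direction from the ratio of vertical to horizontal AC energy, and strength from total AC energy. Equations (1-24) and (1-25) appear only as figures in the source, so these particular expressions are illustrative assumptions, not the patent's exact formulas.

```python
import numpy as np
from scipy.fftpack import dct

def ipcm_block_features(pixels_4x4: np.ndarray):
    """Estimate brightness/color (DC) and texture direction/strength (AC)
    of a 4x4 I_PCM block from its DCT coefficients. Formulas are illustrative."""
    # 2-D 4x4 DCT of the recovered original pixel values.
    coeffs = dct(dct(pixels_4x4.astype(float), axis=0, norm='ortho'),
                 axis=1, norm='ortho')

    dc = coeffs[0, 0]                      # brightness / color feature
    ac = coeffs.copy()
    ac[0, 0] = 0.0                         # keep only the AC part

    # Horizontal AC energy = first row, vertical AC energy = first column.
    horiz = np.sum(np.abs(coeffs[0, 1:]))
    vert = np.sum(np.abs(coeffs[1:, 0]))
    theta = np.degrees(np.arctan2(vert, horiz + 1e-9))   # texture direction (assumed form)
    strength = np.sum(np.abs(ac))                        # texture strength (assumed form)
    return dc, theta, strength
```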
4) I-frame image motion feature estimation
The motion features of the I-frame image are described by motion vectors. Because all image blocks in an I frame are intra-coded, their motion vectors cannot be extracted directly from the compressed code stream; instead, the motion vector of an I-frame image block is interpolated from the motion vectors of the image blocks in the preceding and following P frames, exploiting the temporal continuity of the video content. The motion vectors of inter-frame prediction coded blocks in the P frames are extracted directly from the compressed domain; since they are produced from a coding point of view, they contain noise, are difficult to map onto real motion, and a real moving object appears as a region in each frame. The motion vectors are therefore preprocessed before interpolation, as sketched after this list:
a) motion vector filling: from the perspective of spatial correlation, the motion vector missing for an intra-frame prediction block is estimated from the motion vectors of neighboring blocks along the prediction direction;
b) global motion filtering: the global motion component is removed so that the remaining vectors reflect true object motion;
c) temporal-spatial amplitude filtering: isolated motion vector noise of small amplitude is filtered out based on temporal continuity and spatial correlation;
d) temporal-spatial phase filtering: isolated motion vector noise with abrupt direction changes is filtered out based on phase consistency;
e) motion region expansion: holes in the motion region are connected to improve the integrity of moving objects.
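A simplified sketch of steps a)-c) is given below; the thresholds, the median-based global-motion estimate, and the function name are illustrative assumptions rather than the patent's exact procedure.

```python
import numpy as np

def preprocess_motion_vectors(mv_field: np.ndarray, intra_mask: np.ndarray,
                              amp_thresh: float = 1.0) -> np.ndarray:
    """Simplified MV preprocessing chain (fill, global-motion removal,
    amplitude filtering). mv_field: HxWx2 block motion vectors; intra_mask:
    HxW boolean, True where the block is intra-coded and its MV is missing."""
    mv = mv_field.astype(float).copy()
    h, w, _ = mv.shape

    # a) Motion vector filling: copy the left/upper neighbor's vector into
    #    intra-coded blocks (spatial-correlation assumption).
    for y in range(h):
        for x in range(w):
            if intra_mask[y, x]:
                if x > 0:
                    mv[y, x] = mv[y, x - 1]
                elif y > 0:
                    mv[y, x] = mv[y - 1, x]

    # b) Global motion filtering: subtract the frame's median vector, used
    #    here as a crude estimate of camera (global) motion.
    mv -= np.median(mv.reshape(-1, 2), axis=0)

    # c) Amplitude filtering: zero out vectors with very small magnitude.
    mag = np.linalg.norm(mv, axis=2)
    mv[mag < amp_thresh] = 0.0

    # d)/e) Phase filtering and motion-region expansion would follow here,
    #       e.g. direction-consistency checks and morphological dilation.
    return mv
```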
5) I-frame saliency map generation
A center-surround operator is applied separately to the image-block brightness, color, texture intensity and direction, and motion features obtained in steps 1) to 4), and the resulting per-feature saliency detection results are adaptively fused into an I-frame saliency map that represents the saliency of each object in the I frame.
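A minimal sketch of this step follows, assuming a box-filter center-surround operator and a fusion rule that weights each feature map by how peaky its saliency response is; both choices are illustrative, since the patent does not specify the fusion weights here.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def center_surround(feature_map: np.ndarray, center: int = 3, surround: int = 9) -> np.ndarray:
    """Center-surround response: local mean minus a wider local mean."""
    return np.abs(uniform_filter(feature_map, size=center)
                  - uniform_filter(feature_map, size=surround))

def fuse_saliency(feature_maps: list) -> np.ndarray:
    """Adaptively fuse per-feature saliency maps; each map's weight is its
    peak-to-mean gap (an illustrative adaptivity rule)."""
    fused, total_w = None, 0.0
    for fmap in feature_maps:
        sal = center_surround(fmap)
        w = sal.max() - sal.mean() + 1e-9          # how distinctive the map is
        fused = w * sal if fused is None else fused + w * sal
        total_w += w
    fused /= total_w
    return (fused - fused.min()) / (fused.max() - fused.min() + 1e-9)
```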
6) P-frame saliency map generation
For P frames, saliency analysis is not performed independently. Instead, the temporal reference relationship between I-frame and P-frame image blocks during inter-frame prediction coding is analyzed from the motion vectors of the P-frame image blocks, and the saliency features of the I-frame image are translated along those motion vectors to obtain the P-frame saliency map, which reduces the computational complexity. The overall algorithm flow is shown in Fig. 3.
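The sketch below shows one simple way to realize this propagation: each P-frame block copies the I-frame saliency from the position its motion vector points to. Block size and clipping behavior are assumptions for illustration.

```python
import numpy as np

def propagate_saliency(i_frame_saliency: np.ndarray, mv_field: np.ndarray,
                       block: int = 16) -> np.ndarray:
    """Build a P-frame saliency map by shifting I-frame block saliency along
    the P-frame block motion vectors (nearest-block copy, illustrative)."""
    h, w = i_frame_saliency.shape
    p_saliency = np.zeros_like(i_frame_saliency)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dx, dy = mv_field[by // block, bx // block]
            # Reference position in the I frame pointed to by the motion vector.
            ry = int(np.clip(by + dy, 0, h - block))
            rx = int(np.clip(bx + dx, 0, w - block))
            p_saliency[by:by + block, bx:bx + block] = \
                i_frame_saliency[ry:ry + block, rx:rx + block]
    return p_saliency
```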
(2) Building user interest distribution map
Unlike ordinary video, VR video covers a 360-degree field of view and its scenes are complex, usually containing multiple salient objects of different features and sizes distributed over different regions of the image. Therefore, starting from the video saliency map and combining it with viewpoint feedback information, the invention builds a user interest model that reflects how interested a user is in the different salient objects, and on this basis generates a personalized interest distribution map for each frame of the video for that user, so as to better guide the subsequent accurate viewpoint prediction. The specific approach is as follows:
2.1 The whole video to be viewpoint-predicted is divided into several video segments. For each video segment, the I-frame image is divided into several salient objects according to the generated I-frame saliency map, and the π most salient objects in the I-frame image are labeled with sequence numbers in descending order of saliency value. At the same time, according to the temporal reference relationship between a P frame and its I frame during predictive coding, the salient objects of the P-frame image are labeled with the label values of the referenced salient objects in the preceding I frame. If a salient object in an I frame has the same or similar saliency characteristics as a salient object in the preceding P frame, it is preferentially labeled with that object's label value from the preceding P frame.
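One plausible way to label the π most salient objects of an I frame is sketched below: threshold the saliency map, extract connected regions, and rank them by mean saliency. The threshold and ranking rule are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def label_top_objects(saliency: np.ndarray, pi: int, thresh: float = 0.5) -> np.ndarray:
    """Label the pi most salient objects of an I frame (1 = most salient)."""
    mask = saliency >= thresh * saliency.max()
    labels, n = ndimage.label(mask)                       # connected salient regions
    if n == 0:
        return np.zeros_like(labels)
    # Mean saliency of each region, used to rank objects.
    means = ndimage.mean(saliency, labels, index=np.arange(1, n + 1))
    order = np.argsort(means)[::-1][:pi]                  # pi most salient regions
    out = np.zeros_like(labels)
    for rank, region_idx in enumerate(order, start=1):
        out[labels == region_idx + 1] = rank
    return out
```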
2.2 For any segment p, the user viewpoint feedback information is used to count the viewpoint dwell time of each user m_l who has already watched the video on the π most salient objects, which can be expressed in the form of equation (1-26). The quantities involved in that formula are the set of segments into which the video is divided, the set of positions of the region occupied by salient object o in video segment p, and the viewpoint dwell time of user m_l on salient object o of video segment p.
2.3 Users are then classified with the K-means clustering algorithm from machine learning according to their viewpoint dwell times, so that users within the same category have higher interest similarity than users in different categories.
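A minimal sketch of this clustering step, using scikit-learn's KMeans on per-user dwell-time vectors; the choice of representative user per cluster (closest to the cluster center) follows the description in the next paragraph, while the feature layout is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_users(dwell_times: np.ndarray, n_classes: int):
    """Cluster users by their viewpoint dwell-time vectors.

    dwell_times : (n_users, n_segments * pi) array, one row per user,
                  each entry the dwell time on one salient object.
    Returns per-user class labels and, for each class, the index of the user
    closest to the cluster center (that class's representative m_l)."""
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit(dwell_times)
    labels, centers = km.labels_, km.cluster_centers_
    representatives = []
    for c in range(n_classes):
        members = np.where(labels == c)[0]
        # User whose dwell-time vector is closest to the cluster center.
        d = np.linalg.norm(dwell_times[members] - centers[c], axis=1)
        representatives.append(members[np.argmin(d)])
    return labels, representatives
```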
If the users are classified into L classes and the users at the cluster centers of the classes are m_1, m_2, ..., m_L in turn, then the interest model Int_l of class-l users can be described by equation (1-27).
2.4 For a user k watching the video for the first time, the user's category is predicted from the user's interest similarity when watching other videos, and an interest distribution map is generated for user k for each frame from that category's interest model and the video saliency. The user interest degree at position (x, y) of the f-th frame, estimated from the interest distribution map, is given by equation (1-28), in which the saliency at the f-th frame position (x, y) is used.
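A plausible realization is sketched below, assuming the interest map simply scales the frame saliency by the class's interest weight for the object covering each pixel; the exact form of equation (1-28) appears only as a figure in the source, so this fusion rule is an assumption.

```python
import numpy as np

def interest_map(saliency: np.ndarray, object_labels: np.ndarray,
                 class_interest: dict) -> np.ndarray:
    """Per-pixel interest = saliency weighted by the class's interest in the
    salient object at that pixel (label 0 = background, weight 1.0 assumed)."""
    weights = np.ones_like(saliency, dtype=float)
    for label, w in class_interest.items():          # e.g. {1: 0.5, 2: 0.3, ...}
        weights[object_labels == label] = w
    return saliency * weights
```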
(3) User behavior distribution map prediction
When watching a VR video, a user switches viewpoints by moving the head. Therefore, borrowing from the modeling of maneuvering targets, the random motion of the user viewpoint is described with the "current" statistical model, and the motion prediction equation is given by equation (1-29). In that equation, x_f and y_f denote the position of the viewpoint in the x-axis and y-axis directions when the user watches the f-th frame, together with the corresponding velocity and acceleration components; the mean accelerations of the user viewpoint in the x-axis and y-axis directions also appear; and α is the reciprocal of the maneuvering acceleration time constant, i.e. the maneuvering frequency.
Because viewpoint motion is random, complex and diverse, the model inevitably describes the motion state with some inaccuracy. The invention therefore introduces two independent random variables e_x and e_y to describe the prediction error of the model in the x-axis and y-axis directions; they are assumed to follow zero-mean distributions with variances σ_x² and σ_y², respectively, and to be mutually independent. The probability that the viewpoint is located at (x, y) when the user views the (f + δ)-th frame can then be calculated by equation (1-30).
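A minimal sketch of this step under simplifying assumptions: a constant-acceleration propagation over δ frames stands in for the full "current" statistical model, and an axis-aligned Gaussian represents the prediction error of equation (1-30). Parameter names and units are illustrative.

```python
import numpy as np

def behavior_distribution(state, delta, sigma_x, sigma_y, width, height):
    """Predict the viewpoint delta frames ahead and build a map of the
    probability that the viewpoint lies at each pixel (x, y).

    state : (x, vx, ax, y, vy, ay) at frame f, in pixel units per frame."""
    x, vx, ax, y, vy, ay = state
    # Propagate position: s + v*t + 0.5*a*t^2 (simplifying assumption).
    px = x + vx * delta + 0.5 * ax * delta ** 2
    py = y + vy * delta + 0.5 * ay * delta ** 2

    xs = np.arange(width)
    ys = np.arange(height)
    gx = np.exp(-(xs - px) ** 2 / (2 * sigma_x ** 2))
    gy = np.exp(-(ys - py) ** 2 / (2 * sigma_y ** 2))
    prob = np.outer(gy, gx)                  # independent x/y errors
    return prob / (prob.sum() + 1e-12)       # normalize to a distribution
```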
Considering that the parameter α and the other quantities required in the above analysis (the mean accelerations and the error variances) depend only on the user's behavior characteristics, a user behavior model Act_k is defined from these parameters and constructed using the user viewpoint feedback information. Once the user behavior model is obtained, the user behavior distribution map reflecting the probability of the user viewpoint appearing at each position can be calculated from equations (1-29) and (1-30).
(4) Viewpoint prediction
In practice, guided by the selective visual attention mechanism and the inertia of user behavior, a user tends to focus on objects that are both reachable given the motion dynamics and interesting. The viewpoint position of the user when viewing the (f + δ)-th frame is therefore predicted by equation (1-32), in which the two terms are the values of the user interest distribution map and the user behavior distribution map at position (x, y) of the (f + δ)-th frame, and the function Φ is a fusion function of the two maps.
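A minimal sketch of this fusion step follows. The weighted product used as the fusion function Φ is an illustrative choice (the patent leaves Φ generic), and both maps are assumed to be normalized to [0, 1].

```python
import numpy as np

def predict_viewpoint(interest_map: np.ndarray, behavior_map: np.ndarray,
                      weight: float = 0.5):
    """Fuse interest and behavior distribution maps and return the predicted
    viewpoint as the position with the highest fused score."""
    fused = (interest_map ** weight) * (behavior_map ** (1.0 - weight))
    y, x = np.unravel_index(np.argmax(fused), fused.shape)
    return x, y, fused
```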
The "high bit rate and low delay" characteristics of VR video provide great challenges for network transmission. Especially in a mobile network, limited bandwidth resources and time-varying network transmission capability will seriously hinder the improvement of VR video user viewing experience. The VR video covers 360-degree visual field angle, the horizontal visual field range of human eyes does not exceed 180 degrees generally, and the visual field angle which can be supported by VR terminal equipment (such as VR helmet) is only about 90-110 degrees. Therefore, in recent years, VR video adaptive transmission schemes based on video blocking are becoming hot spots and common consensus in academia and industry. The invention divides the VR video into a plurality of video blocks according to the space and dynamically selects the video blocks within the visual angle range according to the viewpoint of the user for transmission, thereby reducing the requirement of the VR video on the network bandwidth while ensuring good visual experience. In order to avoid the problems of picture delay, picture blocking or quality reduction and the like caused by transmission delay when the view points of the users are switched, a view point prediction technology is adopted to predict a new view point of the user at the next moment, and the video blocks in a new view angle range are pre-downloaded and pre-cached. Therefore, the accurate prediction of the user view point has an important significance for improving the user viewing experience.

Claims (10)

1. A low-complexity viewpoint prediction method fusing user interest and behavior characteristics is characterized by comprising the following specific steps:
s1, acquiring a video frame saliency map of a viewpoint prediction video to be performed, wherein the video frame saliency map comprises an I frame saliency map and a P frame saliency map;
s2, dividing the video to be viewpoint-predicted into several video segments, and marking the sequence numbers of the π most salient objects in the video segments by using the video frame saliency map;
s3, obtaining the viewpoint dwell time of users who have watched the video on the π most salient objects, classifying the users according to the viewpoint dwell time, obtaining the interest model of each user category according to the viewpoint dwell time of the same users on the π most salient objects of the video, and combining the user interest model with the obtained video frame saliency map to obtain the interest distribution map of each frame of the video;
s4, constructing a user behavior model by using the random motion of the user viewpoint and the viewpoint feedback information of the video watched by the user historically, and acquiring a user behavior distribution map reflecting the occurrence probability of the user viewpoint according to the user behavior model;
s5, combining the interest distribution map of the users of the same category with the user behavior distribution map to obtain a viewpoint prediction model, and predicting the viewpoint positions of the users by using the viewpoint prediction model.
2. The method for predicting low-complexity viewpoints by fusing user interests and behavior features as claimed in claim 1, wherein in step S1, the specific steps for generating the I-frame saliency map are as follows:
s1.1, obtaining the intra-frame prediction coding modes and residual DCT coefficients so as to recover the DC coefficient of image blocks that are not directly DCT-transformed without prediction, wherein the DC coefficient is used for representing the brightness and color characteristics of the image block;
s1.2, acquiring the prediction direction corresponding to the intra-frame prediction coding mode, taking the prediction direction as the texture direction of the intra-frame prediction coded image block, and acquiring the texture intensity of a neighboring block whose texture direction is similar to that of the intra-frame prediction coded image block, which is taken as the texture intensity of the intra-frame prediction coded image block;
s1.3, obtaining the original pixel values of I_PCM coded image blocks recovered from the compressed domain, and calculating DCT coefficients of the I_PCM coded image blocks from those pixel values, wherein the DC coefficients among the DCT coefficients are used for expressing the brightness and color of the I_PCM coded image blocks, and the AC coefficients are used for expressing the texture direction and intensity characteristics of the I_PCM coded image blocks;
s1.4, constructing a motion vector set for the I-frame image according to the coding modes and motion vectors of the inter-frame prediction coded image blocks in the P frames before and after the I frame and the temporal continuity of the content of the video to be viewpoint-predicted;
s1.5, respectively carrying out significance detection on the brightness, the color, the texture intensity, the texture direction and the motion characteristic of the acquired I frame image, and adaptively fusing the significance detection results into an I frame significance map;
the specific steps for generating the P frame saliency map are as follows:
s1.6, obtaining a motion vector of an inter-frame prediction coding image block in a compression domain, sorting and filling the motion vector, and establishing a complete motion vector set for each P frame;
s1.7, translating the significance characteristics of the image blocks in the I frame according to the indication of the motion vector by utilizing the time domain reference relationship between the P frame image blocks and the I frame image blocks in the inter-frame prediction coding process to obtain a P frame significance map.
3. The method according to claim 2, wherein in step S1.1, intra-frame predictive coding is applied to an N × N image block i of the video to be viewpoint-predicted, so that the DCT transform coefficients of image block i can be calculated from equation (1-1) in terms of the DCT coefficients of the intra prediction block corresponding to image block i and the DCT coefficients of the intra prediction residual block corresponding to image block i, wherein the DCT coefficients of the intra prediction residual block are extracted directly from the compressed domain of the video to be viewpoint-predicted;
the DCT coefficients of the intra prediction block can be expressed by equation (1-2) as the DCT transform of the intra prediction values of the pixels of image block i at positions (x, y);
if {s_{i,q}}, q = 0, 1, …, Q-1, is defined as the set of coded and reconstructed neighboring pixels used to predict the image block, the intra prediction value of each pixel of image block i is calculated from equation (1-3) as a weighted combination of the pixel values of the pixels s_{i,q}, each with its corresponding prediction weight;
assuming that the pixel values within a 4 × 4 block are equal and equal to the average of all pixels of that 4 × 4 block, the DC coefficient among the DCT coefficients of the prediction block can be calculated by equation (1-4), in which the weights w_j are determined by the adopted prediction mode;
substituting equation (1-4) into equation (1-1), the DC coefficient representing the brightness and color characteristics of image block i can be calculated by equation (1-5).
4. The low-complexity viewpoint prediction method fusing user interest and behavior features according to claim 3, wherein in step S1.1, the predicted pixels of a 4 × 4 partition are selected from the pixels s_{i,0}~s_{i,12} of the 4 neighboring blocks at its upper-left, upper, upper-right and left sides; letting the DCT coefficients of these 4 neighboring blocks be denoted respectively, the DC coefficient of the prediction block is obtained as follows:
1) when the prediction mode of the 4 × 4 block is 0, the prediction is obtained from the pixels s_{i,1}~s_{i,4} of the neighboring block above it, and the DC coefficient of the prediction block follows from the DC coefficient of that block;
2) when the prediction mode of the 4 × 4 block is 1, the prediction is obtained from the pixels s_{i,9}~s_{i,12} of the neighboring block to its left, and the DC coefficient of the prediction block follows from the DC coefficient of that block;
3) when the prediction mode of the 4 × 4 block is 2, the prediction is obtained from the pixels s_{i,1}~s_{i,4}, s_{i,9}~s_{i,12} of the neighboring blocks above and to the left of it, where round(α) denotes rounding the value α, and the DC coefficient of the prediction block follows from the DC coefficients of those two blocks; if s_{i,1}~s_{i,4} are unavailable, the DC coefficient is obtained from the left neighboring block only; if s_{i,9}~s_{i,12} are unavailable, it is obtained from the upper neighboring block only;
4) when the prediction mode of the 4 × 4 block is 3, the prediction is obtained from the pixels s_{i,1}~s_{i,8} of the neighboring blocks above and to the upper right of it, and the DC coefficient of the prediction block is obtained accordingly;
5) when the prediction mode of the 4 × 4 block is 4, the prediction is obtained from the pixels s_{i,0}~s_{i,4}, s_{i,9}~s_{i,12} of the neighboring blocks at its upper-left, upper and left sides, and the DC coefficient of the prediction block is obtained accordingly;
6) when the prediction mode of the 4 × 4 block is 5, the prediction is obtained from the pixels s_{i,0}~s_{i,4}, s_{i,9}~s_{i,10} of the neighboring blocks at its upper-left, upper and left sides, and the DC coefficient of the prediction block is obtained accordingly;
7) when the prediction mode of the 4 × 4 block is 6, the prediction is obtained from the pixels s_{i,0}~s_{i,3}, s_{i,9}~s_{i,12} of the neighboring blocks at its upper-left, upper and left sides, and the DC coefficient of the prediction block is obtained accordingly;
8) when the prediction mode of the 4 × 4 block is 7, the prediction is obtained from the pixels s_{i,1}~s_{i,7} of the neighboring blocks above and to the upper right of it, and the DC coefficient of the prediction block is obtained accordingly;
9) when the prediction mode of the 4 × 4 block is 8, the prediction is obtained from the pixels s_{i,9}~s_{i,12} of the neighboring block to its left, and the DC coefficient of the prediction block is obtained accordingly.
5. The method according to claim 3, wherein in step S1.1, when intra-frame prediction is performed on a 16 × 16 partition basis, the DC coefficient of each 4 × 4 block of the 16 × 16 partition is obtained as follows:
1) when the prediction mode of 16 × 16 partition m is 0, each 4 × 4 block is predicted from the pixels s_{m,1}~s_{m,16} of the adjacent partition above partition m, where mod(·,·) denotes the modulo operation and mod(i,4) returns the remainder of i divided by 4; therefore, if the DC coefficients of the 4 × 4 neighboring blocks covering s_{m,1}~s_{m,16} are denoted sequentially from left to right, the DC coefficient of each 4 × 4 prediction block follows from the DC coefficient of the neighboring block in the same column;
2) when the prediction mode of 16 × 16 partition m is 1, each 4 × 4 block i is predicted from the pixels s_{m,17}~s_{m,32} of the adjacent partition to the left of partition m, where ⌊·⌋ denotes the rounding-down operation; therefore, if the DC coefficients of the 4 × 4 neighboring blocks covering s_{m,17}~s_{m,32} are denoted sequentially, the DC coefficient of each 4 × 4 prediction block follows from the DC coefficient of the neighboring block in the same row;
3) when the prediction mode of 16 × 16 partition m is 2, each 4 × 4 block i is predicted from the pixels s_{m,1}~s_{m,32} of the adjacent partitions above and to the left of partition m, and the DC coefficient of each 4 × 4 prediction block follows from the DC coefficients of those neighboring blocks; if s_{m,1}~s_{m,16} are unavailable, the DC coefficient is obtained from the left-hand neighboring blocks only; if s_{m,17}~s_{m,32} are unavailable, it is obtained from the upper neighboring blocks only;
4) when the prediction mode of 16 × 16 partition m is 3, each 4 × 4 block i is predicted from the pixels s_{m,1}~s_{m,32} of the adjacent partitions above and to the left of partition m, where
i = 0, 1, …, 15; x, y = 0, 1, 2, 3
Clip1(x) = min(255, max(0, x))
and the remaining quantities appearing in the prediction formula are derived from those neighboring pixels; therefore, the DC coefficient of each 4 × 4 prediction block can be expressed as a weighted combination of the neighboring pixel values, with the weights collected in a weighting coefficient matrix.
6. The method as claimed in claim 2, wherein in step S1.2, the texture intensity of the intra-frame prediction coded image block i is calculated from the partition size N_i × N_i of image block i, the partition size N_j × N_j of the neighboring block j whose texture direction is closest to that of image block i and whose prediction weight is highest, and the texture intensity T_j of that neighboring block j.
7. The method as claimed in claim 2, wherein in step S1.3, the texture direction θ_{i'} and intensity T_{i'} of the I_PCM coded image block i' are expressed in terms of the AC coefficients among its DCT coefficients, as shown in equations (1-24) and (1-25), where N_{i'} × N_{i'} denotes the partition size of the I_PCM coded image block i', and the DCT coefficients are obtained by applying a 4 × 4 DCT to the original pixel values of the I_PCM coded image block i' recovered from the compressed domain.
8. The low-complexity viewpoint prediction method fusing user interest and behavior features according to claim 1, wherein in step S3, the interest model Int_l of class-l users is given by equation (1-27), where l is the user category and the users at the cluster centers of the categories are m_1, m_2, ..., m_L in turn; the viewpoint dwell time of a user m_l, who has already watched the video, on the π most salient objects of video segment p is given by equation (1-26), in which the quantities involved are the set of segments into which the video is divided, the set of positions of the region occupied by salient object o in video segment p, and the viewpoint dwell time of user m_l on salient object o of video segment p; the user interest degree at position (x, y) of the f-th frame is obtained from the interest distribution map according to equation (1-28), in which the saliency at the f-th frame position (x, y) is used.
9. The low-complexity viewpoint prediction method fusing user interest and behavior features according to claim 1, wherein in step S4, the "current" statistical model is used to describe the random motion of the user viewpoint, with the motion prediction equation given by equation (1-29), in which x_f and y_f denote the position of the user viewpoint in the x-axis and y-axis directions when the user watches the f-th frame, together with the corresponding velocity and acceleration components; the mean accelerations of the user viewpoint in the x-axis and y-axis directions also appear; and α is the reciprocal of the maneuvering acceleration time constant, i.e. the maneuvering frequency; the probability that the viewpoint is located at (x, y) when the user views the (f + δ)-th frame is calculated by equation (1-30); the user behavior model Act_k is defined from these behavior-related parameters, and the user behavior distribution map reflecting the probability of the user viewpoint appearing at each position is calculated from equations (1-29) and (1-30).
10. The low-complexity viewpoint prediction method fusing user interest and behavior features according to claim 1, wherein in step S5, the user viewpoint position is predicted according to equation (1-32), in which the two terms are the values of the user interest distribution map and the user behavior distribution map at position (x, y) of the (f + δ)-th frame, and the function Φ is a fusion function of the two maps.
CN202111510706.9A 2021-12-10 2021-12-10 Low-complexity viewpoint prediction method integrating user interests and behavior characteristics Active CN114173206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111510706.9A CN114173206B (en) 2021-12-10 2021-12-10 Low-complexity viewpoint prediction method integrating user interests and behavior characteristics


Publications (2)

Publication Number Publication Date
CN114173206A true CN114173206A (en) 2022-03-11
CN114173206B CN114173206B (en) 2023-06-06

Family

ID=80485557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111510706.9A Active CN114173206B (en) 2021-12-10 2021-12-10 Low-complexity viewpoint prediction method integrating user interests and behavior characteristics

Country Status (1)

Country Link
CN (1) CN114173206B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018055509A (en) * 2016-09-29 2018-04-05 ファイフィット株式会社 Method of pre-treating composite finite element, method of analyzing composite material, analysis service system and computer readable recording medium
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism
JP2020150519A (en) * 2019-03-15 2020-09-17 エヌ・ティ・ティ・コミュニケーションズ株式会社 Attention degree calculating device, attention degree calculating method and attention degree calculating program
CN111325124A (en) * 2020-02-05 2020-06-23 上海交通大学 Real-time man-machine interaction system under virtual scene

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MAI XU et al.: "Predicting head movement in panoramic video: a deep reinforcement learning approach", IEEE Transactions on Pattern Analysis and Machine Intelligence *
ZHANG Jiwen (张霁雯): "Research on prediction methods for microwave information propagation based on user interest characteristics", CNKI *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115103023A (en) * 2022-06-14 2022-09-23 北京字节跳动网络技术有限公司 Video caching method, device, equipment and storage medium
CN115103023B (en) * 2022-06-14 2024-04-05 北京字节跳动网络技术有限公司 Video caching method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114173206B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN110087087B (en) VVC inter-frame coding unit prediction mode early decision and block division early termination method
CN109309834B (en) Video compression method based on convolutional neural network and HEVC compression domain significant information
CN104378643B (en) A kind of 3D video depths image method for choosing frame inner forecast mode and system
CN108989802B (en) HEVC video stream quality estimation method and system by utilizing inter-frame relation
CN103618900B (en) Video area-of-interest exacting method based on coding information
CN111355956B (en) Deep learning-based rate distortion optimization rapid decision system and method in HEVC intra-frame coding
EP3343923B1 (en) Motion vector field coding method and decoding method, and coding and decoding apparatuses
CN103826125B (en) Concentration analysis method and device for compression monitor video
CN110852964A (en) Image bit enhancement method based on deep learning
CN105933711B (en) Neighborhood optimum probability video steganalysis method and system based on segmentation
CN111479110B (en) Fast affine motion estimation method for H.266/VVC
WO2016155070A1 (en) Method for acquiring adjacent disparity vectors in multi-texture multi-depth video
CN114745549B (en) Video coding method and system based on region of interest
CN112001308A (en) Lightweight behavior identification method adopting video compression technology and skeleton features
Liu et al. Fast depth intra coding based on depth edge classification network in 3D-HEVC
Fu et al. Efficient depth intra frame coding in 3D-HEVC by corner points
CN114173206B (en) Low-complexity viewpoint prediction method integrating user interests and behavior characteristics
CN106878754B (en) A kind of 3D video depth image method for choosing frame inner forecast mode
CN117176960A (en) Convolutional neural network chroma prediction coding method with multi-scale position information embedded
US20050259878A1 (en) Motion estimation algorithm
Zuo et al. Bi-layer texture discriminant fast depth intra coding for 3D-HEVC
Bachu et al. Adaptive order search and tangent-weighted trade-off for motion estimation in H. 264
CN107509074B (en) Self-adaptive 3D video compression coding and decoding method based on compressed sensing
Bocheck et al. Real-time estimation of subjective utility functions for MPEG-4 video objects
CN109982079B (en) Intra-frame prediction mode selection method combined with texture space correlation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant