CN111263225A - Video stuck prediction method and device, computer equipment and storage medium - Google Patents

Video stuck prediction method and device, computer equipment and storage medium

Info

Publication number
CN111263225A
Authority
CN
China
Prior art keywords
video
time
time interval
per
stuck
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010018475.9A
Other languages
Chinese (zh)
Inventor
崔渊博
李晓宵
金红
刘长永
杨满智
陈晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eversec Beijing Technology Co Ltd
Original Assignee
Eversec Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eversec Beijing Technology Co Ltd filed Critical Eversec Beijing Technology Co Ltd
Priority to CN202010018475.9A priority Critical patent/CN111263225A/en
Publication of CN111263225A publication Critical patent/CN111263225A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the invention discloses a video stuck prediction method and device, computer equipment and a storage medium. The method comprises the following steps: acquiring flow characteristics of a target video in a time period before a set moment to form a flow characteristic sequence; and inputting the flow characteristic sequence into a pre-trained stuck prediction model to obtain a stuck prediction result of the target video in a time interval after the set moment, wherein the time length of the time period is greater than or equal to the time length of the time interval. The embodiment of the invention can accurately predict whether the video will be stuck in the next time interval.

Description

Video stuck prediction method and device, computer equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of multimedia, in particular to a video stuck prediction method, a video stuck prediction device, computer equipment and a storage medium.
Background
Video stuck (stutter) is a phenomenon often encountered when user devices play video. By feeding the video stuck prediction result into a base station scheduling algorithm, the scheduling strategy and configuration scheme can be adjusted so as to ensure smooth video playing and improve the viewing experience of the user.
At present, whether a video is stuck is judged as follows: the receiving frame rate of the main video stream is counted for each unit time within a preset time period, the variance of the receiving frame rate is calculated, and whether the video is stuck within the time period is determined by judging whether the variance is larger than a stuck threshold value.
This judgment mode only evaluates the stuck condition of the video within a past time period through the frame rate variance of that period; it cannot predict whether the video will be stuck in the next time period, and therefore lacks real-time capability.
Disclosure of Invention
The embodiment of the invention provides a video stuck prediction method, a video stuck prediction device, computer equipment and a storage medium, which can accurately predict whether a video will be stuck in the next time interval.
In a first aspect, an embodiment of the present invention provides a video stuck prediction method, including:
acquiring flow characteristics of a target video in a time period before a set moment to form a flow characteristic sequence;
and inputting the flow characteristic sequence into a pre-trained stuck prediction model to obtain a stuck prediction result of the target video in a time interval after a set moment, wherein the time length of the time period is greater than or equal to the time length of the time interval.
In a second aspect, an embodiment of the present invention further provides a video stuck prediction apparatus, including:
the flow characteristic sequence generation module is used for acquiring flow characteristics of the target video in a time period before a set moment to form a flow characteristic sequence;
and the video pause prediction module is used for inputting the flow characteristic sequence into a pre-trained pause prediction model to obtain a pause prediction result of the target video in a time interval after the set moment, wherein the duration of the time period is greater than or equal to the duration of the time interval.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the video stuck prediction method according to any one of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the video stuck prediction method according to any one of the embodiments of the present invention.
According to the embodiment of the invention, the flow characteristics of the video in a time period before the set moment are acquired to form a flow characteristic sequence, which is input into a pre-trained stuck prediction model to obtain the stuck prediction result of the video in a time interval after the set moment. This solves the problem that video stuck cannot be predicted in the prior art, enables prediction of video stuck in a future time period, and improves the real-time performance of video stuck prediction.
Drawings
Fig. 1a is a flowchart of a video stuck prediction method according to a first embodiment of the present invention;
FIG. 1b is a diagram of a neural network model according to a first embodiment of the present invention;
FIG. 1c is a diagram illustrating a long short-term memory network model according to the first embodiment of the present invention;
FIG. 1d is a schematic diagram of a convolutional neural network model according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a video stuck prediction method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a video stuck prediction apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device in the fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1a is a flowchart of a video stuck prediction method according to a first embodiment of the present invention. This embodiment may be applied to acquiring the traffic characteristics of a video within a period of time and predicting whether the video will be stuck in the next period of time. As shown in fig. 1a, the method of this embodiment specifically includes:
and S110, acquiring the flow characteristics of the target video in a time period before the set time to form a flow characteristic sequence.
The target video may be a video played in real time on a network (e.g., a video website), or a pre-recorded video. Optionally, the target video is a video played in real time, and the set moment is the current system time. When the target video is a video played in real time and the set moment is the current system time, whether the video will be stuck in a time interval after the current system time can be predicted quickly and accurately.
The time period before the set moment is a period of time preceding the set moment; its duration may be set as needed, for example, 3 seconds.
When a user plays a video with a video player, network traffic is generated. The flow characteristics are generally used to describe the network condition within a preset time period, so as to reflect the network quality while the user terminal plays the video; they may refer to characteristics of the network traffic generated by playing the video. The flow characteristic sequence comprises one or more of the following flow characteristics.
Optionally, the flow characteristics include at least one of: bytes per second, uplink bytes per second, downlink bytes per second, packets per second, uplink packets per second, downlink packets per second, bytes-per-second jitter, uplink bytes-per-second jitter, downlink bytes-per-second jitter, packets-per-second jitter, uplink packets-per-second jitter, downlink packets-per-second jitter, maximum bytes per second, maximum uplink bytes per second, maximum downlink bytes per second, maximum packets per second, maximum uplink packets per second, maximum downlink packets per second, minimum packets per second, minimum uplink packets per second, minimum downlink packets per second, sum of inter-packet delay per second, sum of uplink inter-packet delay per second, sum of downlink inter-packet delay per second, average inter-packet delay per second, average uplink inter-packet delay per second, average downlink inter-packet delay per second, code rate, and video resolution. The uplink refers to the direction in which the local IP (Internet Protocol address) sends bytes to the server IP. The downlink refers to the direction in which the server IP sends bytes to the local IP.
Specifically, bytes per second (kb/s): the total number of bytes transferred per second. Uplink bytes per second (up_kb/s): the sum of the number of bytes sent per second by the local IP to the video server IP. Downlink bytes per second (down_kb/s): the sum of the number of bytes sent per second by the video server IP to the local IP.
Packets per second (packets/s): the number of data packets transmitted per second. Uplink packets per second (up_packets/s): the sum of the number of packets sent per second by the local IP to the video server IP. Downlink packets per second (down_packets/s): the sum of the number of packets sent per second by the video server IP to the local IP.
Bytes-per-second jitter (shake_kb/s): the difference between the bytes per second at the request time (in seconds) and the bytes per second at the previous time. Uplink bytes-per-second jitter (up_shake_kb/s): the difference between the uplink bytes per second at the request time and the uplink bytes per second at the previous time. Downlink bytes-per-second jitter (down_shake_kb/s): the difference between the downlink bytes per second at the request time and the downlink bytes per second at the previous time.
Packets-per-second jitter (shake_packets/s): the difference between the packets per second at the request time and the packets per second at the previous time. Uplink packets-per-second jitter (up_shake_packets/s): the difference between the uplink packets per second at the request time and the uplink packets per second at the previous time. Downlink packets-per-second jitter (down_shake_packets/s): the difference between the downlink packets per second at the request time and the downlink packets per second at the previous time.
Maximum bytes per second (max_kb/s): the bytes per second (kb/s) at the initial moment is first recorded as the maximum bytes per second; if the bytes per second counted at any current moment is larger than the historically counted maximum bytes per second, it is updated; otherwise it is kept unchanged.
Maximum uplink bytes per second (up_max_kb/s): the uplink bytes per second (up_kb/s) at the initial moment is first recorded as the maximum uplink bytes per second; if the uplink bytes per second counted at any current moment is larger than the historically counted maximum uplink bytes per second, it is updated; otherwise it is kept unchanged.
Maximum downlink bytes per second (down_max_kb/s): the downlink bytes per second (down_kb/s) at the initial moment is first recorded as the maximum downlink bytes per second; if the downlink bytes per second counted at any current moment is larger than the historically counted maximum downlink bytes per second, it is updated; otherwise it is kept unchanged.
Maximum packets per second (max_packets/s): the packets per second (packets/s) at the initial moment is first recorded as the maximum packets per second; if the packets per second counted at any current moment is larger than the historically counted maximum packets per second, it is updated; otherwise it is kept unchanged.
Maximum uplink packets per second (up_max_packets/s): the uplink packets per second (up_packets/s) at the initial moment is recorded as the maximum uplink packets per second; if the uplink packets per second counted at any current moment is larger than the historically counted maximum uplink packets per second, it is updated; otherwise it is kept unchanged.
Maximum downlink packets per second (down_max_packets/s): the downlink packets per second (down_packets/s) at the initial moment is recorded as the maximum downlink packets per second; if the downlink packets per second counted at any current moment is larger than the historically counted maximum downlink packets per second, it is updated; otherwise it is kept unchanged.
Minimum packets per second (min_packets/s): the packets per second (packets/s) at the initial moment is recorded as the minimum packets per second; if the packets per second counted at any current moment is smaller than the historically counted minimum packets per second, it is updated; otherwise it is kept unchanged.
Minimum uplink packets per second (up_min_packets/s): the uplink packets per second (up_packets/s) at the initial moment is recorded as the minimum uplink packets per second; if the uplink packets per second counted at any current moment is smaller than the historically counted minimum uplink packets per second, it is updated; otherwise it is kept unchanged.
Minimum downlink packets per second (down_min_packets/s): the downlink packets per second (down_packets/s) at the initial moment is recorded as the minimum downlink packets per second; if the downlink packets per second counted at any current moment is smaller than the historically counted minimum downlink packets per second, it is updated; otherwise it is kept unchanged.
Sum of inter-packet delay per second (latency_time): the sum of the inter-packet delays of all packets transmitted per second at the current time (in seconds).
Sum of uplink inter-packet delay per second (up_latency_time): the sum of the inter-packet delays of all packets transmitted uplink per second at the current time.
Sum of downlink inter-packet delay per second (down_latency_time): the sum of the inter-packet delays of all packets transmitted downlink per second at the current time.
Average inter-packet delay per second (avg_latency_time): the average of the inter-packet delays of all packets transmitted per second at the current time.
Average uplink inter-packet delay per second (avg_up_latency_time): the average of the inter-packet delays of all packets transmitted uplink per second at the current time.
Average downlink inter-packet delay per second (avg_down_latency_time): the average of the inter-packet delays of all packets transmitted downlink per second at the current time.
Code rate (code ratio): the code rate of the video transmission during video playing.
Video resolution: the resolution of the video, which can be parsed from the fixed encoding of the slice header of the transmitted TS (Transport Stream) fragments.
The request time may be the time at which a GET request is sent locally to the target video server. The flow characteristics are obtained by capturing and analyzing the data packets generated on the network while the video player plays the target video, where the capture is in pcap format. Specifically, the local IP address and the server IP address can be obtained from the source IP and destination IP of the GET request.
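As an illustration of this step, the sketch below uses the scapy library to read a pcap capture and tag each packet as uplink or downlink once the local IP and server IP have been identified (for example, from the source and destination of the first GET request); the library choice and function name are assumptions made for the sketch, not part of the patent:

```python
from scapy.all import rdpcap, IP  # assumes scapy is installed

def extract_packet_records(pcap_path, local_ip, server_ip):
    """Read a pcap capture and return (timestamp, size_in_bytes, direction) records
    for the traffic exchanged between the local player and the video server."""
    records = []
    for pkt in rdpcap(pcap_path):
        if IP not in pkt:
            continue
        src, dst = pkt[IP].src, pkt[IP].dst
        if src == local_ip and dst == server_ip:
            records.append((float(pkt.time), len(pkt), "up"))    # local IP -> server IP
        elif src == server_ip and dst == local_ip:
            records.append((float(pkt.time), len(pkt), "down"))  # server IP -> local IP
    return records
```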
Acquiring such diversified flow characteristics for predicting the video stuck situation can improve the accuracy of the video stuck prediction result.
The flow characteristics can be captured directly from the network, and the obtained flow characteristics are processed to form a plurality of flow characteristic sequences. Specifically, the time of the first GET request to the video server is taken as the start timestamp for extracting the traffic features from the capture, and the flow characteristic data described above is collected for each time interval for model training.
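A minimal sketch of how the per-second counters, their jitter and the running maxima defined above could be aggregated from such timestamped packet records; the record layout and function name are assumptions for illustration, and only a representative subset of the features is computed:

```python
from collections import defaultdict

def per_second_features(packet_records):
    """packet_records: iterable of (timestamp_seconds, size_bytes, direction) tuples,
    where direction is 'up' (local IP -> server IP) or 'down' (server IP -> local IP).
    Returns one feature dict per elapsed second: kb/s, packets/s, their uplink/downlink
    splits, the jitter against the previous second, and the running maxima."""
    buckets = defaultdict(lambda: {"kb": 0.0, "up_kb": 0.0, "down_kb": 0.0,
                                   "packets": 0, "up_packets": 0, "down_packets": 0})
    for ts, size, direction in packet_records:
        bucket = buckets[int(ts)]
        bucket["kb"] += size / 1024.0
        bucket["packets"] += 1
        bucket[direction + "_kb"] += size / 1024.0
        bucket[direction + "_packets"] += 1

    features, prev, max_kb, max_packets = [], None, 0.0, 0
    for second in sorted(buckets):
        cur = buckets[second]
        max_kb = max(max_kb, cur["kb"])              # running maximum bytes per second
        max_packets = max(max_packets, cur["packets"])
        features.append({
            **cur,
            "shake_kb": cur["kb"] - (prev["kb"] if prev else 0.0),          # jitter vs. previous second
            "shake_packets": cur["packets"] - (prev["packets"] if prev else 0),
            "max_kb": max_kb,
            "max_packets": max_packets,
        })
        prev = cur
    return features
```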
And S120, inputting the flow characteristic sequence into a pre-trained stuck prediction model to obtain a stuck prediction result of the target video in a time interval after the set moment, wherein the duration of the time period is greater than or equal to the duration of the time interval.
The stuck prediction model is a pre-trained neural network model. The stuck prediction result is used to judge whether the video will be stuck in a future time interval. The time interval after the set moment is a time interval that follows the set moment, and its duration is normally equal to or less than that of the time period. The time interval is used as the unit time for evaluating video stuck.
Selecting the characteristic data of the longer time period to predict the stuck condition within the shorter time interval can improve the accuracy of the stuck prediction result.
Optionally, before inputting the flow characteristic sequence into the pre-trained stuck prediction model, the method further includes: acquiring flow characteristics of a sampled video in at least one time interval and a stuck evaluation result of the time interval following the at least one time interval; generating training samples according to the stuck evaluation result in each time interval and the flow characteristics in each time interval; and training a neural network model according to the plurality of training samples to obtain the stuck prediction model.
The sampled video may be a video from any of the major video websites, in particular a video played in real time.
The time interval is used as the unit time of sample acquisition for collecting the flow characteristics and the stuck evaluation results. The flow characteristics and the stuck evaluation results may be combined to form a plurality of training samples.
The flow characteristics within a time interval characterize the network traffic within that time interval. The stuck evaluation result within a time interval describes whether the video is stuck within that time interval.
The training samples are used to train a neural network model to form the stuck prediction model. Illustratively, the neural network model may be a recurrent neural network model, a convolutional neural network, a deep neural network, or the like.
Specifically, the training samples may be generated as follows: the flow characteristic of one time interval is combined with the stuck evaluation result of the immediately following time interval to form a training sample. Alternatively, at least two temporally consecutive flow characteristics are combined to form the flow characteristics of a time period, and these are combined with the stuck evaluation result of the time interval adjacent to the last of those time intervals to form a training sample. For example, the flow characteristics of t consecutive time intervals may be spliced together, and the stuck evaluation result of the following (t+1)-th time interval is used as the stuck label, forming a training sample as shown in the sketch below. In addition, other sample forming methods exist, and the embodiment of the present invention is not particularly limited in this respect.
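A sketch of this splicing scheme, assuming one feature vector and one stuck label are already available per time interval; the function name and the default value of t are illustrative assumptions:

```python
def build_training_samples(interval_features, stuck_labels, t=10):
    """interval_features[i]: feature vector of the i-th time interval;
    stuck_labels[i]: 1 if the video was stuck during interval i, else 0.
    Splices the flow characteristics of t consecutive intervals and uses the
    stuck label of the following interval as the sample label."""
    samples = []
    for start in range(len(interval_features) - t):
        feature_sequence = interval_features[start:start + t]  # intervals start .. start+t-1
        label = stuck_labels[start + t]                        # the (t+1)-th, i.e. following, interval
        samples.append((feature_sequence, label))
    return samples
```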
By obtaining the flow characteristics of a plurality of time intervals and the stuck evaluation results of time intervals of the same duration, combining them into training samples, and training the neural network model to form the stuck prediction model, an accurate stuck prediction result can be obtained.
Optionally, the generating of a training sample according to the stuck evaluation result in each time interval and the flow characteristics in each time interval includes: generating a training sample according to the stuck evaluation result of the time interval after a target time and the flow characteristics of at least one time interval before the target time.
The target time is the time point at which the stuck evaluation result and the flow characteristics are joined; specifically, it is the start time of the time interval of the stuck evaluation result to be combined, which is also the end time of the time interval of the temporally last flow characteristic to be combined.
Illustratively, the time interval is 1 second and the flow characteristic combination includes only one time interval, i.e. the sequence length is 1. For the 1st second, the flow characteristic sequence includes only the flow characteristic of the 1st second, and the stuck evaluation result of the 2nd second is used as the label of this sample. Similarly, for the 2nd second, the sequence includes only the flow characteristic of the 2nd second, and the stuck evaluation result of the 3rd second is used as the label of that sample. Training samples are formed by analogy.
For a flow characteristic combination comprising t (t is greater than or equal to 2) consecutive time intervals, if the video playing duration is greater than or equal to t time intervals, the flow characteristics of t consecutive time intervals can be obtained directly to form a flow characteristic sequence, and the stuck evaluation result of the (t+1)-th time interval following those t intervals is taken as the stuck evaluation result of the sequence. If the video playing duration is less than t time intervals, the available flow characteristics of n (n is less than t) consecutive time intervals are obtained, the flow characteristics of the (n+1)-th to t-th time intervals are filled with zeros (padding) to form the flow characteristic sequence, and the stuck evaluation result of the (n+1)-th time interval is taken as the stuck evaluation result of the sequence.
By analyzing the times corresponding to the time intervals of the stuck evaluation result and of the flow characteristics, the target time is determined and the two are combined to form a training sample, which increases the diversity and representativeness of the samples.
Optionally, the neural network model is a neural network model in which a convolutional neural network and a recurrent neural network are fused.
Specifically, the convolutional neural network is a multi-layer supervised-learning neural network whose structure mainly comprises an input layer, convolutional layers, pooling layers, fully connected layers and the like. Convolution operations are performed on the input samples through convolution kernels to extract features and obtain feature maps, feature downsampling (subsampling) is performed through pooling operations (such as average pooling or max pooling), and the result is output through a fully connected layer.
The recurrent neural network is a neural network designed to process time-series data. In a traditional neural network model, the layers from the input layer to the hidden layer to the output layer are fully connected, while the nodes within a layer are not connected to one another. In a recurrent neural network, the current output of the network is also related to the previous outputs: the network memorizes the previous information and applies it to the computation of the current output, that is, the nodes of the hidden layer are connected across time steps, and the input of the hidden layer contains not only the information of the input layer but also the information of the hidden layer at the previous moment.
By adopting a neural network model that fuses a convolutional neural network and a recurrent neural network, the advantages of both can be combined, and the accuracy of the stuck prediction is further improved. Specifically, the convolutional neural network and the recurrent neural network each extract feature vectors from the preprocessed flow feature data to form feature vectors of the same shape; the features are fused, passed through two fully connected layers, a two-class prediction (where 0 is not stuck and 1 is stuck) is performed with a Softmax classifier, and the final result is output.
In a specific example, as shown in fig. 1b, the structure of the stuck prediction model includes a recurrent neural network feature extraction layer, a convolutional neural network feature extraction layer, feature fusion, a fully connected network, and a Softmax classifier. The recurrent neural network feature extraction layer and the convolutional neural network feature extraction layer extract features from the input data. The features extracted by the two layers are feature vectors of the same shape; they are fused and input to the fully connected layers (for example, two layers) and the Softmax classifier to obtain the classification result, i.e. the output result.
Illustratively, the Recurrent Neural Network (RNN) is a type of neural network that takes time-series (sequence) data as input, recurses along the evolution direction of the sequence, and connects all nodes (recurrent units) in a chain. It has the characteristics of memory and parameter sharing, but a plain RNN suffers from the vanishing-gradient problem. The Long Short-Term Memory network (LSTM) can selectively retain part of the information and forget the rest. The flow characteristics generated during video playing form a time sequence, and whether the video will be stuck at the next moment is associated with the flow data of the previous moments. Therefore, a recurrent neural network can be selected to solve the stuck prediction problem.
As shown in FIG. 1c, the structure of the long short-term memory network comprises two LSTM layers, with a Dropout layer arranged between them. The input data, consisting of historical traffic features (kb/s, packets/s, etc.) and current traffic features (kb/s, packets/s, etc.), passes through the two LSTM layers to form a first feature vector. The Dropout layer is used to prevent the model from over-fitting and to improve its generalization performance.
As shown in fig. 1d, the convolutional neural network structure includes convolutional layers and max-pooling layers. The convolutional neural network splices a plurality of flow characteristic sequences into a matrix, extracts features (features 0, 1, 2, 3, ... in fig. 1d) from the historical traffic features (kb/s, packets/s, etc.) and the current traffic features through the convolution operation of convolution kernels, and finally performs dimensionality reduction through the max-pooling layer to form a second feature vector.
The first feature vector obtained by the long short-term memory network and the second feature vector obtained by the convolutional neural network are fused to form a fused feature vector, which can be calculated by the following formula:
Concat_Features=Concat(features_1,features_2)
wherein Concat_Features is the fused feature vector, features_1 is the first feature vector, and features_2 is the second feature vector.
As shown in fig. 1b, the fused feature vector passes through two fully connected layers and is recombined into a complete feature map; the output that finally passes through the Softmax classifier corresponds to the final stuck prediction result (for example, 0: not stuck, 1: stuck). The output actually describes the probabilities of the different classes, and the class with the maximum probability is taken as the predicted stuck class. The loss between the output result and the ground-truth result is calculated with the cross-entropy, the loss function is minimized with the Adam optimizer, and the parameters of the whole network model are optimized at the same time to obtain the final prediction model.
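One way the fused structure of fig. 1b could be realized is sketched below with the Keras API; the framework choice, layer widths, dropout rate and kernel size are assumptions made for illustration and are not values given in the patent:

```python
from tensorflow.keras import layers, Model

def build_stuck_prediction_model(seq_len=10, n_features=29):
    inputs = layers.Input(shape=(seq_len, n_features))

    # Recurrent branch (fig. 1c): two LSTM layers with a Dropout layer in between
    rnn = layers.LSTM(64, return_sequences=True)(inputs)
    rnn = layers.Dropout(0.5)(rnn)
    rnn = layers.LSTM(64)(rnn)                       # first feature vector

    # Convolutional branch (fig. 1d): convolution followed by max pooling
    cnn = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inputs)
    cnn = layers.MaxPooling1D(pool_size=seq_len)(cnn)
    cnn = layers.Flatten()(cnn)                      # second feature vector, same shape as the first

    # Concat_Features = Concat(features_1, features_2)
    fused = layers.Concatenate()([rnn, cnn])

    # Two fully connected layers and a Softmax classifier (0: not stuck, 1: stuck)
    x = layers.Dense(64, activation="relu")(fused)
    x = layers.Dense(32, activation="relu")(x)
    outputs = layers.Dense(2, activation="softmax")(x)

    model = Model(inputs, outputs)
    # cross-entropy loss minimized with the Adam optimizer, as described above
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Such a model could then be fitted on the (X, Y) samples described below with model.fit, and a sliding window of the most recent traffic features could be passed to model.predict at any set moment to obtain the stuck prediction for the following interval.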
On the basis of the LSTM, fusing a convolutional neural network increases the representativeness of the extracted features and improves the accuracy of feature extraction, thereby improving the prediction accuracy of the stuck prediction model.
According to the embodiment of the invention, the flow characteristics of the video in a time period before the set moment are acquired to form a flow characteristic sequence, which is input into a pre-trained stuck prediction model to obtain the stuck prediction result of the video in a time interval after the set moment. This solves the problem that video stuck cannot be predicted in the prior art, enables prediction of video stuck in a future time period, and improves the real-time performance of video stuck prediction.
Example two
Fig. 2 is a flowchart of a video stuck prediction method according to a second embodiment of the present invention, which is embodied on the basis of the first embodiment. The method of the embodiment specifically includes:
S210, acquiring image frames formed by capturing the sampled video at the time interval.
For the time interval, the video, the traffic characteristics, the stuck evaluation result, the target time, the training samples, the neural network model, the stuck prediction model, the flow characteristic sequence, the current system time, the stuck prediction result, and the time period in the embodiment of the present invention, reference may be made to the description of the foregoing embodiment.
In practice, a video is a process of continuously and rapidly switching between a plurality of images. An image frame refers to the image of the video at a particular moment, and the change of the image frames over time is used to evaluate whether the video is stuck.
The image frames may be obtained by capturing a video played in real time. Generally, the page where the video player is located also includes other content unrelated to the video, so only the image of the area where the video is played may be captured as the image frame.
To reduce the amount of image-frame data to be processed, frames of the sampled video can be captured periodically according to the preset time interval.
S220, comparing every two adjacent image frames, and determining the stuck evaluation result of the time interval associated with the acquisition times of the two adjacent image frames.
The acquisition time refers to the screenshot time of the image frame in the sampling video playing process. Illustratively, the acquisition time instant is the temporal position of the image frame in the sampled video.
Two adjacent image frames refer to image frames adjacent at the time of acquisition.
Comparing every two adjacent image frames actually means judging whether the two frames are identical or satisfy a similarity condition. If the two adjacent image frames are identical or very similar, the picture has not changed, that is, the sampled video is stuck at the playback position corresponding to the acquisition times of the two adjacent image frames.
The stuck evaluation result of the time interval associated with the acquisition times of the two adjacent image frames indicates whether the sampled video is stuck at the playback position corresponding to those acquisition times. The stuck evaluation result is either stuck or not stuck.
The time interval associated with the acquisition times of two adjacent image frames is used to determine whether the sampled video plays smoothly within that interval. If the stuck evaluation result of that time interval is stuck, the sampled video is stuck within the time interval associated with the acquisition times of the two image frames. If the stuck evaluation result is not stuck, the sampled video plays smoothly within that time interval.
The acquisition time of each image frame in the sampled video is recorded, and the duration between the acquisition times of every two adjacent image frames is one time interval. If two adjacent image frames are identical, the stuck evaluation result of the time interval associated with their acquisition times is marked as stuck; otherwise it is marked as not stuck.
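A sketch of this labelling step, assuming the sampled playback has been saved as a video file and using OpenCV for frame comparison; the mean-difference threshold and function name are assumptions made for the sketch:

```python
import cv2
import numpy as np

def label_stuck_intervals(video_path, interval_seconds=1.0, diff_threshold=1.0):
    """Take one frame per time interval from the sampled video and mark an interval
    as stuck (1) when two adjacent frames are essentially identical, else not stuck (0)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(round(fps * interval_seconds)))   # frames per time interval

    labels, prev_gray, frame_idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is not None:
                mean_diff = float(np.mean(cv2.absdiff(gray, prev_gray)))
                labels.append(1 if mean_diff < diff_threshold else 0)  # 1 = stuck, 0 = not stuck
            prev_gray = gray
        frame_idx += 1
    cap.release()
    return labels
```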
And S230, acquiring packet capturing data of the sampling video, analyzing the packet capturing data to obtain flow characteristics, wherein the packet capturing data comprises pcap packet data.
And the packet capturing data is used for acquiring traffic characteristics. Illustratively, the packet capture data is a pcap packet.
The method can directly capture the pcap data packets generated in the video playing process of the video website from the network, each pcap packet corresponds to one video sample, and the pcap packets are analyzed to obtain the flow characteristics.
In fact, when a user plays a video with a video player on a video playing page, the original traffic generated includes both the video resource and other content, which may be unrelated to the video resource. Data unrelated to the video resource, including advertisements, pictures and the like, therefore needs to be filtered out. Content other than the video resource can be filtered by analyzing the requested link data. Illustratively, packets for a picture contain a link in a picture format, whereas packets for the video resource correspond to links of TS fragments, so the traffic data of the video resource can be filtered out, as in the sketch below.
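A rough sketch of such link-based filtering; the extension lists are assumptions for illustration, and real captures may need site-specific rules:

```python
VIDEO_EXTENSIONS = (".ts", ".m3u8", ".mp4", ".flv")               # assumed video fragment/link types
NON_VIDEO_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".js", ".css")

def is_video_resource(request_url):
    """Keep requests for TS fragments / video files, drop pictures, scripts
    and other page content unrelated to the video resource."""
    path = request_url.split("?", 1)[0].lower()
    if path.endswith(NON_VIDEO_EXTENSIONS):
        return False
    return path.endswith(VIDEO_EXTENSIONS)
```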
Optionally, the flow characteristics include at least one of: bytes per second, uplink bytes per second, downlink bytes per second, packets per second, uplink packets per second, downlink packets per second, bytes-per-second jitter, uplink bytes-per-second jitter, downlink bytes-per-second jitter, packets-per-second jitter, uplink packets-per-second jitter, downlink packets-per-second jitter, maximum bytes per second, maximum uplink bytes per second, maximum downlink bytes per second, maximum packets per second, maximum uplink packets per second, maximum downlink packets per second, minimum packets per second, minimum uplink packets per second, minimum downlink packets per second, sum of inter-packet delay per second, sum of uplink inter-packet delay per second, sum of downlink inter-packet delay per second, average inter-packet delay per second, average uplink inter-packet delay per second, average downlink inter-packet delay per second, code rate, and video resolution.
There are many flow characteristics, and at least one of them can be selected to form the flow characteristic sequence.
And S240, taking the time interval associated with the moment of the first request for acquiring video data in the packet capture data as the time interval matched with the flow characteristic.
The request time of the packet capture data may refer to the time at which the request used to determine the requester and responder addresses of the traffic is issued. Specifically, whether the video is stuck in the time interval between every two adjacent image frames is determined by judging whether the two adjacent image frames are identical, and the result is used as the stuck evaluation result: if they are identical, the interval is stuck; otherwise it is not stuck.
The determination of the requestor and the responder (server) is often required in traffic characterization. Specifically, the IP of the requester and the IP of the responder can be obtained through the get request, and then data transmitted from the IP of the requester to the IP of the responder is defined as uplink data, and data transmitted from the IP of the responder to the IP of the requester is defined as downlink data.
It will be appreciated that the traffic characteristics actually characterize the network quality over a certain period of time. For example, the uplink bytes per second indicate the amount of data sent by the requester IP to the responder IP within one second from a certain moment. Specifically, the time at which the GET request is sent may be used as the start time of the traffic characteristic, and the end time is determined from the start time and the time interval.
And S250, generating a training sample according to the stuck evaluation result of the time interval after the target time and the flow characteristics of at least one time interval before the target time.
Specifically, a training sample is formed by combining a preset number of consecutive flow characteristics before the target time with the stuck evaluation result after the target time. The target time is the end time of the time interval matched with the temporally last of those consecutive flow characteristics, and at the same time the start time of the time interval matched with the stuck evaluation result. For the splicing of t consecutive time intervals, the training sample consists of the flow characteristic sequence formed by the flow characteristics of the t consecutive time intervals together with the corresponding stuck evaluation result.
In a specific example, t is defined as 10 and the sequence length is 10, so a flow characteristic sequence covering the preceding 10 seconds (10 time intervals) needs to be formed. If the video has currently been played for only 1 second, the flow characteristic data of the first second (time interval 0 s to 1 s) is acquired, the flow characteristic data of the following 9 seconds (time intervals 2 s to 10 s) is filled with 0 by splicing, seq_len is recorded as 1, and the stuck evaluation result of the 2nd second is taken as the stuck evaluation result of this flow characteristic sequence. If the video has currently been played for only 2 seconds, the flow characteristic data of the first 2 seconds (time intervals 0 s to 2 s) is acquired, the flow characteristic data of the following 8 seconds (time intervals 3 s to 10 s) is filled with 0, seq_len is recorded as 2, and the stuck evaluation result of the 3rd second is taken as the stuck evaluation result of this flow characteristic combination. This continues until the video has been played for 10 seconds; from then on, the flow characteristic data of the 10 seconds preceding the current moment (for example, at the 12th second, the data of time intervals 3 s to 12 s) is spliced, seq_len is recorded as 10, and the stuck evaluation result of the time interval of the next second is taken as the stuck evaluation result of the flow characteristic combination. Training samples are constructed in this way by analogy.
That is, the flow characteristics of the 10 consecutive seconds before any moment t form a flow characteristic sequence, with any missing seconds filled with 0, as the input X; the stuck evaluation result of the corresponding second t+1 is used as the output Y, and seq_len is recorded. In fact, the benefit of expressing the stuck result as a label (0 or 1 identifies the classification result, e.g., 0 is not stuck and 1 is stuck) is that the samples are independent of one another and do not affect each other during training. Before use, the data must be normalized, and because the proportion of stuck to non-stuck data in the acquired samples is unbalanced, a sample-balancing operation needs to be performed. Furthermore, binary classification requires one-hot encoding of the label, as sketched below.
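A sketch of this preprocessing, assuming each raw sample is a (feature sequence, label) pair; the sequence length t, the feature count and the min-max normalization scheme are assumptions for illustration, and class re-balancing is only noted in a comment:

```python
import numpy as np

def preprocess_samples(samples, t=10, n_features=29):
    """samples: list of (feature_sequence, label) pairs, where feature_sequence holds
    up to t per-second feature vectors and label is 0 (not stuck) or 1 (stuck).
    Pads short sequences with zeros, records seq_len, min-max normalizes each feature
    column, and one-hot encodes the binary label."""
    X = np.zeros((len(samples), t, n_features), dtype=np.float32)
    seq_lens = np.zeros(len(samples), dtype=np.int32)
    y = np.zeros((len(samples), 2), dtype=np.float32)

    for i, (seq, label) in enumerate(samples):
        seq = np.asarray(seq, dtype=np.float32)
        seq_len = min(len(seq), t)
        X[i, :seq_len] = seq[:seq_len]        # positions seq_len..t-1 stay zero (padding)
        seq_lens[i] = seq_len
        y[i, int(label)] = 1.0                # one-hot: [1, 0] = not stuck, [0, 1] = stuck

    # min-max normalization per feature column
    flat = X.reshape(-1, n_features)
    col_min, col_max = flat.min(axis=0), flat.max(axis=0)
    X = (X - col_min) / np.where(col_max - col_min == 0, 1.0, col_max - col_min)

    # a sample-balancing step (over-/under-sampling of the rarer class) would follow here
    return X, seq_lens, y
```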
Specifically, in practice, before training the model, the flow characteristics and the matched stuck evaluation result need to be obtained, and preprocessing, such as zero padding processing, normalization processing, sample equalization processing, and the like, is performed to form a training sample, so that the representativeness of the training sample is improved, and redundant data is reduced.
For example, a Python script may be used to put the traffic characteristics and the stuck evaluation results into one-to-one correspondence by time interval, where stuck is marked as 1 and not stuck as 0, and the results are saved locally.
And S260, training a neural network model according to the plurality of training samples to obtain the stuck prediction model, wherein the neural network model is a neural network model formed by fusing a convolutional neural network and a recurrent neural network.
S270, obtaining the flow characteristics of the target video in the previous time period of the current system moment to form a flow characteristic sequence, wherein the target video is a video played in real time.
As mentioned above, the processing of the flow characteristic sequences used for the training samples should be kept consistent with the processing used later when performing stuck prediction on videos played in real time.
It should be noted that the flow characteristic sequence corresponding to a training sample is accompanied by a stuck evaluation result, whereas the flow characteristic sequence to be predicted, formed from the flow characteristics of the video played in real time, does not include a stuck evaluation result.
The process of forming the flow characteristic sequence to be predicted by the flow characteristic also includes data preprocessing, such as the foregoing padding processing, standard normalization processing, sample equalization processing, and the like.
And S280, inputting the flow characteristic sequence into a pre-trained stuck prediction model to obtain a stuck prediction result of the real-time played video in a time interval after the current system moment, wherein the time length of the time period is greater than or equal to the time length of the time interval.
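Purely as a usage illustration (the model file name, window length and feature count are assumptions), a trained model could be queried at any set moment roughly as follows:

```python
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("stuck_prediction_model.h5")    # hypothetical saved stuck prediction model
window = np.zeros((1, 10, 29), dtype=np.float32)   # placeholder for the last 10 s of preprocessed traffic features
probabilities = model.predict(window)[0]           # softmax output: [P(not stuck), P(stuck)]
print("stuck expected in the next interval" if probabilities[1] > probabilities[0]
      else "smooth playback expected in the next interval")
```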
According to the embodiment of the invention, the packet capture data of the sampled video on the network during playing and the stuck condition of the sampled video during playing are obtained to form training samples, the fused neural network model is trained to obtain the stuck prediction model, and the stuck condition of the target video in the time interval after the current system moment is predicted according to the flow characteristics of the target video played in real time, so that video stuck in a future time period is predicted and the real-time performance of video stuck prediction is improved.
Example three
Fig. 3 is a schematic diagram of a video stuck prediction apparatus according to a third embodiment of the present invention. The third embodiment is a corresponding apparatus for implementing the video stuck prediction method provided by the above embodiments of the present invention, and the apparatus may be implemented in a software and/or hardware manner, and may be generally integrated into a computer device.
Accordingly, the apparatus of the present embodiment may include:
a flow characteristic sequence generating module 310, configured to obtain a flow characteristic of the target video in a time period before a set time to form a flow characteristic sequence;
the video stuck prediction module 320 is configured to input the flow characteristic sequence into a pre-trained stuck prediction model to obtain a stuck prediction result of the target video in a time interval after a set time, where a duration of the time period is greater than or equal to a duration of the time interval.
According to the embodiment of the invention, the flow characteristics of the video in a time period before the set moment are acquired to form a flow characteristic sequence, which is input into a pre-trained stuck prediction model to obtain the stuck prediction result of the video in a time interval after the set moment. This solves the problem that video stuck cannot be predicted in the prior art, enables prediction of video stuck in a future time period, and improves the real-time performance of video stuck prediction.
Further, the video stuck prediction apparatus further includes: a model training module, configured to, before the flow characteristic sequence is input into the pre-trained stuck prediction model, acquire flow characteristics of a sampled video in at least one time interval and a stuck evaluation result of the time interval following the at least one time interval; generate training samples according to the stuck evaluation result in each time interval and the flow characteristics in each time interval; and train a neural network model according to the plurality of training samples to obtain the stuck prediction model.
Further, the model training module includes: a training sample generating unit, configured to generate a training sample according to the stuck evaluation result of the time interval after the target time and the flow characteristics of at least one time interval before the target time.
Further, the model training module includes: a characteristic acquisition unit, configured to acquire image frames formed by capturing the sampled video at the time interval; compare every two adjacent image frames and determine the stuck evaluation result of the time interval associated with the acquisition times of the two adjacent image frames; acquire packet capture data of the sampled video and analyze it to obtain the flow characteristics, wherein the packet capture data includes pcap packet data; and take the time interval associated with the moment of the first request for acquiring video data in the packet capture data as the time interval matched with the flow characteristic.
Further, the neural network model is a neural network model formed by fusing a convolutional neural network and a recurrent neural network.
Further, the target video is a video played in real time, and the set time is the current system time.
Further, the flow characteristics include at least one of: bytes per second, uplink bytes per second, downlink bytes per second, packets per second, uplink packets per second, downlink packets per second, bytes-per-second jitter, uplink bytes-per-second jitter, downlink bytes-per-second jitter, packets-per-second jitter, uplink packets-per-second jitter, downlink packets-per-second jitter, maximum bytes per second, maximum uplink bytes per second, maximum downlink bytes per second, maximum packets per second, maximum uplink packets per second, maximum downlink packets per second, minimum packets per second, minimum uplink packets per second, minimum downlink packets per second, sum of inter-packet delay per second, sum of uplink inter-packet delay per second, sum of downlink inter-packet delay per second, average inter-packet delay per second, average uplink inter-packet delay per second, average downlink inter-packet delay per second, code rate, and video resolution.
The device can execute the method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. FIG. 4 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in FIG. 4 is only one example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 4, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16. The computer device 12 may be a device that is attached to a bus.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact disk Read-Only Memory (CD-ROM), Digital Video disk (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., a network card, a modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may occur through an Input/Output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN)) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be understood that although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing, such as implementing the methods provided by any of the embodiments of the present invention, by executing programs stored in the system memory 28.
Embodiment Five
Embodiment five of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the video stuck prediction method according to any embodiment of the present invention:
That is, when executed by the processor, the program implements: acquiring flow characteristics of a target video in a time period before a set moment to form a flow characteristic sequence; and inputting the flow characteristic sequence into a pre-trained stuck prediction model to obtain a stuck prediction result of the target video in a time interval after the set moment, wherein the time length of the time period is greater than or equal to the time length of the time interval.
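As a rough, non-authoritative illustration of this inference flow, the Python sketch below windows the per-second traffic features from the period immediately before the set moment and asks a pre-trained model for the stall probability of the following interval. The helper `extract_features_per_second`, the `model.predict` interface, and the window lengths are all assumptions introduced for the example.

```python
# Illustrative sketch only: run the stuck prediction for the next interval
# from the traffic features observed just before `set_time`.
import numpy as np

def predict_stuck(model, traffic_log, set_time, period_s=30, interval_s=10):
    # The look-back period must be at least as long as the predicted interval.
    assert period_s >= interval_s
    # One feature vector per second over [set_time - period_s, set_time).
    sequence = np.stack([
        extract_features_per_second(traffic_log, t)   # hypothetical helper
        for t in range(set_time - period_s, set_time)
    ])                                                 # shape: (period_s, num_features)
    # The pre-trained model returns the stall probability for
    # the interval [set_time, set_time + interval_s).
    return float(model.predict(sequence[np.newaxis, ...])[0])
```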
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a LAN or a WAN, or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for video stuck prediction, comprising:
acquiring flow characteristics of a target video in a time period before a set moment to form a flow characteristic sequence;
and inputting the flow characteristic sequence into a pre-trained stuck prediction model to obtain a stuck prediction result of the target video in a time interval after a set moment, wherein the time length of the time period is greater than or equal to the time length of the time interval.
2. The method of claim 1, further comprising, prior to inputting the flow characteristic sequence into the pre-trained stuck prediction model:
acquiring flow characteristics of a sampled video in at least one time interval and a stuck evaluation result of a time interval next to the at least one time interval;
generating a training sample according to the stuck evaluation result in each time interval and the flow characteristics in each time interval;
and training a neural network model according to the plurality of training samples to obtain the stuck prediction model.
3. The method of claim 2, wherein generating training samples according to the stuck evaluation results in each time interval and the flow characteristics in each time interval comprises:
and generating a training sample according to the stuck evaluation result of the time interval after the target time and the flow characteristics of at least one time interval before the target time.
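Purely as an illustration of the pairing described in claims 2 and 3 (the code is not part of the claims), one plausible way to build training samples is to pair the feature windows of the intervals before each target time with the stuck label of the interval that follows it; the `lookback` length and the list layout below are assumptions.

```python
# Illustrative sketch only: pair look-back feature windows with the stuck
# label of the following interval. interval_features[i] and stuck_labels[i]
# describe time interval i; both lists are assumed inputs.
def make_training_samples(interval_features, stuck_labels, lookback=3):
    samples = []
    for target in range(lookback, len(interval_features)):
        window = interval_features[target - lookback:target]  # intervals before the target time
        label = stuck_labels[target]                          # interval after the target time
        samples.append((window, label))
    return samples
```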
4. The method of claim 2, wherein said obtaining a flow characteristic of the sampled video for at least one of said time intervals and a stuck evaluation result for a time interval subsequent to at least one of said time intervals comprises:
acquiring an image frame formed by intercepting a sampling video according to a time interval;
comparing every two adjacent image frames, and determining a stuck evaluation result matched to the time interval associated with the acquisition times of the two adjacent image frames;
acquiring packet capturing data of the sampling video, and analyzing to obtain flow characteristics, wherein the packet capturing data comprises pcap packet data;
and taking the time interval associated with the moment of acquiring the video data of the first acquisition request in the packet capturing data as the time interval matched with the flow characteristic.
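One simple, assumed realization of the adjacent-frame comparison in claim 4 is to treat an interval as stuck when two frames captured in succession are nearly identical; the OpenCV-based sketch below and its threshold are illustrative only and are not the patented implementation.

```python
# Illustrative sketch only: decide whether the picture was frozen between two
# adjacent captured frames. The difference threshold is an assumption.
import cv2
import numpy as np

def frames_indicate_stuck(frame_a_path: str, frame_b_path: str, threshold: float = 1.0) -> bool:
    a = cv2.imread(frame_a_path, cv2.IMREAD_GRAYSCALE)
    b = cv2.imread(frame_b_path, cv2.IMREAD_GRAYSCALE)
    # Mean absolute pixel difference between the two adjacent frames.
    mean_diff = float(np.mean(cv2.absdiff(a, b)))
    return mean_diff < threshold  # near-zero change suggests a stuck interval
```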
5. The method of claim 2, wherein the neural network model is a fusion of a convolutional neural network and a recurrent neural network.
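Claim 5 only states that the model fuses a convolutional and a recurrent neural network; the PyTorch sketch below shows one common way such a fusion is built (a Conv1d over the time axis feeding an LSTM), with layer sizes chosen arbitrarily for the example rather than taken from the patent.

```python
# Illustrative sketch only: a generic CNN + RNN fusion over per-second
# traffic feature sequences. All sizes are arbitrary example values.
import torch
import torch.nn as nn

class StuckPredictor(nn.Module):
    def __init__(self, num_features: int = 22, hidden: int = 64):
        super().__init__()
        # 1-D convolution over the time axis extracts local traffic patterns.
        self.conv = nn.Sequential(
            nn.Conv1d(num_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # LSTM captures the longer-range temporal dependencies.
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2))              # -> (batch, 32, time)
        out, _ = self.lstm(h.transpose(1, 2))         # -> (batch, time, hidden)
        return torch.sigmoid(self.head(out[:, -1]))   # stall probability for the next interval
```

The convolution summarizes short bursts in the traffic, while the recurrent part tracks how those bursts evolve over the whole look-back period, which is the usual motivation for this kind of fusion.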
6. The method of claim 1, wherein the target video is a real-time video, and the set time is a current system time.
7. The method of claim 1, wherein the flow characteristics comprise at least one of: byte number per second, byte number per second uplink, byte number per second downlink, data packet number per second uplink, data packet number per second downlink, byte number per second jitter, byte number per second uplink jitter, byte number per second downlink jitter, number of packets per second uplink jitter, number of packets per second downlink jitter, maximum byte number per second uplink, maximum byte number per second downlink, maximum packet number per second uplink, maximum packet number per second downlink, minimum packet number per second uplink, minimum packet number per second downlink, total packet delay per second, average packet delay per second uplink, total packet delay per second uplink, average packet delay per second downlink, code rate, and video resolution.
8. A video stuck prediction apparatus, comprising:
the flow characteristic sequence generation module is used for acquiring flow characteristics of the target video in a time period before a set moment to form a flow characteristic sequence;
and the video stuck prediction module is used for inputting the flow characteristic sequence into a pre-trained stuck prediction model to obtain a stuck prediction result of the target video in a time interval after the set moment, wherein the duration of the time period is greater than or equal to the duration of the time interval.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the video stuck prediction method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the video stuck prediction method according to any one of claims 1 to 7.
CN202010018475.9A 2020-01-08 2020-01-08 Video stuck prediction method and device, computer equipment and storage medium Pending CN111263225A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010018475.9A CN111263225A (en) 2020-01-08 2020-01-08 Video stuck prediction method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010018475.9A CN111263225A (en) 2020-01-08 2020-01-08 Video stuck prediction method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111263225A true CN111263225A (en) 2020-06-09

Family

ID=70950961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010018475.9A Pending CN111263225A (en) 2020-01-08 2020-01-08 Video stuck prediction method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111263225A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9600067B2 (en) * 2008-10-27 2017-03-21 Sri International System and method for generating a mixed reality environment
CN101883001A (en) * 2009-05-08 2010-11-10 北京启明星辰信息技术股份有限公司 Method and system for traffic identification and management of P2P application in small network
CN105913088A (en) * 2016-04-13 2016-08-31 厦门美图移动科技有限公司 Lag identification method, lag identification device and computing equipment
CN107959640A (en) * 2016-10-14 2018-04-24 腾讯科技(深圳)有限公司 Network dispatching method and device
CN108984369A (en) * 2018-07-13 2018-12-11 厦门美图移动科技有限公司 Caton prediction technique, device and mobile terminal
CN110225417A (en) * 2019-05-09 2019-09-10 网宿科技股份有限公司 Data processing method and server, the method and server that detect Caton
CN110300315A (en) * 2019-07-24 2019-10-01 北京达佳互联信息技术有限公司 A kind of video code rate determines method, apparatus, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022205964A1 (en) * 2021-04-01 2022-10-06 华为技术有限公司 Video conference quality determination method, related apparatus, and system
CN113347156A (en) * 2021-05-11 2021-09-03 江苏大学 Intelligent flow confusion method and system for website fingerprint defense and computer storage medium
CN113347156B (en) * 2021-05-11 2022-10-11 江苏大学 Intelligent flow confusion method and system for website fingerprint defense and computer storage medium
CN114401447A (en) * 2021-12-20 2022-04-26 北京字节跳动网络技术有限公司 Video stuck prediction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200609)