WO2022116811A1

WO2022116811A1 - Method and device for predicting definition of video having encrypted traffic

Info

Publication number: WO2022116811A1
Application number: PCT/CN2021/130890
Authority: WO
Inventors: 王赟; 侯贺明; 曾伟
Original assignee: 武汉绿色网络信息服务有限责任公司
Priority date: 2020-12-04
Filing date: 2021-11-16
Publication date: 2022-06-09
Also published as: CN112203136B; CN112203136A

Abstract

Disclosed in the present invention are a method and device for predicting the definition of a video having encrypted traffic. The method comprises: capturing TCP stream data packets for playback of network videos having encrypted traffic and a playback log; according to video code numbers in the playback log, labelling the definitions of the captured network videos having encrypted traffic; detecting data blocks from the TCP stream data packets; extracting features corresponding to the definitions in the data blocks and feature average values, and forming feature sets of the definitions of known data packets; establishing a model by means of correspondences between definitions labeled on known video files and the feature sets of the data blocks, training the model, performing feature extraction on a TCP stream data packet for a video having encrypted traffic to be predicted, and according to the correspondences between the feature sets and the definitions in the model, predicting the definition of the video file having encrypted traffic to be predicted. In this way, when video transmission is encrypted and the content of the video file cannot be obtained, the definition of a video file having encrypted traffic can still be predicted by building a model.

Description

A method and device for predicting the definition of encrypted traffic video

Technical field

The invention belongs to the field of computer servers, and more particularly, relates to a method and device for predicting the definition of encrypted traffic video.

Background art

When a video website uses HTTP to transmit video, DPI (Deep Packet Inspection) manufacturers can extract the transmitted video file from the network traffic. The header information of the video file contains the video encoding, definition, video bit rate, video picture size and other information. In recent years, however, almost all large websites have deployed digital certificates and use the HTTPS transmission protocol when interacting with clients, and video websites are no exception. As of 2020, China's mainstream video platforms use HTTPS encrypted transmission when videos are watched in browsers.

In the case of video transmission encryption, the DPI manufacturer cannot obtain the content of the video file, which makes it impossible to analyze the video definition.

SUMMARY OF THE INVENTION

In view of the above defects or improvement requirements of the prior art, the present invention provides a method and device for predicting the definition of encrypted traffic video. The model is trained with the feature set of the known data packet definition, and then the model is used to predict the definition of the encrypted video file of unknown definition, thus solving the technical problem that the DPI manufacturer cannot analyze the video definition when the video transmission is encrypted.

In order to achieve the above object, in a first aspect, the present invention provides a method for predicting the definition of encrypted traffic video, the method including:

capturing TCP stream data packets and playback logs of network video playback with HTTPS encrypted traffic;

labeling the definition of the captured encrypted traffic network video according to the video encoding number in the playback log;

detecting data blocks from the TCP stream data packets;

extracting the features corresponding to the definition and the feature average values in the data blocks to form a feature set of data packets of known definition;

establishing a model from the correspondence between the definition labeled on known video files and the feature sets of the data blocks, and training the model; after the model training is completed, performing feature extraction on the TCP stream data packets of the encrypted traffic video to be tested, and predicting the definition of the encrypted traffic video file to be tested according to the correspondence between feature sets and definitions in the model.

As a further improvement and supplement to the above solution, the present invention also includes the following additional technical features.

Preferably, the method for collecting data of network video playback with HTTPS encrypted traffic includes:

On the browser, request the database that stores video information to play videos with HTTPS encrypted traffic, and select at least two video files with different definition and video content.

Preferably, each encoding method corresponds to a unique video encoding number in the playback log, and when the browser plays the video, the video encoding number and the corresponding definition in the playback log of the encrypted traffic network video are recorded and collected at the same time.

Preferably, the data block is detected according to the ACK field of the TCP message, and specifically includes:

Determine whether all the packets of a TCP stream are HLS video streams;

Parse the TLS message, remove the TLS handshake message, and retain the message for transmitting data;

Determine the uplink and downlink packets, and process the downlink packets;

Classify the ACK value of the downlink message;

Messages with the same ACK value are recorded as a data block;

The message whose ACK value has changed is recorded as a new data block.

Preferably, the features corresponding to the definition and the feature average values in the data blocks are extracted from the TCP stream file of the video transmission, wherein:

the features include one or more of: data block size, number of data packets, first byte arrival time, data block download time, data block idle time, data block transmission time and data transmission rate;

the feature average values include one or more of: average data block size, average number of data packets, average first byte arrival time, average data block download time, average data block idle time, average data block transmission time and average data transmission rate.

Preferably, the features of a data block and the feature average values over the data blocks of the current TCP stream file are combined into a feature set sample of known definition; the model receives at least one feature set sample for training, and the definition predicted by the model is verified against the video encoding number in the playback log of the video of known definition. If the accuracy of the model's predictions is higher than a preset value, the model is considered successfully trained.

Preferably, when the model predicts encrypted traffic video of unknown definition, the model trained on the Android platform is used for prediction in a mobile communication network, and the model trained on the PC platform is used for prediction in a traditional fixed network environment.

Preferably, a random oversampling method is used to balance the definition categories in the sample set.

Preferably, when the target traffic carrying HTTPS-encrypted network video is filtered out of the network traffic, the SNI field is compared with preset strings of the domain name; if the SNI field matches the preset strings, the traffic is video traffic in HLS transmission mode and is treated as encrypted traffic video whose definition is to be predicted.

In a second aspect, the present invention also provides a device for predicting the definition of encrypted traffic video, the device comprising:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the method for predicting the definition of encrypted traffic video described in the first aspect.

In general, compared with the prior art, the above technical solutions conceived by the present invention have the following beneficial effects:

Even when video transmission is encrypted and the content of the video file cannot be obtained, DPI manufacturers can build a model from the feature sets of video files of known definition and use it to predict the definition of the encrypted traffic video file to be tested.

Description of drawings

Fig. 1 shows the process of training a model in Embodiment 1 of the present invention;

Fig. 2 shows the correspondence between the video encoding numbers in the playback log and the definitions they label in Embodiment 1 of the present invention;

Fig. 3 shows the process of detecting data blocks from the ACK field of TCP messages in Embodiment 1 of the present invention;

Fig. 4 shows how the SNI field is compared with the character strings in the domain name to filter out the encrypted traffic network video to be tested in Embodiment 1 of the present invention;

Fig. 5 is a schematic structural diagram of an apparatus for predicting the definition of encrypted traffic video according to an embodiment of the present invention.

Detailed description

In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.

In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are not in direct contact but are connected through additional features between them. Also, the first feature being "above", "over" or "on top of" the second feature includes the first feature being directly above or obliquely above the second feature, or simply means that the first feature is at a higher level than the second feature. The first feature being "below", "beneath" or "underneath" the second feature includes the first feature being directly below or diagonally below the second feature, or simply means that the first feature is at a lower level than the second feature.

Embodiment 1:

When a video website uses HTTP to transmit video, DPI manufacturers can extract the transmitted video file from the network traffic. The header information of the video file contains the video encoding, definition, video bit rate, video picture size and other information. In recent years, however, almost all large websites have deployed digital certificates and use the HTTPS transmission protocol when interacting with clients, and video websites are no exception. As of 2020, China's mainstream video platforms are Tencent Video, iQiyi Video, and Youku Video; all three use HTTPS encrypted transmission when videos are watched in browsers.

When video transmission is encrypted, the DPI manufacturer cannot obtain the content of the video file and therefore cannot analyze the video definition. To solve this problem, Embodiment 1 provides a method for predicting the definition of encrypted traffic video. Taking Tencent Video as an example, as shown in Figure 1 and Figure 3, the method includes the following steps:

In step 101, Transmission Control Protocol (TCP) stream data packets and a playback log of network video playback with HTTPS encrypted traffic are captured.

On the WEB side, that is, when Tencent Video is watched in a browser, HTTPS encrypted transmission is used. The WEB side also has a P2P (peer-to-peer) transmission mechanism; the P2P part is transmitted over the User Datagram Protocol (UDP), and this part of the traffic is not encrypted.

The PC side and the mobile phone side use HTTP and UDP transmission, both in plain text and unencrypted; UDP transmission accounts for more than 95%.

Embodiment 1 is concerned with identifying the definition of HTTPS-encrypted video traffic.

Tencent Video uses different transmission modes depending on the client type and the video type. For example, long videos such as TV dramas and movies are transmitted in blocks using HTTP Live Streaming (HLS), an adaptive bit rate technology, or in blocks using MP4; videos and short videos uploaded by users are transmitted as whole MP4 files; advertising videos are transmitted in blocks using MP4.

In addition, the encoding methods and transmission modes of Tencent Video are constantly evolving. At present, the most common transmission mode is HLS; Embodiment 1 describes only the HLS transmission mode and does not cover the others.

In step 102, the definition of the captured encrypted traffic network video is labeled according to the video encoding number in the playback log.

Tencent Video divides videos into 4 definitions, namely 270p, 480p, 720p and 1080p. A given definition may be encoded with several encoding methods, but each encoding method has a unique number, and that number maps to a specific definition.

In step 103, data blocks are detected from the TCP stream data packets.

There are the following steps in the Tencent Video transmission process:

In step 1031, multiple video data blocks are downloaded within one TCP stream, relying on the keep-alive mechanism of the HTTP protocol. Statistics show that a TCP stream carries one or more data blocks, up to several tens. Within the same TCP stream, all downloaded video data blocks have the same definition; to switch definition, the TCP connection is closed and another TCP connection is used for the download.

In step 104, the features corresponding to the definition in the data blocks are extracted to form a feature set of known definition. Feature extraction first extracts the relevant features of each data block, then calculates the average of each feature over all data blocks in the same TCP connection, and finally combines the per-block features from the first step with the stream-wide feature averages from the second step into the feature set of a data block.

In step 105, a model is established and trained from the correspondence between the definition labeled on known video files and the data block feature sets. After the model training is completed, feature extraction is performed on the TCP stream data packets of the encrypted traffic video to be tested, and the definition of that video file is predicted according to the correspondence between feature sets and definitions in the model.

Tencent Video encodes a video into multiple files of different definitions, then segments each file and divides it into blocks, recording the block information in an index file. In HLS transmission mode, the index file corresponding to the selected video definition is downloaded first. The index file divides the video by time length, and each video fragment has a unique URL. The client downloads the video fragments one by one according to the URLs in the index file.

When Tencent Video performs HLS transmission, the video is processed in two stages: segmentation and blocking. First, the Tencent Video server divides a complete video into data segments of about 5 minutes each, named 1.ts, 2.ts, 3.ts, and so on. The server then divides each data segment into data blocks of about 10 seconds each, with block names counted from 0.

A typical video clip URL format is as follows:

00_b0033m9le2c.321002.1.ts?index=0&start=0&end=7000&brs=0&bre=222967&ver=4

The first field is the downloaded video clip file name 00_b0033m9le2c.321002.1.ts, and each field is explained as follows:

00 represents the data block index number;

b0033m9le2c represents the video ID;

321002 represents the video encoding label;

1.ts represents the segment index;

The parameters of the URL are explained as follows:

index=0: data block index number;

start=0&end=7000: start and end time;

brs=0&bre=222967: start and end of the data range. This offset is relative to the current video segment: when playback switches from 1.ts to 2.ts, the value starts from 0 again.
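The field layout described above can be illustrated with a short parsing sketch. It uses only the Python standard library; the function name `parse_segment_url` and the returned key names are hypothetical, and the file name layout (`<block>_<video id>.<encoding number>.<segment>.ts`) is inferred from the single example URL given here, so it may not hold for every URL.

```python
from urllib.parse import urlsplit, parse_qs


def parse_segment_url(url: str) -> dict:
    """Split a data-block URL of the form shown above into its fields."""
    parts = urlsplit(url)
    filename = parts.path.rsplit("/", 1)[-1]
    # Assumed layout: <block>_<video id>.<encoding number>.<segment>.ts
    stem, encoding_number, segment, ext = filename.split(".")
    block_index, video_id = stem.split("_", 1)
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "block_index": block_index,
        "video_id": video_id,
        "encoding_number": encoding_number,  # maps to a definition via the playback log
        "segment": f"{segment}.{ext}",
        "index": int(query["index"]),
        "start": int(query["start"]),
        "end": int(query["end"]),
        "brs": int(query["brs"]),
        "bre": int(query["bre"]),
    }


info = parse_segment_url(
    "00_b0033m9le2c.321002.1.ts?index=0&start=0&end=7000&brs=0&bre=222967&ver=4"
)
print(info["video_id"], info["encoding_number"], info["segment"])
```

Note that this parsing is only possible when the plaintext URL is available, for example from the playback log during training; with TLS-encrypted traffic the URL is not visible.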

When a browser watches Tencent Video, the browser first downloads an index file, and then downloads each video data block according to the URL of each video data block in the content of the index file. Tencent Video uses the HLS transmission mode, which essentially divides the video into many video data blocks, and the client requests the data blocks one by one.

Although Transport Layer Security (TLS) encryption is used during video transmission, so that the DPI manufacturer cannot obtain the content of the video file and cannot identify the video definition from it, TLS encryption does not change the length of the data, so length-based features at the TCP layer are preserved. Each video data block can still be identified from the lengths of the requests and responses, and the size of each video data block and related features can be calculated. Since video files are divided into blocks by time length, a model can be constructed to predict the definition of the video from the data block length and length-related information.

In Embodiment 1, a random forest classification algorithm is used to construct the model. First, the feature sets of video data blocks of known definition are collected; the model is then trained and tested with these feature sets, and its results are compared with the known definitions. If the accuracy exceeds a preset value, such as 80% or 70%, the model is considered well trained. The trained model is then frozen and applied to definition prediction for the encrypted video traffic to be tested.

Tests with experimental data show that the prediction accuracy of this data block model reaches at least 70%, which largely solves the problem that DPI manufacturers cannot determine the definition of encrypted traffic video files.

For the method of collecting data of network video playback with HTTPS encrypted traffic described in Embodiment 1:

On the browser, request the database that stores video information to play videos with HTTPS encrypted traffic, and select at least two video files with different definition and video content. The more video files are played, the more samples are collected.

In Embodiment 1, each encoding method corresponds to a unique video encoding number in the playback log; when the browser plays the video, the video encoding number and the corresponding definition are recorded from the playback log of the encrypted traffic network video. Since Tencent Video divides videos into 4 definitions, namely 270p, 480p, 720p and 1080p, at least one video file is required as a sample for each definition. Figure 2 lists the correspondence between definitions and video encoding numbers, so the definition can be obtained by looking up the video encoding number from the playback log.

In Embodiment 1, the encrypted traffic data is preprocessed to obtain the TCP stream file, and data blocks are detected according to the acknowledgment (ACK) field of the TCP messages. The video is transmitted in one direction, from the server to the client, and in blocks. While a data block is being transmitted, the client does not send any data to the server; only after a data block has been fully transmitted does the client send a request message to the server for the next data block. During the transmission of a data block, the ACK field of the TCP messages sent by the server to the client therefore remains unchanged. After the client sends an HTTP request, the ACK value increases, and the increment equals the length of the HTTP request message sent by the client.

As shown in Figure 3, the steps to detect data blocks are as follows:

In step 201, first determine whether a TCP stream is a Tencent Video HLS video stream by checking whether the SNI field in the Client Hello packet of the TLS handshake is one of Tencent Video's specific domain names; these domain names include ltsbsy.qq.com, ltscsy.qq.com, ltssjy.qq.com, ltsws.qq.com and stsbsy.qq.com;

In step 202, parse the TLS message, remove the TLS handshake message, and extract the transmission data message;

A TLS connection performs the handshake first and then transmits data; by parsing the traffic according to the TLS protocol specification, the handshake packets can be distinguished from the packets that carry data;
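One way to separate handshake records from data records is to read the 5-byte TLS record header (content type, version, length). The sketch below is an illustration under assumptions, not the method of this embodiment: it operates on an already reassembled one-direction byte stream, the function names are hypothetical, and the byte stream in the example is synthetic. The content type values 22 (handshake) and 23 (application data) come from the TLS record layer; note that in TLS 1.3 the later handshake messages are encrypted and also carried in records with outer type 23, so this filter is only approximate there.

```python
import struct

TLS_HANDSHAKE = 22         # record ContentType for handshake messages
TLS_APPLICATION_DATA = 23  # record ContentType for encrypted payload


def iter_tls_records(stream: bytes):
    """Yield (content_type, payload_length) for each complete TLS record."""
    offset = 0
    while offset + 5 <= len(stream):
        # Header: 1-byte content type, 2-byte version, 2-byte payload length.
        content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        if offset + 5 + length > len(stream):
            break  # trailing record is incomplete; stop parsing
        yield content_type, length
        offset += 5 + length


def application_data_bytes(stream: bytes) -> int:
    """Total payload bytes carried in application-data records."""
    return sum(n for t, n in iter_tls_records(stream) if t == TLS_APPLICATION_DATA)


# Synthetic stream: one handshake record followed by one data record.
handshake = struct.pack("!BHH", TLS_HANDSHAKE, 0x0303, 4) + b"\x00" * 4
data = struct.pack("!BHH", TLS_APPLICATION_DATA, 0x0303, 10) + b"\x00" * 10
stream = handshake + data
print(list(iter_tls_records(stream)), application_data_bytes(stream))
```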

In step 203, determine the uplink and downlink packets;

In step 204, the ACK value of the downlink message is processed to detect whether the ACK value changes;

The above step 204 can be specifically implemented as the following steps 2041 and 2042.

In step 2041, the message with the same ACK value is recorded as a data block;

In step 2042, the message whose ACK value has changed is re-recorded as a new data block.

Data packets with the same ACK value are recorded as data block 1; that is, data block 1 is a set of individual data packets that all have the same ACK value. The data packets are processed in order of their timestamps, and the subsequent data blocks are recorded as data blocks 2, 3, 4, ..., N in turn. All video data blocks belonging to the same TCP stream have the same video definition, and when the model predicts definition, the data blocks it processes together all belong to the same TCP stream.
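Steps 2041 and 2042 can be sketched as a single pass over the downlink packets of one TCP stream. The `Packet` and `DataBlock` types and the function name `split_into_blocks` are hypothetical, and the packet values in the example are synthetic; the sketch assumes the handshake packets and uplink packets have already been removed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    ack: int          # TCP acknowledgment number carried by the downlink packet
    payload_len: int  # TCP payload length in bytes


@dataclass
class DataBlock:
    ack: int
    packets: List[Packet] = field(default_factory=list)


def split_into_blocks(downlink: List[Packet]) -> List[DataBlock]:
    """Group time-ordered downlink packets into blocks of equal ACK value."""
    blocks: List[DataBlock] = []
    for pkt in sorted(downlink, key=lambda p: p.timestamp):
        # A changed ACK value means the client sent a new request: start a new block.
        if not blocks or pkt.ack != blocks[-1].ack:
            blocks.append(DataBlock(ack=pkt.ack))
        blocks[-1].packets.append(pkt)
    return blocks


packets = [
    Packet(0.10, 100, 1400), Packet(0.11, 100, 1400), Packet(0.12, 100, 600),
    Packet(5.00, 350, 1400), Packet(5.01, 350, 900),
]
blocks = split_into_blocks(packets)
print([(b.ack, len(b.packets)) for b in blocks])
```

In this example the ACK value rises from 100 to 350, which under the description above would correspond to a 250-byte client request between the two blocks.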

In Embodiment 1, the features corresponding to the definition and their average values are extracted from the data blocks in the TCP stream file of the video transmission, wherein:

The features include: one or more of the size of the data block, the number of data packets, the arrival time of the first byte, the download time of the data block, the idle time of the data block, the data block transmission time and the data transmission rate;

Feature extraction is divided into three steps. First, the relevant features of each data block are extracted; then the average of each feature over all data blocks in the same TCP connection is calculated; finally, the per-block features from the first step are combined with the stream-wide feature averages from the second step into the feature set of a data block.

The first step is to extract 7 features for each data block, which are:

1. Data block size;

Chunk_size, the number of bytes of the data block;

2. The number of data packets;

Packet_number, the number of packets of the data block;

3. The arrival time of the first byte;

Time to first byte, referred to as TTFB, the time from when the GET request is issued to the arrival of the first byte of the response;

Here, GET refers to the moment at which the HTTP request message has been sent by the client, and the response refers to the first message that the server sends back to the client. The client and the server exchange messages over TCP: after the client sends an HTTP request message, the server replies with a TCP ACK message. TTFB is thus the time from the end of the client's HTTP request to the first packet returned by the server.

4. Download time;

Download_time, the time between the first packet and the last packet of the data block;

5. Free time;

Slack_time, the time from the last message of the data block to the next GET request;

6. Data block transmission time;

Duration_time, the time from the GET request to the next GET request;

The data block transmission time is equal to the arrival time of the first byte plus the download time plus the idle time;

7. Transmission rate;

Download_speed, data block bytes divided by download time;

High-definition data blocks are significantly larger than low-definition data blocks in both the number of bytes and the number of packets. These two features are the most important and the most intuitive. Statistical analysis shows that the other, time-related features also help to distinguish definitions.
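The seven per-block features can be computed from the request time, the downlink packets of the block, and the time of the next request. The following is a minimal sketch: the function name `block_features` and its input layout are hypothetical, the dictionary keys follow the column names of the example data in this document, the input values are synthetic, and returning a rate of 0.0 for a single-packet block (zero download time) is an assumption.

```python
from typing import Dict, List, Tuple


def block_features(
    request_time: float,
    packets: List[Tuple[float, int]],  # (timestamp, payload bytes), time-ordered
    next_request_time: float,
) -> Dict[str, float]:
    """Compute the 7 per-block features from one data block."""
    first_time = packets[0][0]
    last_time = packets[-1][0]
    chunk_size = sum(size for _, size in packets)
    download_time = last_time - first_time
    return {
        "chunk_size": chunk_size,                           # bytes in the block
        "packet_number": len(packets),                      # packets in the block
        "ttfb": first_time - request_time,                  # request to first packet
        "download_time": download_time,                     # first to last packet
        "slack": next_request_time - last_time,             # last packet to next request
        "duration_time": next_request_time - request_time,  # request to next request
        # Assumption: report 0.0 when the block has zero download time.
        "download_speed": chunk_size / download_time if download_time > 0 else 0.0,
    }


features = block_features(10.0, [(10.5, 1000), (11.0, 1000), (12.5, 500)], 14.0)
print(features)
```

With these inputs the transmission time equals the sum of the first byte arrival time, the download time and the idle time, as stated above.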

The second step is to calculate the average feature value for each TCP connection of Tencent Video, which are:

1. Average data block size;

ave_chunk_size, the average of all chunk sizes;

2. Average number of packets;

ave_packet_number, the average packet count over all data blocks;

3. Average first byte arrival time;

ave_ttfb, the average of all first byte arrival times;

4. Average download time;

ave_download_time, the average of all download times;

5. Average idle time;

ave_slack, the average of all idle times;

6. Average transmission time;

ave_duration_time, the average of all transmission times;

7. Average download rate;

ave_download_speed, the average block size, divided by the average download time;
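The per-stream averages can be sketched as follows. The function name `stream_averages` is hypothetical and the two input blocks are synthetic. Following the definition above, `ave_download_speed` is computed as the average block size divided by the average download time, which in general differs from the mean of the per-block rates.

```python
from statistics import fmean
from typing import Dict, List

AVERAGED = ["chunk_size", "packet_number", "ttfb", "download_time", "slack", "duration_time"]


def stream_averages(blocks: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-block features over all data blocks of one TCP stream."""
    averages = {f"ave_{name}": fmean(block[name] for block in blocks) for name in AVERAGED}
    # Ratio of averages, not average of ratios, per the definition above.
    averages["ave_download_speed"] = averages["ave_chunk_size"] / averages["ave_download_time"]
    return averages


blocks = [
    {"chunk_size": 1000, "packet_number": 2, "ttfb": 0.5, "download_time": 1.0,
     "slack": 1.0, "duration_time": 2.5},
    {"chunk_size": 2000, "packet_number": 4, "ttfb": 0.5, "download_time": 4.0,
     "slack": 2.0, "duration_time": 6.5},
]
averages = stream_averages(blocks)
print(averages["ave_download_speed"])
```

Here the per-block rates are 1000 and 500 bytes per second, whose mean is 750, while the ratio of averages is 1500 / 2.5 = 600.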

In the third step, for each data block, the 7 features extracted from the data block are combined with the 7 average features of the overall TCP stream, giving 14 features in total; together with the definition of this data block, they form one sample.

Example data is as follows:

chunk_size,packet_number,ttfb,download_time,slack,duration_time,download_speed,ave_chunk_size,ave_packet_number,ave_ttfb,ave_download_time,ave_slack,ave_duration_time,ave_download_speed,resolution

2480000,1750,0.012,4.2,3.0,7.212,590476.1904761905,2338589.1333333333,1801.2666666666667,0.019322029749552407,3.6202432473500568,3.8155619303385415,7.455127207438151,645975.6910105784,720p
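As a consistency check, the derived columns of the example row can be recomputed from the other columns using the feature definitions given earlier. The sketch below parses the header and the row with the standard `csv` module; the `checks` dictionary and the relative tolerance of 1e-6 are choices made for this illustration.

```python
import csv
import io
import math

HEADER = ("chunk_size,packet_number,ttfb,download_time,slack,duration_time,"
          "download_speed,ave_chunk_size,ave_packet_number,ave_ttfb,"
          "ave_download_time,ave_slack,ave_duration_time,ave_download_speed,resolution")
ROW = ("2480000,1750,0.012,4.2,3.0,7.212,590476.1904761905,2338589.1333333333,"
       "1801.2666666666667,0.019322029749552407,3.6202432473500568,"
       "3.8155619303385415,7.455127207438151,645975.6910105784,720p")

record = next(csv.DictReader(io.StringIO(HEADER + "\n" + ROW)))
label = record.pop("resolution")                  # the definition label of the sample
x = {name: float(value) for name, value in record.items()}


def close(a: float, b: float) -> bool:
    return math.isclose(a, b, rel_tol=1e-6)


checks = {
    # transmission time = first byte arrival time + download time + idle time
    "duration": close(x["duration_time"], x["ttfb"] + x["download_time"] + x["slack"]),
    # transmission rate = block bytes / download time
    "speed": close(x["download_speed"], x["chunk_size"] / x["download_time"]),
    # average rate = average block size / average download time
    "ave_speed": close(x["ave_download_speed"], x["ave_chunk_size"] / x["ave_download_time"]),
}
print(label, len(x), checks)
```

The row carries 14 numeric features plus the label, matching the sample layout described in the third step.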

Although an encryption protocol is used during video transmission, so that the content of the video file cannot be obtained directly and the video definition cannot be identified from the video content, each video data block can still be identified at the TCP layer, and the size and feature parameters of each video data block can be calculated. Since video files are divided into blocks by time length, high-definition video blocks are generally longer than low-definition ones. The block length is nevertheless affected by many other factors. For example, a video with many static images produces shorter encoded data blocks than a video with many action scenes; and a video of the same definition is encoded differently on the mobile phone platform and the browser platform, so the data block lengths differ as well.

Step 105 in Embodiment 1 requires model training. The features of a data block and the feature average values over the data blocks of the current TCP stream file are combined into a feature set sample of known definition. The model receives at least one feature set sample for training, and the definition predicted by the model is verified against the video encoding number in the playback log of the video of known definition. If the accuracy of the model's predictions is higher than the preset value, the model is successfully trained.

After repeated rounds of feature collection, data verification, and feature weight adjustment, the model is considered successfully trained once its prediction accuracy exceeds the preset value.
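The acceptance criterion can be expressed as a simple accuracy check against the labels obtained from the playback log. This is a minimal sketch: the function names are hypothetical, the label lists are synthetic, and the default threshold of 0.7 is taken from the 70% figure mentioned in this embodiment.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Share of data blocks whose predicted definition equals the logged one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equally long")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def training_succeeded(predicted: Sequence[str], actual: Sequence[str],
                       threshold: float = 0.7) -> bool:
    """True when the accuracy is strictly higher than the preset value."""
    return accuracy(predicted, actual) > threshold


predicted = ["720p", "720p", "480p", "1080p", "270p"]
actual = ["720p", "720p", "480p", "1080p", "480p"]
print(accuracy(predicted, actual), training_succeeded(predicted, actual))
```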

All video data blocks belonging to the same TCP stream have the same video definition, so averaging the feature parameters over all data blocks in the stream amounts to abstracting the entire TCP stream into a single data block. After the definitions of all data blocks have been predicted, a "training result optimization" step is applied, as follows: count the predicted definition categories of all data blocks in a TCP stream, find the category with the largest share, and change the predictions of all other data blocks to this category. In other words, all predictions for a TCP stream are tallied and corrected by majority rule.
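The majority-rule correction can be sketched with `collections.Counter`. The function name `smooth_stream_predictions` is hypothetical and the prediction list is synthetic. The text does not say how ties are resolved; this sketch keeps whichever of the tied categories appears first in the stream, which is an assumption.

```python
from collections import Counter
from typing import List


def smooth_stream_predictions(predictions: List[str]) -> List[str]:
    """Replace every per-block prediction in a TCP stream with the majority category."""
    if not predictions:
        return []
    # most_common(1) returns the category with the highest count; on ties it
    # returns the one encountered first (assumed tie-break, not from the source).
    majority, _count = Counter(predictions).most_common(1)[0]
    return [majority] * len(predictions)


per_block = ["720p", "720p", "480p", "720p", "1080p"]
corrected = smooth_stream_predictions(per_block)
print(corrected)
```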

In Embodiment 1, when the model predicts encrypted traffic video of unknown definition, the model trained on the Android platform is used for prediction in a mobile communication network, and the model trained on the PC platform is used for prediction in a traditional fixed network environment.

The video data blocks generated by the Tencent Video client on the Android mobile platform are smaller on average than those generated by the Tencent Video client on the PC platform; the Android and PC platforms therefore need to be distinguished during training and prediction. When training the model, traffic can be collected separately for each platform; when using the model for prediction, the target traffic can be filtered in the same way. For example, in a 4G mobile communication network the model trained on the Android platform is used for prediction, and in a traditional fixed network environment the model trained on the PC platform is used, thereby improving prediction accuracy.

In Embodiment 1, a random oversampling method is used to balance the definition categories in the sample set; the machine learning model uses the random forest algorithm with all parameters left at their defaults. Because the definition categories are not uniformly distributed in the sample set, the sample set needs category balancing. Embodiment 1 uses the Random Oversampler random oversampling method for this purpose. Random oversampling is a standard procedure: minority-class samples are randomly copied and repeated until the minority and majority classes have the same number of samples, yielding a new balanced dataset.
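The balancing step can be illustrated with a standard-library sketch of random oversampling. This is not the implementation used in the embodiment: the "Random Oversampler" named above may refer to the `RandomOverSampler` class of the imbalanced-learn library, but that is an assumption. The function name `random_oversample`, the `seed` parameter and the sample values are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def random_oversample(
    samples: Sequence[Sequence[float]], labels: Sequence[str], seed: int = 0
) -> Tuple[List[Sequence[float]], List[str]]:
    """Duplicate randomly chosen minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples: List[Sequence[float]] = []
    out_labels: List[str] = []
    for label, group in by_label.items():
        # Draw copies with replacement to reach the size of the largest class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


X = [[2480000, 1750], [2300000, 1700], [2500000, 1800], [600000, 450]]
y = ["720p", "720p", "720p", "270p"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into the data used to measure accuracy.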

In Embodiment 1, when the target traffic carrying HTTPS-encrypted network video is filtered out of the network traffic, the SNI field is compared with preset strings of the domain name. If the SNI field matches the preset strings, the traffic is video traffic in HLS transmission mode and is treated as encrypted traffic video whose definition is to be predicted.

As shown in Figure 4, in actual use technicians only have the TLS-encrypted traffic and do not know the plaintext URL. They therefore first need to filter the target traffic out of the network traffic, that is, the encrypted Tencent Video traffic that uses the HLS transmission mode.

HTTPS is the HTTP protocol carried over the TLS protocol. In the TLS handshake, the ClientHello message contains an extension field called SNI, which indicates the domain name of the server being connected to. The SNI field can be compared with the domain names of the Tencent Video servers to match the corresponding traffic. For Tencent Video servers in HLS transmission mode, the domain name has the format "lts***.qq.com" or "sts***.qq.com", so it is only necessary to compare the SNI field against the lts or sts prefix. If it matches, the traffic is Tencent Video traffic in HLS transmission mode and is treated as encrypted traffic video whose definition is to be predicted.
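The SNI comparison can be sketched as a prefix and suffix check on the host name. The function name `is_hls_video_sni` is hypothetical. The sketch treats "***" as a wildcard of any length, since the listed domain ltsws.qq.com has only two characters after the prefix; that reading is an assumption. Extracting the SNI value from the ClientHello is outside the scope of this sketch.

```python
def is_hls_video_sni(sni: str) -> bool:
    """Check whether an SNI host name has the form lts*.qq.com or sts*.qq.com."""
    host = sni.strip().lower().rstrip(".")
    suffix = ".qq.com"
    if not host.endswith(suffix):
        return False
    label = host[: -len(suffix)]
    # The part before .qq.com must be a single label starting with lts or sts.
    return "." not in label and label.startswith(("lts", "sts"))


for name in ["ltsbsy.qq.com", "ltscsy.qq.com", "ltssjy.qq.com",
             "ltsws.qq.com", "stsbsy.qq.com", "v.qq.com"]:
    print(name, is_hls_video_sni(name))
```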

Although various factors are involved, actual tests show that identifying the definition from video data block information still achieves a high accuracy rate and is a practical method. Using machine learning, the video data block information is collected and used for training, and the trained model is then applied to predict the definition of the encrypted traffic video to be tested.

Embodiment 2:

FIG. 5 is a schematic structural diagram of an apparatus for predicting the definition of encrypted traffic video according to an embodiment of the present invention. The apparatus in this embodiment includes one or more processors 21 and a memory 22. One processor 21 is taken as an example in FIG. 5.

The processor 21 and the memory 22 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 5.

As a non-volatile computer-readable storage medium, the memory 22 can be used to store non-volatile software programs and non-volatile computer-executable programs, such as the method for predicting the definition of encrypted traffic video in Embodiment 1. The processor 21 executes the method for predicting the definition of encrypted traffic video by running the non-volatile software programs and instructions stored in the memory 22.

Memory 22 may include high speed random access memory, and may also include nonvolatile memory, such as at least one magnetic disk storage device, flash memory device, or other nonvolatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the method for predicting the definition of encrypted traffic video in Embodiment 1 above, for example, the steps shown in Figures 1 and 3 described above.

It is worth noting that the information exchange, execution process and other details between the modules and units of the above device and system are based on the same concept as the processing method embodiments of the present invention. For details, refer to the descriptions in the method embodiments of the present invention, which are not repeated here.

Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium can include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.

The above descriptions are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

A method for predicting the definition of encrypted traffic video, characterized in that the method comprises:

capturing TCP stream data packets and playback logs of network video playback with HTTPS encrypted traffic;

labeling the definition of the captured encrypted traffic network video according to the video code number in the playback log;

detecting data blocks from the TCP stream data packets;

extracting the features corresponding to the definition and the feature average values in the data blocks to form a feature set of known data packet definition;

establishing a model using the correspondence between the definitions labeled on known video files and the feature sets of the data blocks, and training the model; after the model training is completed, performing feature extraction on the TCP stream data packets of the encrypted traffic video to be tested, and predicting the definition of the encrypted traffic video file to be tested according to the correspondence between the feature sets and the definitions in the model.
The method for predicting the definition of encrypted traffic video according to claim 1, wherein collecting the data of network video playback with HTTPS encrypted traffic comprises:

requesting, in a browser, playback of videos with HTTPS encrypted traffic from the database that stores video information, and selecting at least two video files that differ in definition and video content.
The method for predicting the definition of encrypted traffic video according to claim 1, wherein each encoding method corresponds to a unique video definition number in the playback log, and when the browser plays the video, the video code numbers and the corresponding definitions in the playback log of the encrypted traffic network video are recorded and collected at the same time.
The method for predicting the definition of encrypted traffic video according to claim 1, wherein the data block is detected according to the ACK field of the TCP message, which specifically comprises:

determining whether all the packets of a TCP stream belong to an HLS video stream;

parsing the TLS messages, removing the TLS handshake messages, and retaining the messages that transmit data;

distinguishing uplink packets from downlink packets, and processing the downlink packets;

classifying the downlink messages by ACK value;

recording messages with the same ACK value as one data block;

recording a message whose ACK value has changed as the start of a new data block.
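The ACK-based grouping in the steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the packets are already known to belong to an HLS video stream, TLS handshake messages have already been removed, and each packet has been reduced to a hypothetical `Packet` record; the names `Packet`, `DataBlock` and `split_into_blocks` and the sample values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    timestamp: float   # capture time in seconds
    downlink: bool     # True for server-to-client packets
    ack: int           # TCP acknowledgment number
    payload_len: int   # payload bytes carried by the packet


@dataclass
class DataBlock:
    ack: int
    packets: List[Packet] = field(default_factory=list)


def split_into_blocks(packets: List[Packet]) -> List[DataBlock]:
    """Group consecutive downlink packets that share an ACK value."""
    blocks: List[DataBlock] = []
    for pkt in packets:
        if not pkt.downlink:
            continue  # only downlink packets are processed
        if not blocks or blocks[-1].ack != pkt.ack:
            # The ACK value changed, so a new data block starts here.
            blocks.append(DataBlock(ack=pkt.ack))
        blocks[-1].packets.append(pkt)
    return blocks


# Hypothetical capture: two uplink requests and three downlink packets.
packets = [
    Packet(0.00, False, 100, 300),
    Packet(0.05, True, 400, 1400),
    Packet(0.06, True, 400, 1400),
    Packet(1.00, False, 2900, 300),
    Packet(1.05, True, 700, 1400),
]
blocks = split_into_blocks(packets)
print([(b.ack, len(b.packets)) for b in blocks])  # [(400, 2), (700, 1)]
```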
The method for predicting the definition of encrypted traffic video according to claim 4, characterized in that the features corresponding to the definition and the feature average values in the data blocks are extracted from the TCP stream file during video transmission, wherein:

the features include one or more of: the data block size, the number of data packets, the first-byte arrival time, the data block download time, the data block idle time, the data block transmission time, and the data transmission rate;

the feature average values include one or more of: the average data block size, the average number of data packets, the average first-byte arrival time, the average data block download time, the average data block idle time, the average data block transmission time, and the average data transmission rate.
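A subset of these features and their averages can be sketched as follows. The exact definition of each feature is not restated here, so the formulas below are assumptions: download time is taken as the interval between the first and last packet of a block, idle time as the gap between the end of the previous block and the start of the current one, and rate as block size divided by download time. The function names and sample values are hypothetical.

```python
from statistics import mean


def block_features(blocks):
    """blocks: list of data blocks, each a list of (timestamp, payload_len)
    tuples for the downlink packets of that block, in capture order."""
    rows = []
    prev_end = None
    for block in blocks:
        start, end = block[0][0], block[-1][0]
        size = sum(length for _, length in block)
        download_time = end - start
        rows.append({
            "size": size,                      # bytes in the block
            "packets": len(block),             # packets in the block
            "download_time": download_time,    # assumed: last - first packet
            "idle_time": 0.0 if prev_end is None else start - prev_end,
            "rate": size / download_time if download_time > 0 else 0.0,
        })
        prev_end = end
    return rows


def feature_averages(rows):
    """Average every per-block feature over all blocks of the stream."""
    return {f"avg_{key}": mean(row[key] for row in rows) for key in rows[0]}


# Hypothetical stream with two data blocks.
blocks = [
    [(0.0, 1000), (0.5, 1000)],
    [(2.0, 500), (2.25, 1500)],
]
rows = block_features(blocks)
print(rows)
print(feature_averages(rows))
```

The per-block rows together with the stream-level averages would form one feature set sample for a stream of known definition.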
The method for predicting the definition of encrypted traffic video according to claim 5, wherein the features in the data blocks, the feature average values, and the data blocks of the current TCP stream file are combined into a feature set of known data packet definition; the model receives at least one feature set sample for training, and the video code number in the video playback log of known definition is used to verify the definition predicted by the model; if the accuracy of the model's prediction is higher than a preset value, the model is successfully trained.
The method for predicting the definition of encrypted traffic video according to claim 1, wherein, when the model predicts encrypted traffic video of unknown definition, a model trained on the Android platform is used for prediction in a mobile communication network, and a model trained on the PC platform is used for prediction in a traditional fixed network environment.
The method for predicting the definition of encrypted traffic video according to claim 6, characterized in that a random oversampling method is used to perform definition class balancing on the sample set.
The method for predicting the definition of encrypted traffic video according to claim 1, wherein, when the target traffic carrying HTTPS-encrypted network video is screened out of the network traffic, the SNI field is compared with the character string in the domain name; if they completely match the preset SNI field and the preset character string, the traffic is video traffic in HLS transmission mode and can serve as an encrypted traffic video file whose definition is to be predicted.
A device for predicting the definition of encrypted traffic video, characterized in that the device comprises:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the method for predicting the definition of encrypted traffic video according to any one of claims 1-9.