CN114025203B - Sequence similarity-based encrypted video flow content analysis method - Google Patents
- Publication number: CN114025203B (application CN202111302590.XA)
- Authority: CN (China)
- Prior art keywords: video, sequence, encrypted, analyzed, traffic
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H04N21/2347: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving video stream encryption
- H04N21/23418: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
- H04N21/44008: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N21/4408: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving video stream encryption, e.g. re-encrypting a decrypted video stream for redistribution in a home network
- Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The invention discloses a method for analyzing the content of encrypted video traffic based on sequence similarity, comprising the following steps: collecting labeled video traffic data from a network and classifying and managing it to form a video traffic database; converting the encrypted video traffic to be analyzed into a video segment sequence by means of a video segment sequence conversion module; performing sequence similarity analysis on the video segment sequence by first calculating the Levenshtein distance between the video segment sequence and each record in the video traffic database, and then selecting the content information of a record with a smaller Levenshtein distance to the video segment sequence as the analysis result; and, if the analysis result is subsequently verified, adding the analysis result to the video traffic database. The method can be applied to real-time monitoring of the playback of illegal videos in network supervision, and is of significance for safeguarding content security in cyberspace.
Description
Technical Field
The invention belongs to the field of network traffic detection, and particularly relates to an encrypted video traffic content analysis method based on sequence similarity.
Background
With the development of Internet technology, network traffic has grown explosively, and video accounts for an ever larger share of total network traffic. Identifying illegal video content within this large volume of video traffic is an important research direction. However, the wide adoption of traffic encryption technologies, represented by the Transport Layer Security (TLS) protocol, means that video traffic is increasingly transmitted in encrypted form, which poses a great challenge to network supervision.
To improve the user experience, online video operators use Dynamic Adaptive Streaming over HTTP (DASH) to improve quality of service. The basic principle of DASH is shown in Fig. 1: the server divides a video file into video segments of equal duration and encodes each segment at several bit rates. For transmission, the server first sends a media presentation description (MPD) file to the client; the client then requests video segments at different bit rates from the server according to network conditions and user preference, and the segments are delivered over HTTP.
However, precisely because DASH is used, content-related timing information about the video segments is leaked. DASH uses variable bit rate encoding, so the size of an encoded video segment is correlated with the complexity of the video content. By analyzing video segment information, the content of encrypted video traffic can therefore be identified.
Although some work on analyzing encrypted video traffic exists, most of it focuses on identifying video parameters of the encrypted traffic, with the aim of feeding results back to operators to improve the user experience. Analysis of the content of encrypted video traffic remains insufficient. Existing content analysis follows a supervised classification approach whose recognition models generalize poorly, so network flows of unknown types cannot be handled.
Disclosure of Invention
To solve the above problems, the invention provides a method for analyzing the content of encrypted video traffic based on sequence similarity, which comprises the following steps:
Collecting labeled video traffic data from a network and classifying and managing it to form a video traffic database, wherein the video traffic database comprises a plurality of records.
Converting the encrypted video traffic to be analyzed into a video segment sequence by means of a video segment sequence conversion module.
Performing sequence similarity analysis on the video segment sequence: first calculating the Levenshtein distance between the video segment sequence and each record in the video traffic database, and then selecting the content information of a record with a smaller Levenshtein distance to the video segment sequence as the analysis result.
If the analysis result is subsequently verified, adding the analysis result to the video traffic database.
If none of the calculated Levenshtein distances to the video segment sequence satisfies the homology condition determined by an empirical threshold, the content information of the encrypted video traffic to be analyzed is considered not to be recorded in the video traffic database.
The length of the encrypted video traffic to be analyzed is variable, and so is the length of the resulting video segment sequence. Using the Levenshtein distance to compare variable-length video segment sequences makes it possible to measure the similarity of video content and thus to analyze the content of encrypted video traffic.
Further, the labeled video traffic data is collected from the network by network sniffing, specifically using Scapy. Scapy is an interactive network packet processing tool, packet generator, network scanner, network discovery tool and network sniffer written in Python. It provides interactive generation of packets or sets of packets, packet manipulation, packet sending, packet sniffing, and matching of requests with replies. Combined with automated testing tools and multi-threaded programming, the collection of video traffic can be automated.
The video segment sequence conversion module operates as follows:
First, the encrypted video traffic to be analyzed is divided into a plurality of video segments of equal duration.
Each video segment is then subdivided into a plurality of Application Data records according to the TLS protocol.
Finally, the numbers of Application Data records of the individual video segments are assembled into a variable-length sequence, which is the video segment sequence.
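The Levenshtein distance is $\mathrm{lev}_{a,b}(|a|,|b|)$, where

$$\mathrm{lev}_{a,b}(i,j)=\begin{cases}\max(i,j) & \text{if } \min(i,j)=0,\\[4pt] \min\begin{cases}\mathrm{lev}_{a,b}(i-1,j)+1\\ \mathrm{lev}_{a,b}(i,j-1)+1\\ \mathrm{lev}_{a,b}(i-1,j-1)+1_{(a_i\neq b_j)}\end{cases} & \text{otherwise.}\end{cases}$$

Here $\mathrm{lev}_{a,b}(i,j)$ is the Levenshtein distance between the first i characters of a and the first j characters of b, a is the video segment sequence, |a| is the length of the video segment sequence, b is a record in the video traffic database, |b| is the length of the record, and $1_{(a_i\neq b_j)}$ is the indicator function, equal to 0 when $a_i = b_j$ and 1 otherwise.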
When the video traffic database is insufficient, a threshold method can also be used to determine whether two encrypted video traffic flows to be analyzed belong to a video source of the same title. The threshold method is as follows: when the normalized Levenshtein distance between the two encrypted video traffic flows to be analyzed is smaller than a preset threshold, the two flows belong to the same video source; otherwise, they belong to different video sources.
The normalized Levenshtein distance is obtained by normalizing the Levenshtein distance:

$$\mathrm{Normal\_LD}(a,b)=\frac{\mathrm{LD}(a,b)}{\max(|a|,|b|)}$$

where Normal_LD(a, b) is the normalized Levenshtein distance and LD(a, b) is the Levenshtein distance.
The preset threshold is determined by fitting gamma distributions.
To implement the above method, the invention further provides a device for analyzing the content of encrypted video traffic based on sequence similarity, which comprises:
A data collection module, which collects labeled video traffic data from a network and classifies and manages it to form a video traffic database, wherein the video traffic database comprises a plurality of records.
A video segment sequence conversion module, which converts the encrypted video traffic to be analyzed into a video segment sequence, specifically by dividing the encrypted video traffic to be analyzed into a plurality of video segments of equal duration, subdividing each video segment into a plurality of Application Data records according to the TLS protocol, and assembling the numbers of Application Data records of the individual video segments into a variable-length sequence to obtain the video segment sequence.
A sequence similarity analysis module, which performs sequence similarity analysis on the video segment sequence by first calculating the Levenshtein distance between the video segment sequence and each record in the video traffic database, and then selecting the content information of a record with a smaller Levenshtein distance to the video segment sequence as the analysis result.
An analysis result module, which adds the analysis result to the video traffic database if the analysis result is subsequently verified; which considers the content information of the encrypted video traffic to be analyzed not to be recorded in the video traffic database if none of the calculated Levenshtein distances to the video segment sequence satisfies the homology condition determined by an empirical threshold; and which, when the video traffic database is insufficient, uses a threshold method to determine whether two encrypted video traffic flows to be analyzed belong to a video source of the same title, the threshold method being as follows: when the normalized Levenshtein distance between the two encrypted video traffic flows to be analyzed is smaller than a preset threshold, the two flows belong to the same video source, and otherwise they belong to different video sources, the normalized Levenshtein distance being the Levenshtein distance after normalization.
To achieve the above object, the invention also provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the method described above when executed by a processor.
The method has the following advantages. Converting the encrypted video traffic to be analyzed into a video segment sequence makes full use of the content-related timing information in the video traffic, so the analysis result is accurate, efficient and reliable. In addition, the Levenshtein distance, which can measure the similarity of sequences of different lengths, is applied to the content analysis of encrypted video traffic, which addresses the problem of unsupervised analysis of encrypted video traffic content; the Levenshtein distance is also cheap and quick to compute, which suits large-scale computation.
The method can be applied to real-time monitoring of the playback of illegal videos in network supervision, and is of significance for safeguarding content security in cyberspace.
Drawings
To illustrate the embodiments of the invention and the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from the structures shown in them without inventive effort.
Fig. 1 is a DASH schematic.
Fig. 2 is a block diagram of an encrypted video traffic content analysis method based on sequence similarity.
Fig. 3 is a basic flow chart of TCP packet reassembly.
Fig. 4 is a network transmission process diagram of a DASH video stream.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The invention provides a method for analyzing the content of encrypted video traffic based on sequence similarity. A block diagram of the method is shown in Fig. 2, and the method comprises the following steps:
In the first step, labeled video traffic data is collected from a network and is classified and managed to form a video traffic database, wherein the video traffic database comprises a plurality of records.
To collect encrypted video traffic automatically, automated testing technology is first needed to play videos automatically. Automated testing converts human-driven test actions into machine execution, with the aim of saving labor, time and hardware resources and improving test efficiency. Different platforms require different automated testing frameworks, which are combined with a network sniffing and packet capture tool to collect video traffic data automatically. On the Web, the Selenium automated testing tool drives a browser directly, just as a real user would.
Encrypted video traffic data can be collected by network sniffing. Scapy is an interactive network packet processing tool, packet generator, network scanner, network discovery tool and network sniffer written in Python. It provides interactive generation of packets or sets of packets, packet manipulation, packet sending, packet sniffing, and matching of requests with replies. The traffic generated by video playback can therefore be captured with Scapy. Combined with automated testing tools and multi-threaded programming, the collection of video traffic can be automated.
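The collection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the collection code of the invention: the helper names `build_bpf_filter`, `capture_to_pcap` and `collect_labelled`, the port 443 default and the one-file-per-label naming are all hypothetical. Only `sniff` and `wrpcap` are Scapy functions; the playback callback stands in for a Selenium script.

```python
import threading
from typing import Callable, Optional


def build_bpf_filter(server_ip: Optional[str] = None, port: int = 443) -> str:
    """Return a BPF filter string selecting TLS-over-TCP traffic (port is an assumption)."""
    expr = f"tcp port {port}"
    if server_ip:
        expr = f"host {server_ip} and {expr}"
    return expr


def capture_to_pcap(path: str, seconds: float, bpf: str) -> None:
    """Sniff for `seconds` seconds and write the captured packets to `path`.

    Scapy is imported lazily so the other helpers work without it installed;
    sniffing usually requires administrator privileges.
    """
    from scapy.all import sniff, wrpcap

    packets = sniff(filter=bpf, timeout=seconds)
    wrpcap(path, packets)


def collect_labelled(label: str,
                     play: Callable[[], None],
                     capture: Callable[[str], None]) -> str:
    """Run `capture` in a background thread while `play` drives video playback.

    The video label doubles as the file name, so each capture stays labeled.
    """
    pcap_path = f"{label}.pcap"
    worker = threading.Thread(target=capture, args=(pcap_path,))
    worker.start()
    play()          # e.g. an automated browser script that opens and plays the video
    worker.join()   # wait until the capture window has ended
    return pcap_path
```

A call such as `collect_labelled("title_001", play_video, lambda p: capture_to_pcap(p, 120, build_bpf_filter()))` would then produce one labeled capture file per playback, where `play_video` is a hypothetical playback function.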
In the second step, the encrypted video traffic to be analyzed is converted into a video segment sequence by the video segment sequence conversion module.
The video segment sequence conversion module operates as follows:
first, the encrypted video traffic to be analyzed is divided into a plurality of video segments of equal duration;
then, each video segment is subdivided into a plurality of Application Data records according to the TLS protocol;
finally, the numbers of Application Data records of the individual video segments are assembled into a variable-length sequence, which is the video segment sequence.
Because the video traffic to be analyzed is encrypted with TLS, a network monitor cannot recover the content of the transmitted video segments. After packet reassembly at the TCP layer, however, each encrypted video segment can be recovered. First, the TCP session carrying the video is extracted from the traffic according to its five-tuple (source IP, destination IP, source port, destination port, protocol). The packets are then reassembled according to the TCP header fields. The basic flow of TCP packet reassembly is shown in Fig. 3.
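The session extraction and reassembly described above can be sketched as below. This is a simplified illustration and not the flow of Fig. 3: the `Segment` type and both function names are hypothetical, packets are assumed to be already parsed into plain fields, and sequence-number wraparound, SYN/FIN handling and missing data are deliberately ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One parsed TCP segment (hypothetical, simplified representation)."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    seq: int
    payload: bytes


FiveTuple = Tuple[str, str, int, int, str]


def group_by_five_tuple(segments: Iterable[Segment]) -> Dict[FiveTuple, List[Segment]]:
    """Group segments into unidirectional flows keyed by their five-tuple."""
    flows: Dict[FiveTuple, List[Segment]] = defaultdict(list)
    for s in segments:
        flows[(s.src, s.dst, s.sport, s.dport, s.proto)].append(s)
    return flows


def reassemble(segments: Iterable[Segment]) -> bytes:
    """Concatenate payloads in sequence-number order, skipping retransmitted bytes.

    Reassembly stops at the first gap, so only the contiguous prefix is returned.
    """
    out = bytearray()
    next_seq = None
    for s in sorted(segments, key=lambda seg: seg.seq):
        if not s.payload:
            continue                      # pure ACKs carry no data
        if next_seq is None:
            next_seq = s.seq              # first data byte of the flow
        end = s.seq + len(s.payload)
        if end <= next_seq:
            continue                      # complete retransmission
        if s.seq > next_seq:
            break                         # gap: a segment is missing
        out += s.payload[next_seq - s.seq:]   # drop any overlapping prefix
        next_seq = end
    return bytes(out)
```

Sorting by sequence number is enough for a sketch; a production reassembler would also track both directions of the session and tolerate gaps.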
The network transmission process of a DASH video stream is shown in Fig. 4. It follows the same request-response pattern as HTTP. Each transmitted video segment is subdivided into a plurality of Application Data records according to the TLS protocol standard, and the number of Application Data records in each transmitted encrypted video segment represents the size of that video segment.
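Counting Application Data records can be sketched as follows. The sketch assumes that the server-to-client byte stream has already been reassembled and split into one byte string per response; how that split is made is not specified here, and the function names are hypothetical. It relies only on the TLS record header layout: a 1-byte content type, a 2-byte version and a 2-byte length, with content type 23 denoting application data.

```python
import struct
from typing import Iterable, Iterator, List, Tuple

TLS_APPLICATION_DATA = 23   # TLS record content type for application data
TLS_HEADER_LEN = 5          # type (1 byte) + version (2 bytes) + length (2 bytes)


def iter_tls_records(stream: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (content_type, payload_length) for each complete TLS record."""
    offset = 0
    while offset + TLS_HEADER_LEN <= len(stream):
        content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        if offset + TLS_HEADER_LEN + length > len(stream):
            break                          # truncated trailing record: ignore it
        yield content_type, length
        offset += TLS_HEADER_LEN + length


def count_application_data(stream: bytes) -> int:
    """Count the Application Data records in one reassembled response."""
    # Caveat: in TLS 1.3, encrypted handshake records also carry outer type 23.
    return sum(1 for ctype, _ in iter_tls_records(stream) if ctype == TLS_APPLICATION_DATA)


def segment_sequence(responses: Iterable[bytes]) -> List[int]:
    """Turn per-segment responses into the variable-length video segment sequence."""
    return [count_application_data(r) for r in responses]
```

Because TLS caps the record payload size, a larger video segment is carried in more records, which is why the record count can serve as a proxy for segment size.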
In the third step, sequence similarity analysis is performed on the video segment sequence.
The length of a video stream is variable, and so is the length of the resulting video segment sequence. Comparing the similarity of video segment sequences of different lengths, and thereby measuring the similarity of video content, is therefore central to analyzing encrypted video content.
Given the variable length of video traffic, the Levenshtein distance between video segment sequences is used to measure the similarity of video content.
The Levenshtein distance between two strings a and b of lengths |a| and |b| is $\mathrm{lev}_{a,b}(|a|,|b|)$, defined as follows:

$$\mathrm{lev}_{a,b}(i,j)=\begin{cases}\max(i,j) & \text{if } \min(i,j)=0,\\[4pt] \min\begin{cases}\mathrm{lev}_{a,b}(i-1,j)+1\\ \mathrm{lev}_{a,b}(i,j-1)+1\\ \mathrm{lev}_{a,b}(i-1,j-1)+1_{(a_i\neq b_j)}\end{cases} & \text{otherwise,}\end{cases}$$

where $1_{(a_i\neq b_j)}$ is the indicator function, equal to 0 when $a_i = b_j$ and 1 otherwise, and $\mathrm{lev}_{a,b}(i,j)$ is the Levenshtein distance between the first i characters of a and the first j characters of b.
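The recurrence for the Levenshtein distance can be evaluated with standard dynamic programming. The sketch below is one common way to do so, keeping only two rows of the table; the function name and the use of integer sequences (Application Data record counts) as "characters" are illustrative choices.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Return lev_{a,b}(|a|, |b|) for two sequences of comparable items."""
    prev = list(range(len(b) + 1))            # row i = 0: lev(0, j) = j
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)             # column j = 0: lev(i, 0) = i
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1   # the indicator term
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # match or substitution
        prev = curr
    return prev[len(b)]


# Two segment sequences of different lengths: one substitution plus one insertion.
print(levenshtein([12, 15, 9, 20], [12, 14, 9, 20, 11]))  # 2
```

The running time is proportional to |a|·|b| and the memory to |b|, which keeps a comparison against every record in a database affordable.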
For a given video traffic flow to be analyzed, the smaller its sequence distance to a target video traffic flow (a record in the video traffic database), the greater the sequence similarity and hence the greater the content similarity of the two videos. Conversely, the larger the sequence distance between two videos, the smaller their sequence similarity and the greater the difference in their content.
In the fourth step, if the analysis result is subsequently verified, the analysis result is added to the video traffic database; if none of the calculated Levenshtein distances to the video segment sequence satisfies the homology condition determined by an empirical threshold, the content information of the encrypted video traffic to be analyzed is considered not to be recorded in the video traffic database.
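The lookup against the database can be sketched as a nearest-record search with a rejection threshold. The dictionary layout (label mapped to one stored sequence), the `identify` name and the `max_distance` parameter are assumptions made for illustration; the homology condition is modeled here simply as an upper bound on the raw distance.

```python
from typing import Dict, Optional, Sequence, Tuple


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Two-row dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[len(b)]


def identify(query: Sequence[int],
             database: Dict[str, Sequence[int]],
             max_distance: int) -> Tuple[Optional[str], Optional[int]]:
    """Return (label, distance) of the closest record, or (None, distance) if too far."""
    best_label, best_dist = None, None
    for label, record in database.items():
        d = levenshtein(query, record)
        if best_dist is None or d < best_dist:
            best_label, best_dist = label, d
    if best_dist is None or best_dist > max_distance:
        return None, best_dist        # content considered not recorded in the database
    return best_label, best_dist


db = {"title_A": [5, 7, 6, 9], "title_B": [2, 2, 3, 2, 2]}
print(identify([5, 7, 7, 9], db, max_distance=1))   # ('title_A', 1)
```

Once a result has been verified, adding it to the database amounts to storing the query sequence under its confirmed label, for example `db["title_A_capture2"] = [5, 7, 7, 9]`.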
When the video traffic database is insufficient, or when a large amount of video traffic lacks known label information, a threshold method can also be used to determine whether two video traffic flows to be analyzed belong to a video source of the same title. The threshold is applied to the normalized Levenshtein distance: when the normalized Levenshtein distance between the two video traffic flows to be analyzed is smaller than the threshold, the two flows are determined to belong to the same video source; when it is greater than the threshold, they are determined to belong to different video sources. The threshold is an empirical value calculated from existing video traffic data, and the decision accuracy can reach 90%.
The Levenshtein distance depends on the lengths of the strings; its maximum value is the length of the longer of the two strings being compared. To reduce this dependence, the Levenshtein distance can be normalized as follows:

$$\mathrm{Normal\_LD}(a,b)=\frac{\mathrm{LD}(a,b)}{\max(|a|,|b|)}$$

where Normal_LD(a, b) is the normalized Levenshtein distance and LD(a, b) is the Levenshtein distance.
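A minimal sketch of the normalization and the threshold decision follows. It assumes that normalization divides by the length of the longer sequence, which matches the stated maximum of the distance and maps the result into the range 0 to 1; the function names and the threshold value in the example are illustrative, not values from the invention.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Two-row dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[len(b)]


def normalized_ld(a: Sequence, b: Sequence) -> float:
    """Levenshtein distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0                     # two empty sequences are identical
    return levenshtein(a, b) / longest


def same_source(a: Sequence, b: Sequence, threshold: float) -> bool:
    """Decide whether two segment sequences come from the same video source."""
    return normalized_ld(a, b) < threshold


print(normalized_ld([1, 2, 3, 4], [1, 2, 3, 5]))        # 0.25
print(same_source([1, 2, 3, 4], [1, 2, 3, 5], 0.4))     # True
```

Dividing by the longer length keeps the decision independent of how long the two captures happen to be, so a single threshold can be reused across videos.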
Let two random variables $X_1$ and $X_2$ denote, respectively, the distance between video streams of the same video source and the distance between video streams of different video sources. It has been verified that both random variables follow gamma distributions, so the threshold on the video stream distance can be determined by fitting gamma distributions.
The distributions of the random variables $X_1$ and $X_2$ can be expressed as:
$X_1 \sim \mathrm{Ga}(\alpha_1, \beta_1), \quad X_2 \sim \mathrm{Ga}(\alpha_2, \beta_2)$
where $\alpha_1$ and $\alpha_2$ are values of the gamma distribution parameter $\alpha$, and $\beta_1$ and $\beta_2$ are values of the gamma distribution parameter $\beta$.
The probability density function of the gamma distribution and the probability distribution function are defined as follows:
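For reference, the standard forms are given below under the shape-rate convention. This convention is an assumption: the text does not state whether $\beta$ is a rate or a scale parameter, and if it is a scale parameter, $\beta$ should be replaced by $1/\beta$ throughout.

```latex
f(x;\alpha,\beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \qquad x > 0,
\qquad\qquad
F(x;\alpha,\beta) = \int_{0}^{x} f(t;\alpha,\beta)\,\mathrm{d}t = \frac{\gamma(\alpha,\beta x)}{\Gamma(\alpha)},
```

where $\Gamma(\cdot)$ is the gamma function and $\gamma(\cdot,\cdot)$ is the lower incomplete gamma function.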
Thus, taking the distance threshold to be x, it can be shown that the probability of a correct same-source decision for video streams is maximized when $f_1(x) = f_2(x)$, where $f_1$ and $f_2$ are the probability density functions of $X_1$ and $X_2$. Setting $f_1(x) = f_2(x)$ yields an equation in x.
The two numerical solutions of this equation can be obtained by numerical methods such as the bisection method or Newton's method. The solution lying between 0 and 1 is selected as the threshold.
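The threshold search can be sketched with the bisection method. The sketch assumes the shape-rate form of the gamma density (the parameterization is not stated in the text) and works with the difference of log-densities, whose root coincides with the solution of $f_1(x) = f_2(x)$. The parameter values in the example are purely illustrative and are not fitted values from the invention.

```python
import math
from typing import Tuple


def gamma_log_pdf(x: float, alpha: float, beta: float) -> float:
    """Log-density of Ga(alpha, beta) in shape-rate form (an assumption)."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1) * math.log(x) - beta * x)


def crossing_threshold(p1: Tuple[float, float], p2: Tuple[float, float],
                       lo: float = 1e-6, hi: float = 1.0,
                       tol: float = 1e-9) -> float:
    """Find x in (lo, hi) with f1(x) = f2(x) by bisection on log f1 - log f2."""
    def g(x: float) -> float:
        return gamma_log_pdf(x, *p1) - gamma_log_pdf(x, *p2)

    g_lo, g_hi = g(lo), g(hi)
    if g_lo * g_hi > 0:
        raise ValueError("no sign change on the interval; adjust lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid                   # root lies in the lower half
        else:
            lo, g_lo = mid, g_mid      # root lies in the upper half
    return (lo + hi) / 2


# Illustrative parameters only: same-source distances concentrated near 0.2,
# different-source distances concentrated near 0.75.
threshold = crossing_threshold((2.0, 10.0), (9.0, 12.0))
print(round(threshold, 2))  # 0.4
```

Restricting the search interval to (0, 1) picks out the solution that is meaningful for a normalized distance; the other solution of the equation falls outside that range.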
The invention also provides a device for analyzing the content of encrypted video traffic based on sequence similarity, which comprises:
a data collection module, which collects labeled video traffic data from a network and classifies and manages it to form a video traffic database, wherein the video traffic database comprises a plurality of records;
a video segment sequence conversion module, which converts the encrypted video traffic to be analyzed into a video segment sequence;
a sequence similarity analysis module, which performs sequence similarity analysis on the video segment sequence by first calculating the Levenshtein distance between the video segment sequence and each record in the video traffic database, and then selecting the content information of a record with a smaller Levenshtein distance to the video segment sequence as the analysis result;
an analysis result module, which adds the analysis result to the video traffic database if the analysis result is subsequently verified, and which considers the content information of the encrypted video traffic to be analyzed not to be recorded in the video traffic database if none of the calculated Levenshtein distances to the video segment sequence satisfies the homology condition determined by an empirical threshold.
The invention also provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the method described above when executed by a processor.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that contains no contradiction should be considered to fall within the scope of this description.
The above examples represent only a few embodiments of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. A person skilled in the art may make various modifications and improvements without departing from the spirit of the present application, and these fall within its scope of protection. Accordingly, the scope of protection of the present application is determined by the appended claims.
Claims (8)
1. A method for analyzing the content of encrypted video traffic based on sequence similarity, characterized by comprising the following steps:
collecting labeled video traffic data from a network, and classifying and managing the video traffic data to form a video traffic database, wherein the video traffic database comprises a plurality of records;
converting the encrypted video traffic to be analyzed into a video segment sequence by means of a video segment sequence conversion module, wherein the video segment sequence conversion module specifically: divides the encrypted video traffic to be analyzed into a plurality of video segments of equal duration, subdivides each video segment into a plurality of application data records according to the TLS protocol, and assembles the numbers of application data records of the individual video segments into a variable-length sequence to obtain the video segment sequence;
performing sequence similarity analysis on the video segment sequence by first calculating the Levenshtein distance between the video segment sequence and each record in the video traffic database, and then selecting the content information of a record with a smaller Levenshtein distance to the video segment sequence as the analysis result;
if the analysis result is subsequently verified, adding the analysis result to the video traffic database; if none of the calculated Levenshtein distances to the video segment sequence satisfies the homology condition determined by an empirical threshold, considering the content information of the encrypted video traffic to be analyzed not to be recorded in the video traffic database; and, when the video traffic database is insufficient, determining by a threshold method whether two encrypted video traffic flows to be analyzed belong to a video source of the same title, wherein the threshold method is as follows: when the normalized Levenshtein distance between the two encrypted video traffic flows to be analyzed is smaller than a preset threshold, the two flows belong to the same video source, and otherwise they belong to different video sources, the normalized Levenshtein distance being the Levenshtein distance after normalization.
2. The method according to claim 1, wherein the labeled video traffic data is collected from the network by network sniffing.
3. The method of claim 2, wherein the network sniffing is performed using Scapy.
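4. The method of claim 1, wherein the Levenshtein distance is $\mathrm{lev}_{a,b}(|a|,|b|)$,

$$\mathrm{lev}_{a,b}(i,j)=\begin{cases}\max(i,j) & \text{if } \min(i,j)=0,\\[4pt] \min\begin{cases}\mathrm{lev}_{a,b}(i-1,j)+1\\ \mathrm{lev}_{a,b}(i,j-1)+1\\ \mathrm{lev}_{a,b}(i-1,j-1)+1_{(a_i\neq b_j)}\end{cases} & \text{otherwise,}\end{cases}$$

wherein $\mathrm{lev}_{a,b}(i,j)$ is the Levenshtein distance between the first i characters of a and the first j characters of b, a is the video segment sequence, |a| is the length of the video segment sequence, b is a record in the video traffic database, |b| is the length of the record, and $1_{(a_i\neq b_j)}$ is the indicator function, equal to 0 when $a_i = b_j$ and 1 otherwise.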
5. The method of claim 1, wherein the predetermined threshold is determined using a gamma distribution fitting approach.
6. The method of claim 1, wherein for the normalized Levenshtein distance,
wherein LD(a, b) is the Levenshtein distance.
7. An encrypted video traffic content analysis device based on sequence similarity, comprising:
a data collection module, configured to collect labeled video traffic data from a network and to classify and manage the video traffic data to form a video traffic database, wherein the video traffic database comprises a plurality of records;
a video segment sequence conversion module, configured to convert the encrypted video traffic to be analyzed into a video segment sequence, specifically by: dividing the encrypted video traffic to be analyzed into a plurality of video segments of equal duration, dividing each video segment into a plurality of Application Data records according to the TLS protocol, and forming the numbers of Application Data records of the video segments into a variable-length sequence to obtain the video segment sequence;
a sequence similarity analysis module, configured to perform sequence similarity analysis on the video segment sequence by first calculating the Levenshtein distance between the video segment sequence and each record in the video traffic database, and then selecting the content information of a record having a smaller Levenshtein distance to the video segment sequence as an analysis result;
an analysis result module, configured to add the analysis result to the video traffic database if the analysis result is subsequently verified; if none of the calculated Levenshtein distances for the video segment sequence satisfies the homology condition determined by an empirical threshold, the content information of the encrypted video traffic to be analyzed is considered not to be recorded in the video traffic database; when the video traffic database is insufficient, whether two encrypted video traffic flows to be analyzed belong to a video source of the same title is determined by a threshold method, wherein the threshold method is as follows: when the normalized Levenshtein distance between the two encrypted video traffic flows to be analyzed is smaller than a preset threshold, the two flows belong to the same video source, and otherwise they belong to different video sources, the normalized Levenshtein distance being the Levenshtein distance after normalization.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
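The steps recited in claims 1 to 6 can be illustrated with a minimal Python sketch. It is not the patented implementation. The input type `TlsRecord`, the function names, the window length `window_s`, and the threshold value used in the usage note are all hypothetical. The normalization by the longer sequence length is an assumption, since the normalization formula is not reproduced in the claim text above, and picking the record with the minimum distance is one reading of "a smaller Levenshtein distance".

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical input: (timestamp in seconds, is TLS Application Data) per observed record.
TlsRecord = Tuple[float, bool]


def to_segment_sequence(records: Iterable[TlsRecord], window_s: float) -> List[int]:
    """Count Application Data records in consecutive windows of equal duration."""
    app_times = sorted(t for t, is_app in records if is_app)
    if not app_times:
        return []
    start = app_times[0]
    counts = [0] * (int((app_times[-1] - start) // window_s) + 1)
    for t in app_times:
        counts[int((t - start) // window_s)] += 1
    return counts


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance lev_{a,b}(|a|, |b|), computed row by row."""
    prev = list(range(len(b) + 1))  # row for i = 0: lev(0, j) = j
    for i, x in enumerate(a, start=1):
        curr = [i]  # lev(i, 0) = i
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution, indicator term
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: Sequence, b: Sequence) -> float:
    """ASSUMPTION: normalize LD(a, b) by the longer length; the source formula is missing."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def best_match(query: Sequence[int],
               database: Dict[str, Sequence[int]],
               threshold: float) -> Optional[str]:
    """Return the label of the closest record, or None if no record is below the threshold."""
    best_label, best_dist = None, None
    for label, record in database.items():
        dist = normalized_levenshtein(query, record)
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist is None or best_dist >= threshold:
        return None  # content treated as not recorded in the database
    return best_label
```

With a hypothetical database `{"title_a": [2, 1, 1, 3], "title_b": [9, 9, 9, 9]}` and a threshold of 0.5, `best_match([2, 1, 1, 4], ...)` selects `"title_a"`, while a query unlike every record yields `None`. In practice the threshold would come from the gamma distribution fitting of claim 5 rather than a fixed constant.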
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111302590.XA CN114025203B (en) | 2021-11-04 | 2021-11-04 | Sequence similarity-based encrypted video flow content analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114025203A (en) | 2022-02-08 |
CN114025203B (en) | 2024-01-23 |
Family
ID=80061338
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111302590.XA Active CN114025203B (en) | 2021-11-04 | 2021-11-04 | Sequence similarity-based encrypted video flow content analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114025203B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109275045A (en) * | 2018-09-06 | 2019-01-25 | 东南大学 | Mobile terminal encrypted video ad traffic recognition methods based on DFI |
CN109391627A (en) * | 2018-11-20 | 2019-02-26 | 东南大学 | A method of identification tls protocol encrypted transmission YouTube DASH video |
CN109905696A (en) * | 2019-01-09 | 2019-06-18 | 浙江大学 | A kind of recognition methods of the Video service Quality of experience based on encryption data on flows |
CN110598014A (en) * | 2019-09-27 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Multimedia data processing method, device and storage medium |
CN110620937A (en) * | 2019-10-21 | 2019-12-27 | 电子科技大学 | Dynamic self-adaptive encrypted video traffic identification method based on HTTP |
CN111182254A (en) * | 2020-01-03 | 2020-05-19 | 北京百度网讯科技有限公司 | Video processing method, device, equipment and storage medium |
CN111356014A (en) * | 2020-02-18 | 2020-06-30 | 南京中新赛克科技有限责任公司 | Youtube video identification and matching method based on automatic learning |
CN112036518A (en) * | 2020-11-05 | 2020-12-04 | 中国人民解放军国防科技大学 | Application program flow classification method based on data packet byte distribution and storage medium |
Non-Patent Citations (1)
Title |
---|
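Audio similarity measurement method based on distance correlation graphs; Li Chao; Xiong Zhang; Zhu Chengjun; Journal of Beijing University of Aeronautics and Astronautics (02); full text * |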
Also Published As
Publication number | Publication date |
---|---|
CN114025203A (en) | 2022-02-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111277570A (en) | Data security monitoring method and device, electronic equipment and readable medium | |
CN110691070B (en) | Network abnormity early warning method based on log analysis | |
CN110334105B (en) | Stream data abnormity detection method based on Storm | |
CN112491917B (en) | Unknown vulnerability identification method and device for Internet of things equipment | |
CN115967504A (en) | Encrypted malicious traffic detection method and device, storage medium and electronic device | |
CN109376797B (en) | Network traffic classification method based on binary encoder and multi-hash table | |
CN109275045B (en) | DFI-based mobile terminal encrypted video advertisement traffic identification method | |
CN111191720B (en) | Service scene identification method and device and electronic equipment | |
CN114025203B (en) | Sequence similarity-based encrypted video flow content analysis method | |
CN111291028A (en) | High-speed industrial field oriented data acquisition system and method | |
CN114726526B (en) | Terminal sensor data encryption method and system based on Internet of things platform | |
CN112733188B (en) | Sensitive file management method | |
Whalen et al. | Hidden markov models for automated protocol learning | |
CN114205855A (en) | Feeder automation service network anomaly detection method facing 5G slices | |
CN114205151A (en) | HTTP/2 page access flow identification method based on multi-feature fusion learning | |
CN114679606B (en) | Video flow identification method, system, electronic equipment and storage medium based on Burst characteristics | |
CN115766204B (en) | Dynamic IP equipment identification system and method for encrypted traffic | |
CN117241071B (en) | Method for sensing video stalling quality difference based on machine learning algorithm | |
CN116089520B (en) | Fault identification method based on blockchain and big data and general computing node | |
CN115766204A (en) | Dynamic IP equipment identification system and method for encrypted flow | |
CN117527446B (en) | Network abnormal flow refined detection method | |
Liu et al. | Video traffic identification with a distribution distance-based feature selection | |
CN113836457B (en) | Mobile internet terminal cache management method, system and storage medium based on information identification and analysis | |
CN115085992B (en) | Detection system and detection method for malicious HTTPS secret channel | |
CN106776794A (en) | A kind of method and system for processing mass data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||