CN116488943B - Multimedia data leakage tracing detection method, device and equipment - Google Patents

Publication number
CN116488943B
Authority
CN
China
Prior art keywords
data stream
flow
leakage
data
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310728716.2A
Other languages
Chinese (zh)
Other versions
CN116488943A (en
Inventor
王滨
王晶晶
管晓宏
何承润
王星
王伟
李超豪
张峰
李志强
夏松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202310728716.2A priority Critical patent/CN116488943B/en
Publication of CN116488943A publication Critical patent/CN116488943A/en
Application granted granted Critical
Publication of CN116488943B publication Critical patent/CN116488943B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12: Discovery or management of network topologies
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441: Countermeasures against malicious traffic
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a multimedia data leakage tracing detection method, device and equipment. The method may comprise the following steps: determining, from all multimedia data streams, a leakage data stream and an associated data stream corresponding to the leakage data stream; determining a target leakage path corresponding to the leakage data stream, and generating a data flow map based on a background flow map and the target leakage path, wherein the background flow map comprises the topological relations among all hosts in an intranet, the target leakage path comprises a plurality of candidate hosts through which the associated data stream and the leakage data stream pass, and the data flow map is obtained by marking the plurality of candidate hosts on the background flow map and marking the node attributes corresponding to each candidate host on the background flow map; inputting the data flow map into a trained map anomaly detection model to obtain an anomaly score corresponding to each candidate host, and determining an abnormal host based on the anomaly scores. This scheme enables more accurate tracing of the abnormal host.

Description

Multimedia data leakage tracing detection method, device and equipment
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method, an apparatus, and a device for detecting multimedia data leakage tracing.
Background
Internet of Things devices include the IPC (Internet Protocol Camera, a network camera), the DVR (Digital Video Recorder, a hard disk video recorder), the NVR (Network Video Recorder, a network video recorder) and the like. With the rapid development of Internet of Things technology, the number of Internet of Things devices in networks keeps growing, and leaks of data from these devices are increasingly common. The multimedia data (such as voice, pictures and video) of Internet of Things devices is rich in information, so once multimedia data is leaked, a large amount of private data may be exposed.
When multimedia data is leaked, the Internet of Things device responsible for the leak needs to be identified promptly, so that security management (such as upgrading security software or shutting down the device) can be applied to it and further leakage prevented. However, the related art offers no effective method for detecting which Internet of Things device leaked the multimedia data, so the leaking device cannot be accurately identified and the leakage cannot be prevented.
Disclosure of Invention
In view of the above, the application provides a method, a device and equipment for detecting multimedia data leakage tracing, which can accurately detect which internet of things equipment is subject to multimedia data leakage, and ensure the safety of data.
In one aspect, the present application provides a method for detecting leakage trace source of multimedia data, the method comprising:
determining a leakage data stream and associated data streams corresponding to the leakage data stream from all multimedia data streams, wherein the leakage data stream is a data stream from an intranet host to an external network, the associated data stream is a data stream from the intranet host to the intranet host, and the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream meet a similarity condition;
determining a target leakage path corresponding to the leakage data flow, and generating a data flow map based on a background flow map and the target leakage path; the background flow map comprises a topological relation among hosts in an intranet, and the target leakage path comprises a plurality of candidate hosts through which the associated data flow and the leakage data flow pass; marking the plurality of candidate hosts on the background flow map, and marking node attributes corresponding to each candidate host on the background flow map to obtain the data flow map;
and inputting the data flow map to a trained map anomaly detection model to obtain anomaly scores corresponding to each candidate host, and determining an anomaly host based on the anomaly scores corresponding to each candidate host.
In another aspect, the present application provides a multimedia data leakage tracing detection apparatus, the apparatus including:
the determining module is used for determining a leakage data stream and an associated data stream corresponding to the leakage data stream from all the multimedia data streams; the leakage data stream is a data stream from an intranet host to an extranet, the associated data stream is a data stream from the intranet host to the intranet host, and the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream meet a similarity condition;
the generation module is used for determining a target leakage path corresponding to the leakage data flow and generating a data flow map based on a background flow map and the target leakage path; the background flow map comprises a topological relation among hosts in an intranet, and the target leakage path comprises a plurality of candidate hosts through which the associated data flow and the leakage data flow pass; marking the plurality of candidate hosts on the background flow map, and marking node attributes corresponding to each candidate host on the background flow map to obtain the data flow map;
the processing module is used for inputting the data flow map to the trained map anomaly detection model to obtain anomaly scores corresponding to each candidate host, and determining the anomaly hosts based on the anomaly scores corresponding to each candidate host.
In another aspect, the present application provides an electronic device, including: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is used for executing the machine executable instructions to realize the multimedia data leakage tracing detection method.
In another aspect, the application provides a machine-readable storage medium storing machine-executable instructions executable by a processor; the processor is configured to execute the machine executable instruction to implement the above-mentioned multimedia data leakage tracing detection method.
In another aspect, the present application provides a computer program stored in a machine-readable storage medium, which when executed by a processor, causes the processor to implement the above-described multimedia data leakage trace-source detection method.
According to the technical scheme, in the embodiment of the application, the target leakage path is determined based on the leakage data flow and the associated data flow corresponding to the leakage data flow, the data flow map is generated based on the background flow map and the target leakage path, and then the abnormal host (such as the abnormal Internet of things equipment) is determined based on the data flow map, namely, the abnormal host is detected through the data flow map, so that more accurate abnormal host tracing is realized. By aggregating data transmission events associated with data leakage events, drawing a data flow map, and detecting an abnormal host on a target leakage path, the abnormal host on the entire target leakage path can be located.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person having ordinary skill in the art may obtain other drawings from them.
FIG. 1 is a flowchart of a multimedia data leakage tracing detection method according to an embodiment of the present application;
FIG. 2 is a schematic view of an application scenario in an embodiment of the present application;
FIG. 3 is a schematic view of an application scenario in another embodiment of the present application;
FIG. 4 is a flowchart of a multimedia data leakage tracing detection method according to an embodiment of the present application;
FIGS. 5A-5C are schematic diagrams of data flow maps in one embodiment of the present application;
FIG. 6 is a block diagram of a multimedia data leakage tracing detection apparatus according to an embodiment of the present application;
FIG. 7 is a hardware configuration diagram of an electronic device in an embodiment of the application.
Detailed Description
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to any or all possible combinations including one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. Furthermore, depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination".
The embodiment of the application provides a multimedia data leakage tracing detection method that can be applied to any type of device. FIG. 1 is a flow diagram of the method, which may comprise the following steps:
Step 101, determining a leakage data stream and an associated data stream corresponding to the leakage data stream from all the multimedia data streams. The leakage data stream may be a data stream from an intranet host to an external network (i.e., an external network device), and the associated data stream may be a data stream from one intranet host to another intranet host, where the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream satisfy a similarity condition.
Step 102, determining a target leakage path corresponding to the leakage data stream, and generating a data flow map based on a background flow map and the target leakage path. The background flow map may include a topological relationship between hosts in an intranet, and the target leakage path may include a plurality of candidate hosts through which the associated data stream and the leakage data stream pass. The plurality of candidate hosts may be marked on the background flow map, and the node attribute corresponding to each candidate host may be marked on the background flow map, so as to obtain the data flow map.
Step 103, inputting the data flow map to a trained map anomaly detection model to obtain anomaly scores corresponding to each candidate host, and determining an anomaly host based on the anomaly scores corresponding to each candidate host.
Illustratively, prior to step 101, a multimedia data stream may also be determined from all network traffic. For example, the multimedia data stream may be determined as follows: determining a second network flow corresponding to the first network flow according to each acquired first network flow, wherein the destination address of the second network flow is the same as the source address of the first network flow, and the source address of the second network flow is the same as the destination address of the first network flow; and if the absolute value of the difference value between the byte number corresponding to the first network flow and the byte number corresponding to the second network flow is larger than a first threshold value, determining the first network flow as the multimedia data flow.
Illustratively, for step 101, determining the leakage data stream and the associated data stream corresponding to the leakage data stream from all the multimedia data streams may include, but is not limited to: for each multimedia data stream, if the source address of the multimedia data stream is located in a configured address field, the address field may include addresses of all hosts in the intranet, and the destination address of the multimedia data stream is not located in the address field, determining that the multimedia data stream is a leakage data stream; if the source address of the multimedia data stream is located in the address field and the destination address of the multimedia data stream is located in the address field, determining that the multimedia data stream is a candidate data stream.
For each leakage data stream, selecting an associated data stream corresponding to the leakage data stream from all candidate data streams based on the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream; wherein the traffic characteristics include acquisition time and byte count; the absolute value of the difference between the acquisition time of the associated data stream and the acquisition time of the leakage data stream is smaller than a second threshold value, and the absolute value of the difference between the byte number of the associated data stream and the byte number of the leakage data stream is smaller than a third threshold value.
Illustratively, selecting the associated data stream corresponding to the leakage data stream from all candidate data streams based on the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream may include, but is not limited to: calculating the similarity between the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream; based on the similarity between the flow characteristics, using a clustering algorithm to determine a cluster set that takes the leakage data stream as its cluster center, where the cluster set may include the associated data streams corresponding to the leakage data stream, and the similarity between the flow characteristics of the leakage data stream and the flow characteristics of each associated data stream satisfies a similarity condition. If the similarity is measured by Euclidean distance, satisfying the similarity condition means that the Euclidean distance between the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream is less than a preset distance threshold.
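The two selection rules described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary keys (`time`, `bytes`), the function names, and the per-feature scaling factors are assumptions introduced here, and the distance-based variant is plain thresholding around the leakage data stream rather than a full clustering algorithm, which the source leaves unspecified.

```python
import math

def select_by_thresholds(leak, candidates, time_threshold, byte_threshold):
    """Keep candidates whose acquisition time and byte count are both close to the leak."""
    return [
        c for c in candidates
        if abs(c["time"] - leak["time"]) < time_threshold
        and abs(c["bytes"] - leak["bytes"]) < byte_threshold
    ]

def select_by_distance(leak, candidates, distance_threshold,
                       time_scale=1.0, byte_scale=1.0):
    """Keep candidates within a Euclidean distance of the leak in feature space.

    time_scale and byte_scale are hypothetical normalisers: seconds and bytes
    live on very different scales, so some rescaling is assumed here.
    """
    def distance(c):
        return math.hypot(
            (c["time"] - leak["time"]) / time_scale,
            (c["bytes"] - leak["bytes"]) / byte_scale,
        )
    return [c for c in candidates if distance(c) < distance_threshold]

leak = {"id": "L", "time": 100.0, "bytes": 5_000_000}
candidates = [
    {"id": "A", "time": 98.0, "bytes": 4_990_000},  # close in time and size
    {"id": "B", "time": 40.0, "bytes": 5_000_000},  # same size, far in time
    {"id": "C", "time": 99.0, "bytes": 1_000},      # close in time, far in size
]
associated = select_by_thresholds(leak, candidates, 5.0, 50_000)
```

With these sample values only candidate `A` satisfies both rules.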
Illustratively, for step 102, determining a target leakage path corresponding to the leakage data stream may include, but is not limited to: selecting target associated data streams from all associated data streams corresponding to the leakage data stream, where the acquisition time of each target associated data stream may be earlier than the acquisition time of the leakage data stream; determining a data stream arrangement sequence based on the source address of the leakage data stream and the source address and destination address of each target associated data stream, where the destination address of the previous target associated data stream is the source address of the next target associated data stream, and the destination address of the last target associated data stream is the source address of the leakage data stream; and determining the target leakage path corresponding to the leakage data stream based on the data stream arrangement sequence.
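One way to read this chaining rule is as a backward walk from the source address of the leakage data stream. The sketch below assumes flows are plain dictionaries with `src`, `dst` and `time` keys, keeps only the first earlier flow seen per destination, and stops on a repeated host; these choices and the function name are illustrative assumptions, not details from the source.

```python
def build_leak_path(leak, associated):
    """Return the intranet hosts on the leak path, ordered from origin to leaking host."""
    # Target associated streams: those acquired before the leakage stream.
    earlier = [f for f in associated if f["time"] < leak["time"]]

    # Index by destination so the stream feeding a given host can be found.
    # Assumption: at most one relevant stream per destination; the first one wins.
    by_dst = {}
    for f in earlier:
        by_dst.setdefault(f["dst"], f)

    path = [leak["src"]]
    visited = {leak["src"]}
    current = leak["src"]
    # Walk backwards: the destination of the previous stream is the source of the next.
    while current in by_dst:
        previous_host = by_dst[current]["src"]
        if previous_host in visited:  # guard against loops in the traffic
            break
        path.insert(0, previous_host)
        visited.add(previous_host)
        current = previous_host
    return path

leak = {"src": "host23", "dst": "external", "time": 100.0}
associated = [
    {"src": "host21", "dst": "host22", "time": 97.0},
    {"src": "host22", "dst": "host23", "time": 98.5},
    {"src": "host24", "dst": "host23", "time": 101.0},  # later than the leak, ignored
]
path = build_leak_path(leak, associated)
```

Here `path` lists `host21`, `host22`, `host23`, which would be the candidate hosts to mark on the background flow map.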
Illustratively, for step 103, inputting the data flow map to a trained map anomaly detection model to obtain an anomaly score corresponding to each candidate host may include, but is not limited to: if the anomaly detection model comprises an encoder and a decoder, inputting the data flow map to the encoder, which converts the data flow map into hidden layer features; inputting the hidden layer features to the decoder, which reconstructs the adjacency matrix and the feature matrix of the data flow map based on the hidden layer features to obtain an adjacency reconstruction matrix and a feature reconstruction matrix, where the adjacency matrix represents the topological structure of the data flow map and the feature matrix represents the node attributes of each candidate host in the data flow map; and determining the anomaly score corresponding to each candidate host based on the difference between the adjacency reconstruction matrix and the adjacency matrix and the difference between the feature reconstruction matrix and the feature matrix.
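The scoring step can be illustrated without the encoder and decoder themselves. The sketch below takes the original and reconstructed matrices as given and combines the two reconstruction errors per node. The row-wise Euclidean norm and the weighting factor `alpha` are assumptions borrowed from common graph-autoencoder anomaly detectors; the source only states that the score is based on both differences.

```python
import math

def anomaly_scores(adj, adj_rec, feat, feat_rec, alpha=0.5):
    """Per-node anomaly score from structure and attribute reconstruction errors.

    adj / adj_rec:   adjacency matrix and its reconstruction (one row per node)
    feat / feat_rec: feature matrix and its reconstruction (one row per node)
    alpha:           hypothetical weight between attribute and structure error
    """
    scores = []
    for a, a_hat, x, x_hat in zip(adj, adj_rec, feat, feat_rec):
        structure_err = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, a_hat)))
        attribute_err = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, x_hat)))
        scores.append(alpha * attribute_err + (1 - alpha) * structure_err)
    return scores

adj = [[0, 1], [1, 0]]
adj_rec = [[0, 1], [1, 0]]           # structure reconstructed perfectly
feat = [[1.0, 0.0], [0.0, 1.0]]
feat_rec = [[1.0, 0.0], [3.0, 5.0]]  # node 1 attributes reconstructed poorly
scores = anomaly_scores(adj, adj_rec, feat, feat_rec)
```

A host whose row is reconstructed poorly receives a high score; the host with the highest score, or every host above a chosen cutoff, could then be reported as abnormal.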
According to the technical scheme, in the embodiment of the application, the target leakage path is determined based on the leakage data flow and the associated data flow corresponding to the leakage data flow, the data flow map is generated based on the background flow map and the target leakage path, and then the abnormal host (such as the abnormal Internet of things equipment) is determined based on the data flow map, namely, the abnormal host is detected through the data flow map, so that more accurate abnormal host tracing is realized. By aggregating data transmission events associated with data leakage events, drawing a data flow map, and detecting an abnormal host on a target leakage path, the abnormal host on the entire target leakage path can be located.
The technical scheme of the embodiment of the application is described below with reference to specific application scenarios.
When multimedia data is leaked, the intranet host responsible for the leak needs to be identified promptly, so that security management can be applied to that host and further leakage prevented. However, the related art offers no effective method for detecting which intranet host leaked the multimedia data, so the leaking host cannot be accurately identified.
In view of this finding, the embodiment of the application provides a multimedia data leakage tracing detection method that determines the abnormal host (namely the intranet host at which the multimedia data leakage occurred) based on a data flow map. Detecting the abnormal host through the data flow map enables more accurate tracing of the abnormal host.
Referring to fig. 2, an application scenario of the method is shown, where the intranet includes a large number of hosts (such as hosts 21-26, of course, only 6 hosts are taken as an example here, and the number of hosts may be more or less), and these hosts may interact with each other through a gateway device, or may interact with an external network device through a gateway device. The host may be an internet of things device, or may be other types of devices, such as a personal computer, a notebook computer, a smart phone, etc., which is not limited. The gateway device may be a router, a switch, etc.
The multimedia data leakage tracing detection method can be applied to a gateway device: the gateway device acquires the network traffic exchanged between hosts and the network traffic exchanged between hosts and external network devices, and performs multimedia data leakage tracing detection based on that network traffic. Alternatively, the method can be applied to an intelligent analysis device. FIG. 3 is a schematic diagram of this second application scenario, in which the intelligent analysis device is connected to the gateway device; the gateway device acquires the network traffic exchanged between hosts and the network traffic exchanged between hosts and external network devices and sends it to the intelligent analysis device, which performs multimedia data leakage tracing detection based on that network traffic.
In the above application scenario, taking an example that the multimedia data leakage tracing detection method is applied to the intelligent analysis device, a flow schematic diagram of the multimedia data leakage tracing detection method may be shown in fig. 4.
Step 401, obtaining network traffic for an intranet host, where the network traffic may include network traffic interacted between the intranet host and the intranet host, and network traffic interacted between the intranet host and an external network device.
The intelligent analysis device may acquire the network traffic of the hosts in a traffic bypass (mirroring) manner. For example, each time the gateway device receives network traffic, it forwards the traffic as usual and also sends a copy of it to the intelligent analysis device, so that the intelligent analysis device obtains the network traffic.
For each network flow, the intelligent analysis device may further determine information such as an acquisition time, a flow identifier, a source address, a destination address, and a byte number corresponding to the network flow. The acquisition time represents the time when the intelligent analysis device obtains the network flow. The flow identifier represents the unique identifier of the network traffic and is allocated to the network traffic by the intelligent analysis device. The source address may include a source IP address and a source port, or the source address may be a source IP address. The destination address may include a destination IP address and a destination port, or the destination address may be a destination IP address. The number of bytes represents the number of streaming bytes of the network traffic, i.e. the size of the network traffic.
Step 402, determining the multimedia data stream from all network traffic.
For each acquired network traffic, the network traffic may be recorded as a first network traffic, and based on a source address and a destination address corresponding to each network traffic, the intelligent analysis device may determine a second network traffic corresponding to the first network traffic, a destination address of the second network traffic being the same as the source address of the first network traffic, and a source address of the second network traffic being the same as the destination address of the first network traffic.
For example, assuming that the first network traffic is the network traffic sent by the host 21 to the host 22, the second network traffic is the network traffic sent by the host 22 to the host 21, i.e. the source address of the first network traffic is the address of the host 21, the destination address of the first network traffic is the address of the host 22, the destination address of the second network traffic is the address of the host 21, and the source address of the second network traffic is the address of the host 22.
For example, based on the number of bytes corresponding to each network traffic, if the absolute value of the difference between the number of bytes corresponding to the first network traffic and the number of bytes corresponding to the second network traffic is greater than a first threshold (which may be configured empirically), the first network traffic is determined to be a multimedia data stream. If the absolute value of the difference is not greater than the first threshold, the first network traffic is not determined to be a multimedia data stream. After the above processing is performed for each first network traffic, the multimedia data streams can be determined from all the network traffic.
For example, the multimedia data stream may include, but is not limited to, voice, pictures, video, etc. When a multimedia data leak occurs, the intranet host (e.g. host 23) sends a multimedia data stream to the extranet device, and the extranet device sends a non-multimedia data stream (e.g. control commands) to the intranet host. Therefore, if the first network traffic is the multimedia data stream sent by the intranet host to the extranet device and the second network traffic is the non-multimedia data stream sent by the extranet device to the intranet host, the number of bytes corresponding to the first network traffic will be far greater than the number of bytes corresponding to the second network traffic, because voice, pictures and video are very large.
Based on the above principle, if the difference between the number of bytes corresponding to the first network traffic and the number of bytes corresponding to the second network traffic is large (i.e. the absolute value of the difference is greater than the first threshold), it can be determined that the first network traffic is a multimedia data stream. Otherwise, if the difference between the number of bytes corresponding to the first network traffic and the number of bytes corresponding to the second network traffic is smaller, it may be determined that the first network traffic is not the multimedia data stream.
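The byte-asymmetry test can be sketched as below. The `Flow` record, its field names, and the assumption of one aggregated record per direction are illustrative choices made here; the threshold value is arbitrary. Flows without a reverse counterpart are skipped because the source does not say how to treat them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    flow_id: int
    src: str
    dst: str
    num_bytes: int

def find_multimedia_flows(flows, first_threshold):
    """Return flows whose byte count differs from the reverse flow by more than the threshold."""
    # Assumption: one aggregated record per (source, destination) direction.
    by_direction = {(f.src, f.dst): f for f in flows}
    multimedia = []
    for first in flows:
        second = by_direction.get((first.dst, first.src))
        if second is None:
            continue  # no second network traffic to compare against
        if abs(first.num_bytes - second.num_bytes) > first_threshold:
            multimedia.append(first)
    return multimedia

flows = [
    Flow(1, "10.0.0.23", "203.0.113.9", 5_000_000),  # large upload to the extranet
    Flow(2, "203.0.113.9", "10.0.0.23", 2_000),      # small control traffic back
    Flow(3, "10.0.0.21", "10.0.0.22", 1_000),        # balanced intranet exchange
    Flow(4, "10.0.0.22", "10.0.0.21", 1_200),
]
multimedia_ids = [f.flow_id for f in find_multimedia_flows(flows, 1_000_000)]
```

Because the rule as worded uses an absolute difference, the small reverse flow of an asymmetric pair (flow 2 here) passes the test as well. A flow from the extranet into the intranet is neither a leakage data stream nor a candidate data stream under the address-field check, so it would drop out at that later step.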
In the above process, whether network traffic is a multimedia data stream is determined from the number of bytes corresponding to the network traffic, rather than by parsing the network traffic (for example, parsing a protocol field that indicates whether the network traffic is a multimedia data stream). As a result, the determination can be made even for encrypted network traffic.
Step 403, determining, for each multimedia data stream, a traffic characteristic corresponding to the multimedia data stream, where the traffic characteristic includes, but is not limited to, at least one of the following: acquisition time and number of bytes.
For example, the intelligent analysis device has obtained an acquisition time of each network traffic, and when a certain network traffic is determined as a multimedia data stream, the acquisition time of the network traffic is the corresponding acquisition time of the multimedia data stream, and the acquisition time represents the time when the intelligent analysis device obtains the multimedia data stream.
The intelligent analysis device has obtained the byte number of each network traffic, and when a certain network traffic is determined as a multimedia data stream, the byte number of the network traffic is the byte number corresponding to the multimedia data stream, and the byte number represents the streaming byte number of the multimedia data stream, namely the size of the multimedia data stream.
Step 404, determining a leakage data stream from all the multimedia data streams, wherein the leakage data stream may be a multimedia data stream from an intranet host to an extranet (i.e. an extranet device).
For example, an address segment may be preconfigured that includes the addresses (e.g., IP addresses) of all hosts in the intranet but not the addresses of devices in the extranet. Based on this, for each multimedia data stream, if the source address of the multimedia data stream is located in the address segment and the destination address of the multimedia data stream is not located in the address segment, that is, the multimedia data stream goes from an intranet host to an extranet device, the multimedia data stream is determined to be a leakage data stream. Alternatively, if both the source address and the destination address of the multimedia data stream are located in the address segment, that is, the multimedia data stream goes from one intranet host to another intranet host, the multimedia data stream is determined not to be a leakage data stream and may instead be determined to be a candidate data stream.
For example, if there is a leakage data stream in all the multimedia data streams, that is, there is a multimedia data stream from the intranet host to the extranet device, alarm information may be generated, for example, alarm information is sent to the designated device, where the alarm information is used to indicate that there is a data leakage event from the intranet host to the extranet device.
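The address-segment check of step 404 can be sketched with Python's standard `ipaddress` module. The segment `10.0.0.0/24`, the function name `classify_stream`, and the `"other"` label for streams whose source is outside the intranet are assumptions made for illustration; the text only defines the leakage and candidate cases.

```python
import ipaddress

# Hypothetical preconfigured segment covering all intranet hosts.
INTRANET_SEGMENT = ipaddress.ip_network("10.0.0.0/24")


def classify_stream(src, dst, segment=INTRANET_SEGMENT):
    """Classify a multimedia data stream by its source and destination address."""
    src_inside = ipaddress.ip_address(src) in segment
    dst_inside = ipaddress.ip_address(dst) in segment
    if src_inside and not dst_inside:
        return "leakage"    # intranet host -> extranet device
    if src_inside and dst_inside:
        return "candidate"  # intranet host -> intranet host
    return "other"          # assumption: not covered by the text
```

For example, `classify_stream("10.0.0.23", "203.0.113.9")` yields `"leakage"`, at which point alarm information could be generated.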
Step 405, for each leakage data stream, determining an associated data stream corresponding to the leakage data stream from all multimedia data streams, where the associated data stream is a data stream from one intranet host to another intranet host, and the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream meet a similarity condition.
For each leakage data stream, the associated data stream corresponding to the leakage data stream can be selected from all candidate data streams based on the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream. Since each candidate data stream is a multimedia data stream from one intranet host to another intranet host, when the associated data stream is selected from all the candidate data streams, the associated data stream is also a multimedia data stream from one intranet host to another intranet host. When the associated data stream is selected from all the candidate data streams, the flow characteristics of the associated data stream (i.e. the selected candidate data stream) and the flow characteristics of the leakage data stream need to meet the similarity condition, and the selection mode is not limited as long as the flow characteristics of the associated data stream and the flow characteristics of the leakage data stream meet the similarity condition.
Illustratively, the flow characteristics may include an acquisition time and a number of bytes, and the flow characteristics of the leakage data stream satisfy a similarity condition with the flow characteristics of the associated data stream, may include, but is not limited to, at least one of: the absolute value of the difference between the acquisition time of the associated data stream and the acquisition time of the leakage data stream is less than a second threshold (which may be empirically configured), and the absolute value of the difference between the number of bytes of the associated data stream and the number of bytes of the leakage data stream is less than a third threshold (which may be empirically configured).
For example, for a certain candidate data stream, if the absolute value of the difference between the acquisition time of the candidate data stream and the acquisition time of the leaked data stream is smaller than the second threshold value and the absolute value of the difference between the byte number of the candidate data stream and the byte number of the leaked data stream is smaller than the third threshold value, the candidate data stream is used as the associated data stream corresponding to the leaked data stream, otherwise, the candidate data stream is not used as the associated data stream corresponding to the leaked data stream.
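A minimal sketch of this threshold test is shown below. It requires both conditions to hold, following the example in the preceding paragraph. The `MediaStream` record, its field names, and the function names are hypothetical, and acquisition times are assumed to be expressed in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaStream:
    src: str
    dst: str
    acquired_at: float  # acquisition time, assumed to be in seconds
    num_bytes: int      # size of the multimedia data stream


def meets_similarity(leak, cand, second_threshold, third_threshold):
    """Both the time gap and the size gap must be below their thresholds."""
    time_gap = abs(cand.acquired_at - leak.acquired_at)
    size_gap = abs(cand.num_bytes - leak.num_bytes)
    return time_gap < second_threshold and size_gap < third_threshold


def select_associated(leak, candidates, second_threshold, third_threshold):
    """Return the candidate data streams that qualify as associated data streams."""
    return [c for c in candidates
            if meets_similarity(leak, c, second_threshold, third_threshold)]
```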
In one possible implementation, the associated data stream corresponding to the leakage data stream may be selected from all candidate data streams, based on the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream, in the following manner: calculating the similarity between the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream; then, based on these similarities, using a clustering algorithm to determine a clustering set with the leakage data stream as the clustering center. The clustering set may include the associated data streams corresponding to the leakage data stream, and the similarity between the flow characteristics of the leakage data stream and the flow characteristics of each associated data stream satisfies the similarity condition.
For example, the flow characteristics of the leakage data stream may be converted into a feature matrix, the flow characteristics of the candidate data stream may be converted into a feature matrix, and then the similarity between the two feature matrices may be calculated, where the similarity may be a euclidean distance, a cosine similarity, or the like, and the type of the similarity is not limited.
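The two similarity measures named above can be sketched as follows. For brevity the flow characteristics are represented as a flat two-element vector (acquisition time, number of bytes) rather than a feature matrix; this representation and the function names are assumptions. In practice the two features would probably need to be scaled to comparable ranges first, which is not shown.

```python
import math


def feature_vector(acquired_at, num_bytes):
    """Pack the flow characteristics into a numeric vector (assumed layout)."""
    return [float(acquired_at), float(num_bytes)]


def euclidean_distance(a, b):
    """Smaller values mean more similar flow characteristics."""
    return math.dist(a, b)


def cosine_similarity(a, b):
    """Values close to 1 mean the two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```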
The clustering algorithm is to divide a data set into different classes or clusters according to a specific standard (such as distance), so that the similarity of data objects in the same cluster is as large as possible, and meanwhile, the difference of data objects not in the same cluster is also as large as possible, namely, the data in the same class after clustering are gathered together as much as possible, and the data in different classes are separated as much as possible, such as a K-means clustering algorithm, a density-based clustering algorithm and the like.
The input of the clustering algorithm is the flow characteristics of a plurality of leakage data streams (e.g., K leakage data streams) and the flow characteristics of all candidate data streams. Based on the clustering algorithm, K clustering sets (i.e., clusters) corresponding to the K leakage data streams can be determined. For the clustering set corresponding to each leakage data stream, the clustering center is the leakage data stream (i.e., its flow characteristics), and the clustering set also includes at least one candidate data stream (i.e., its flow characteristics), which serves as an associated data stream corresponding to the leakage data stream. When clustering is performed, the clustering set with the leakage data stream as the clustering center is determined based on the similarity between the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream, and the flow characteristics of each candidate data stream in the clustering set satisfy the similarity condition with the flow characteristics of the leakage data stream; beyond this, the clustering process is not limited.
For example, if the similarity is a Euclidean distance, the similarity between the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream meeting the similarity condition means that the Euclidean distance between the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream is less than a preset distance threshold.
For example, in the clustering process based on the clustering algorithm, if the euclidean distance between the flow characteristic of the candidate data stream and the flow characteristic of a certain leakage data stream is smaller than a preset distance threshold, the candidate data stream may be added to a cluster set with the leakage data stream as a cluster center.
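The distance-threshold rule above can be sketched as a simple assignment loop with each leakage data stream fixed as a clustering center. This is an illustration of that rule only, not a full K-means or density-based algorithm. The function name, the stream identifiers, and the feature values are hypothetical. The text does not say what happens when a candidate lies within the threshold of several centers, so this sketch adds it to every such clustering set.

```python
import math


def cluster_around_leaks(leaks, candidates, distance_threshold):
    """Group candidate streams around leakage streams used as fixed centers.

    leaks, candidates: mapping from stream id to its feature vector.
    Returns a mapping from leakage stream id to the ids of its associated streams.
    """
    clusters = {leak_id: [] for leak_id in leaks}
    for cand_id, cand_vec in candidates.items():
        for leak_id, leak_vec in leaks.items():
            # Join the clustering set when the Euclidean distance is below the threshold.
            if math.dist(cand_vec, leak_vec) < distance_threshold:
                clusters[leak_id].append(cand_id)
    return clusters


# Illustrative (already scaled) feature vectors.
leaks = {"L1": (100.0, 50.0)}
candidates = {"A1": (98.0, 49.0), "A2": (97.0, 51.0), "B": (10.0, 5.0)}
clusters = cluster_around_leaks(leaks, candidates, distance_threshold=5.0)
```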
In summary, the events associated with a data leakage event can be aggregated: the leakage data stream corresponds to the data leakage event, and the associated data streams corresponding to the leakage data stream (i.e., the associated events of the data leakage event) can be determined from all the multimedia data streams. Since the flow characteristics generated by the same data leakage event are highly similar, the multimedia data streams belonging to the same data leakage event within a unit of time can be gathered together through a clustering algorithm, and these multimedia data streams are the associated events.
Step 406, for each leakage data stream, determining a target leakage path corresponding to the leakage data stream, where the target leakage path may include a plurality of candidate hosts through which the associated data stream and the leakage data stream pass.
For example, the target associated data stream may be selected from all associated data streams corresponding to the leaked data stream, and the acquisition time of the target associated data stream may be earlier than the acquisition time of the leaked data stream.
For example, since the leakage data stream is a multimedia data stream from the intranet host to the extranet device, that is, a multimedia data stream from the last node of the intranet to the extranet device, if the acquisition time of the associated data stream is earlier than the acquisition time of the leakage data stream, the associated data stream may belong to the same leakage path, and if the acquisition time of the associated data stream is later than the acquisition time of the leakage data stream, the associated data stream does not belong to the same leakage path.
The data stream arrangement order is then determined based on the source address of the leakage data stream and the source address and destination address of each target associated data stream, such that the destination address of each target associated data stream is the source address of the next target associated data stream, and the destination address of the last target associated data stream is the source address of the leakage data stream. For example, assuming that the source address of the leakage data stream is 111, the source address of the target associated data stream A1 is 222 and its destination address is 111, and the source address of the target associated data stream A2 is 333 and its destination address is 222, the data stream arrangement order is the target associated data stream A2, the target associated data stream A1, and the leakage data stream.
The target leakage path corresponding to the leakage data stream is then determined based on the data stream arrangement order. Referring to fig. 3, assume that the address of the host 21 is 111, the address of the host 22 is 222, the address of the host 23 is 333, the address of the host 24 is 444, the address of the host 25 is 555, and the address of the host 26 is 666. Since the source address of the target associated data stream A2 is 333, which corresponds to the host 23, the source address of the target associated data stream A1 is 222, which corresponds to the host 22, and the source address of the leakage data stream is 111, which corresponds to the host 21, the target leakage path can be determined to be the host 23 (address 333) - the host 22 (address 222) - the host 21 (address 111).
In summary, the target leakage path of the multimedia data stream can be reconstructed based on the source address of the leakage data stream and the source address and destination address of each target associated data stream. For example, if A transmits the multimedia data stream to B, and B transmits the multimedia data stream to the extranet device, then there is a leakage path A -> B -> extranet.
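The ordering and path reconstruction of step 406 can be sketched by walking backwards from the source address of the leakage data stream. The `AssociatedStream` record, the function names, the acquisition times, and the extra stream A3 are hypothetical; the addresses 111, 222, and 333 follow the example above. The sketch assumes that at most one target associated data stream ends at any given host.

```python
from typing import NamedTuple


class AssociatedStream(NamedTuple):
    name: str
    src: str
    dst: str
    acquired_at: float


def order_target_streams(leak_src, leak_time, associated):
    """Return the target associated streams in transmission order."""
    # Target associated streams are those acquired earlier than the leakage stream.
    targets = [s for s in associated if s.acquired_at < leak_time]
    # Assumption: at most one target stream ends at any given host.
    by_dst = {s.dst: s for s in targets}
    ordered = []
    visited = {leak_src}
    current = leak_src
    while current in by_dst:
        stream = by_dst[current]
        if stream.src in visited:  # guard against address loops
            break
        ordered.append(stream)
        visited.add(stream.src)
        current = stream.src
    ordered.reverse()  # earliest hop first
    return ordered


def leakage_path(leak_src, ordered):
    """Hosts on the path: the source of every ordered stream, then the leak source."""
    return [s.src for s in ordered] + [leak_src]


streams = [
    AssociatedStream("A1", "222", "111", 90.0),
    AssociatedStream("A2", "333", "222", 80.0),
    AssociatedStream("A3", "444", "111", 120.0),  # later than the leak, ignored
]
ordered = order_target_streams("111", 100.0, streams)
path = leakage_path("111", ordered)
```

With these inputs the order is A2, A1, and the path is 333, 222, 111, matching the example in the text.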
Step 407, generating a data flow map based on the background flow map and the target leakage path. The background traffic map may include a topological relationship (i.e., a connection relationship) between hosts in the intranet. In order to generate the data flow map, a plurality of candidate hosts may be marked on the background traffic map, and node attributes corresponding to each candidate host may be marked on the background traffic map, so as to obtain the data flow map.
For example, referring to fig. 5A, the background traffic map may be generated based on all network traffic, if there is network traffic from the host 21 to the host 22, there is a connection relationship between the host 21 and the host 22, and if there is no network traffic from the host 21 to the host 26, there is no connection relationship between the host 21 and the host 26, or may be preconfigured, which is not limited.
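One way to derive the background traffic map from observed traffic is sketched below. The function name, the host labels, and the edge list are hypothetical, and treating the connection relationship as undirected is an assumption; the text only states that traffic between two hosts implies a connection between them.

```python
def build_background_map(intranet_hosts, observed_flows):
    """Build an adjacency map of intranet hosts from (src, dst) traffic pairs."""
    adjacency = {host: set() for host in intranet_hosts}
    for src, dst in observed_flows:
        # Only traffic between two distinct intranet hosts creates a connection.
        if src in adjacency and dst in adjacency and src != dst:
            adjacency[src].add(dst)
            adjacency[dst].add(src)  # assumption: connections are undirected
    return adjacency


hosts = ["host21", "host22", "host23", "host24", "host25", "host26"]
flows = [("host21", "host22"), ("host22", "host23"), ("host21", "extranet")]
background = build_background_map(hosts, flows)
```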
Referring to fig. 5B, assuming that the target leakage path is host 23-host 22-host 21, these candidate hosts (i.e., host 23, host 22, and host 21) may be marked on the background traffic map to indicate the hosts that the target leakage path passes through, and the direction of the target leakage path may also be marked on the background traffic map. Hosts that are not marked are not on the target leakage path. In fig. 5B, the candidate hosts are marked with ellipses, although other marking methods may also be used.
Referring to fig. 5C, the node attributes corresponding to each candidate host may further be marked on the background traffic map; the node attributes corresponding to each non-candidate host may also be marked. The node attributes may include, but are not limited to, at least one of the following: whether the host belongs to the target leakage path (for example, a candidate host corresponds to a first value indicating that the host belongs to the target leakage path, and a non-candidate host corresponds to a second value indicating that it does not); node type (e.g., whether the host is an edge node or a non-edge node); traffic characteristics (i.e., the traffic characteristics of the multimedia data streams whose source address is the candidate or non-candidate host); and traffic anomaly (e.g., whether the multimedia data streams of the candidate or non-candidate host are anomalous). The above are only a few examples of node attributes, and the node attributes are not limited to these.
In fig. 5C, the type of node attribute is denoted by f1, f2, f3, f4, the value of the node attribute is denoted by x11-x14, x21-x24, x31-x34, x41-x44, x51-x54, x61-x64, if x11 has a value of 1, the host 21 belongs to the target leakage path, x21 has a value of 1, the host 22 belongs to the target leakage path, x41 has a value of 0, the host 24 does not belong to the target leakage path, and so on.
The background traffic map of labeled candidate hosts and node attributes may be referred to as a data-flow map, also referred to as a multimedia data-flow map, which may be seen in fig. 5C.
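As a rough sketch, the data flow map can be held as an adjacency matrix plus a node feature matrix whose first column (f1) flags membership of the target leakage path. The function name, the edge list, and the numeric values used for the remaining attributes f2 to f4 are invented for illustration.

```python
def build_data_flow_map(hosts, edges, target_path, other_attributes):
    """Return (adjacency matrix, feature matrix) for the data flow map.

    other_attributes: hypothetical mapping host -> (f2, f3, f4) values.
    """
    index = {host: i for i, host in enumerate(hosts)}
    n = len(hosts)
    adjacency = [[0] * n for _ in range(n)]
    for a, b in edges:
        adjacency[index[a]][index[b]] = 1
        adjacency[index[b]][index[a]] = 1  # assumption: undirected topology
    on_path = set(target_path)
    features = []
    for host in hosts:
        f1 = 1.0 if host in on_path else 0.0  # 1 marks a candidate host
        features.append([f1, *other_attributes.get(host, (0.0, 0.0, 0.0))])
    return adjacency, features


hosts = ["host21", "host22", "host23", "host24", "host25", "host26"]
edges = [("host21", "host22"), ("host22", "host23"), ("host24", "host25")]
target_path = ["host23", "host22", "host21"]
other = {"host21": (1.0, 0.8, 1.0), "host22": (0.0, 0.7, 0.0)}
adjacency, features = build_data_flow_map(hosts, edges, target_path, other)
```

Here the first entry of the row for host 21 is 1 and the first entry of the row for host 24 is 0, mirroring x11 = 1 and x41 = 0 in fig. 5C.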
Step 408, inputting the data flow map to the trained map anomaly detection model to obtain anomaly scores corresponding to each candidate host, and determining an anomaly host based on the anomaly scores corresponding to each candidate host.
The graph anomaly detection model may be, for example, a graph-learning-based self-encoder (AutoEncoder), or may be another type of network model, which is not limited here. The self-encoder is an unsupervised learning model: based on a back propagation algorithm and an optimization method (such as gradient descent), the input data X itself is used as the supervision signal to guide the model to learn a mapping relationship, thereby obtaining a reconstruction output X_R. In a time-series anomaly detection scenario, anomalies are a small minority compared with normal data, so if the difference between the reconstruction output X_R of the self-encoder and the input data X exceeds a certain threshold, the original time series is considered abnormal. In addition, graph learning refers to machine learning on graphs: by mapping the features of a graph to feature vectors of the same dimension in an embedding space, the graph data can be converted directly into the output of the graph learning architecture without mapping the graph to a low-dimensional space. Deep learning techniques can encode and represent the graph data as vectors; the output vectors of graph learning lie in a continuous space, and the goal is to extract the desired features of the graph.
Taking the example that the graph anomaly detection model is a self-encoder, the graph anomaly detection model can comprise an encoder and a decoder, and the training process of the graph anomaly detection model is not limited. Based on the trained graph anomaly detection model, the data-flow graph can be input to an encoder, and the data-flow graph can be converted into hidden layer features, which can also be called intermediate features, by the encoder. And then, inputting the hidden layer characteristics to a decoder, and reconstructing an adjacent matrix and a characteristic matrix of the data flow map based on the hidden layer characteristics by the decoder so as to obtain an adjacent reconstruction matrix corresponding to the adjacent matrix and a characteristic reconstruction matrix corresponding to the characteristic matrix.
For example, the data flow map includes a topological relation among hosts in an intranet, a target leakage path, and candidate hosts through which the target leakage path passes, and these information reflect the topological structure of the data flow map, so that the topological structure of the data flow map can be converted into a vector by the graph anomaly detection model, and the vector is called an adjacency matrix of the data flow map, and the adjacency matrix is used for representing the topological structure of the data flow map.
The data flow graph further comprises node attributes corresponding to each candidate host and node attributes corresponding to each non-candidate host, and the information reflects the node attributes of each host in the data flow graph, so that the node attributes of the data flow graph can be converted into vectors through the graph anomaly detection model, the vectors are called as feature matrixes of the data flow graph, and the feature matrixes are used for representing the node attributes of each host in the data flow graph, such as the node attributes of each candidate host and the node attributes corresponding to each non-candidate host.
After the hidden layer feature is input to the decoder, the hidden layer feature can be processed by the decoder, so that an adjacent reconstruction matrix corresponding to the adjacent matrix and a feature reconstruction matrix corresponding to the feature matrix are obtained.
And determining the anomaly score corresponding to each candidate host based on the difference value between the adjacent reconstruction matrix and the adjacent matrix and the difference value between the characteristic reconstruction matrix and the characteristic matrix. For example, assuming that the target leakage path is host 23-host 22-host 21, that is, host 23, host 22, and host 21 are candidate hosts, the adjacency reconstruction matrix includes adjacency reconstruction values of host 21, the adjacency matrix includes adjacency values of host 21, a difference between the adjacency reconstruction values and the adjacency values can be calculated, the feature reconstruction matrix includes feature reconstruction values of host 21, the feature matrix includes feature values of host 21, a difference between the feature reconstruction values and the feature values can be calculated, and the anomaly score corresponding to host 21 is determined based on the two differences. Obviously, if the sum of the two differences is larger, the reconstruction error is indicated to be larger, and the anomaly score corresponding to the host 21 is larger, that is, the probability that the host 21 is an anomaly node is larger. Similarly, the anomaly score corresponding to the host 22 and the anomaly score corresponding to the host 23 may be determined.
After the anomaly score corresponding to each candidate host is obtained, the candidate host with the largest anomaly score can be used as the anomaly host. Or based on the anomaly score corresponding to each candidate host, if the anomaly score corresponding to the candidate host is greater than a preset score threshold, the candidate host is used as the anomaly host, and if the anomaly score corresponding to the candidate host is not greater than the preset score threshold, the candidate host is not used as the anomaly host.
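The scoring and selection just described can be sketched as follows, taking the matrices and their reconstructions as given (the trained encoder and decoder are not shown). Measuring each difference as the Euclidean norm of a host's row and summing the two norms with equal weight is an assumption; the text only requires that a larger total difference give a larger anomaly score. All names and matrix values are hypothetical.

```python
import math


def anomaly_scores(adj, adj_recon, feat, feat_recon, candidate_rows):
    """Per-host score: structure reconstruction error plus attribute error."""
    return {
        i: math.dist(adj[i], adj_recon[i]) + math.dist(feat[i], feat_recon[i])
        for i in candidate_rows
    }


def most_abnormal(scores):
    """Option 1: the candidate host with the largest anomaly score."""
    return max(scores, key=scores.get)


def above_threshold(scores, score_threshold):
    """Option 2: every candidate host whose score exceeds the preset threshold."""
    return [i for i, score in scores.items() if score > score_threshold]


# Illustrative matrices for three candidate hosts (rows 0, 1, 2).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
adj_recon = [[0, 1, 0], [1, 0, 1], [0, 1, 1]]
feat = [[1.0, 2.0], [1.0, 5.0], [1.0, 1.0]]
feat_recon = [[1.0, 2.0], [1.0, 2.0], [1.0, 1.0]]
scores = anomaly_scores(adj, adj_recon, feat, feat_recon, [0, 1, 2])
```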
In summary, the method can determine the abnormal intranet host based on the data flow map, detecting it by combining the network structure of the background traffic map with the node attributes along the target leakage path. To find an abnormal host that deviates significantly from normal hosts, the reconstruction error of the network structure and node attributes is used to measure abnormality: the network structure and node attributes are taken as input, a model is built on a self-encoder, the encoder learns the hidden features of the network structure and node attributes, the decoder reconstructs the network structure and node attributes, and the abnormal host in the intranet is determined based on the resulting anomaly scores.
In the process, data transmission events related to the data leakage events are aggregated, a data flow map is drawn, an intranet abnormal host of the data leakage events is detected based on the data flow map, the abnormal host is adaptively inferred according to the network structure and the node attribute, and the abnormal host on the whole data leakage flow path is positioned.
Based on the same application concept as the above method, an embodiment of the present application provides a multimedia data leakage tracing detection device, as shown in fig. 6, which is a schematic structural diagram of the device, where the device includes:
A determining module 61, configured to determine a leakage data stream and an associated data stream corresponding to the leakage data stream from all multimedia data streams; the leakage data flow is a data flow from an intranet host to an extranet, the association data flow is a data flow from the intranet host to the intranet host, and the flow characteristics of the leakage data flow and the flow characteristics of the association data flow meet a similarity condition; a generating module 62, configured to determine a target leakage path corresponding to the leakage data flow, and generate a data flow map based on a background flow map and the target leakage path; the background flow map comprises a topological relation among hosts in an intranet, and the target leakage path comprises a plurality of candidate hosts through which the associated data flow and the leakage data flow pass; marking the plurality of candidate hosts on the background flow map, and marking node attributes corresponding to each candidate host on the background flow map to obtain the data flow map; the processing module 63 is configured to input the data flow graph to the trained graph anomaly detection model, obtain anomaly scores corresponding to each candidate host, and determine an anomaly host based on the anomaly scores corresponding to each candidate host.
The determining module 61 is further configured to determine, for each of the acquired first network traffic, a second network traffic corresponding to the first network traffic, where a destination address of the second network traffic is the same as a source address of the first network traffic, and a source address of the second network traffic is the same as a destination address of the first network traffic; and if the absolute value of the difference value between the byte number corresponding to the first network flow and the byte number corresponding to the second network flow is larger than a first threshold value, determining the first network flow as a multimedia data flow.
Illustratively, the determining module 61 is specifically configured to, when determining a leakage data stream from all multimedia data streams: for each multimedia data stream, if the source address of the multimedia data stream is located in a configured address segment, the address segment comprises the addresses of all hosts in the intranet, and the destination address of the multimedia data stream is not located in the address segment, determining that the multimedia data stream is a leakage data stream; if the source address of the multimedia data stream is located in the address segment and the destination address of the multimedia data stream is located in the address segment, determining that the multimedia data stream is a candidate data stream.
The determining module 61 is specifically configured to, when determining, from all multimedia data streams, an associated data stream corresponding to the leakage data stream: for each leakage data stream, selecting an associated data stream corresponding to the leakage data stream from all candidate data streams based on the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream; wherein the traffic characteristics include acquisition time and byte count; the absolute value of the difference between the acquisition time of the associated data stream and the acquisition time of the leakage data stream is smaller than a second threshold value, and the absolute value of the difference between the byte number of the associated data stream and the byte number of the leakage data stream is smaller than a third threshold value.
The determining module 61 is specifically configured to, when selecting, from all candidate data streams, an associated data stream corresponding to the leakage data stream based on the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream: calculate the similarity between the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream; and, based on the similarity, determine a clustering set taking the leakage data stream as a clustering center by adopting a clustering algorithm, wherein the clustering set comprises associated data streams corresponding to the leakage data stream, and the similarity between the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream meets a similarity condition; wherein, if the similarity is a Euclidean distance, satisfying the similarity condition means that the Euclidean distance between the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream is less than a preset distance threshold.
Illustratively, the generating module 62 is specifically configured to, when determining the target leakage path corresponding to the leakage data stream: select a target associated data stream from all associated data streams corresponding to the leakage data stream, wherein the acquisition time of the target associated data stream is earlier than the acquisition time of the leakage data stream; determine a data stream arrangement order based on the source address of the leakage data stream and the source address and destination address of each target associated data stream, wherein the destination address of the previous target associated data stream is the source address of the next target associated data stream, and the destination address of the last target associated data stream is the source address of the leakage data stream; and determine the target leakage path corresponding to the leakage data stream based on the data stream arrangement order.
For example, the processing module 63 inputs the data flow map to the trained graph anomaly detection model, and is specifically configured to, when obtaining the anomaly score corresponding to each candidate host: if the anomaly detection model comprises an encoder and a decoder, inputting the data flow map to the encoder, and converting the data flow map into hidden layer characteristics through the encoder; inputting the hidden layer characteristics to the decoder, and reconstructing an adjacent matrix and a feature matrix of the data flow map based on the hidden layer characteristics by the decoder to obtain an adjacent reconstruction matrix and a feature reconstruction matrix; wherein the adjacency matrix represents the topological structure of the data flow graph, and the feature matrix represents the node attribute of each candidate host in the data flow graph; and determining the anomaly score corresponding to each candidate host based on the difference value between the adjacent reconstruction matrix and the adjacent matrix and the difference value between the characteristic reconstruction matrix and the characteristic matrix.
Based on the same application concept as the above method, an embodiment of the present application proposes an electronic device, referring to fig. 7, including a processor 71 and a machine-readable storage medium 72, the machine-readable storage medium 72 storing machine-executable instructions executable by the processor 71; the processor 71 is configured to execute machine executable instructions to implement the above-mentioned multimedia data leakage tracing detection method.
Based on the same application concept as the above method, the embodiment of the application further provides a machine-readable storage medium, on which a plurality of computer instructions are stored, which when executed by a processor, can implement the multimedia data leakage tracing detection method of the above example.
The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information, such as executable instructions or data. For example, a machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk or DVD), or a similar storage medium, or a combination thereof.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer entity or by an article of manufacture having some functionality. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Moreover, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. The multimedia data leakage tracing detection method is characterized by comprising the following steps:
determining a leakage data stream and associated data streams corresponding to the leakage data stream from all multimedia data streams, wherein the leakage data stream is a data stream from an intranet host to an external network, the associated data stream is a data stream from one intranet host to another intranet host, and the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream meet a similarity condition;
determining a target leakage path corresponding to the leakage data stream, and generating a data flow map based on a background flow map and the target leakage path; the background flow map comprises a topological relation among hosts in an intranet, and the target leakage path comprises a plurality of candidate hosts through which the associated data stream and the leakage data stream pass; marking the plurality of candidate hosts on the background flow map, and marking node attributes corresponding to each candidate host on the background flow map to obtain the data flow map;
and inputting the data flow map to a trained map anomaly detection model to obtain anomaly scores corresponding to each candidate host, and determining an anomaly host based on the anomaly scores corresponding to each candidate host.
2. The method of claim 1, wherein before determining the leakage data stream and the associated data stream corresponding to the leakage data stream from all multimedia data streams, the method further comprises:
determining a second network flow corresponding to the first network flow according to each acquired first network flow, wherein the destination address of the second network flow is the same as the source address of the first network flow, and the source address of the second network flow is the same as the destination address of the first network flow;
and if the absolute value of the difference between the byte count of the first network flow and the byte count of the second network flow is larger than a first threshold, determining the first network flow to be a multimedia data stream.
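The byte-asymmetry test of claim 2 can be illustrated with a minimal Python sketch. The `Flow` record, the function name `find_multimedia_flows`, and the assumption of exactly one aggregated record per (source, destination) pair are hypothetical and not taken from the application; flows without a reverse counterpart are simply skipped, a case the claim does not address.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record: one aggregated entry per (src, dst) pair."""
    src: str
    dst: str
    byte_count: int


def find_multimedia_flows(flows, first_threshold):
    """Return the flows whose byte count differs from that of the reverse
    flow (same endpoints, opposite direction) by more than first_threshold."""
    # Index flows by direction so the reverse flow can be looked up directly.
    by_pair = {(f.src, f.dst): f for f in flows}
    multimedia = []
    for first in flows:
        second = by_pair.get((first.dst, first.src))
        if second is None:
            # Assumption: without a reverse flow there is nothing to compare.
            continue
        if abs(first.byte_count - second.byte_count) > first_threshold:
            multimedia.append(first)
    return multimedia
```

Because the condition is an absolute difference, the small reverse flow of an asymmetric pair passes the test as well; the sketch follows the claim wording literally and adds no direction filter.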
3. The method of claim 1, wherein
the determining the leakage data stream from all the multimedia data streams comprises the following steps:
for each multimedia data stream, if the source address of the multimedia data stream is located in a configured address segment, the address segment comprises the addresses of all hosts in the intranet, and the destination address of the multimedia data stream is not located in the address segment, determining that the multimedia data stream is a leakage data stream;
if the source address of the multimedia data stream is located in the address segment and the destination address of the multimedia data stream is located in the address segment, determining that the multimedia data stream is a candidate data stream.
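The address-segment rule of claim 3 can be sketched in Python with the standard `ipaddress` module. The function name `classify_stream`, the label strings, and the representation of the configured address segment as a single CIDR block are assumptions made for illustration only.

```python
import ipaddress


def classify_stream(src, dst, intranet_segment):
    """Classify one multimedia data stream by its source and destination.

    intranet_segment is assumed to be one CIDR block (e.g. "10.0.0.0/8")
    covering the addresses of all intranet hosts.
    """
    segment = ipaddress.ip_network(intranet_segment)
    src_inside = ipaddress.ip_address(src) in segment
    dst_inside = ipaddress.ip_address(dst) in segment
    if src_inside and not dst_inside:
        return "leakage"    # intranet host to external network
    if src_inside and dst_inside:
        return "candidate"  # intranet host to intranet host
    return "other"          # not covered by claim 3
```

For example, `classify_stream("10.0.0.5", "8.8.8.8", "10.0.0.0/8")` yields `"leakage"`, while a stream between two addresses inside the segment yields `"candidate"`.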
4. The method of claim 3, wherein
the determining the associated data stream corresponding to the leakage data stream from all the multimedia data streams comprises the following steps:
for each leakage data stream, selecting an associated data stream corresponding to the leakage data stream from all candidate data streams based on the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream;
wherein the flow characteristics include acquisition time and byte count; the absolute value of the difference between the acquisition time of the associated data stream and the acquisition time of the leakage data stream is smaller than a second threshold, and the absolute value of the difference between the byte count of the associated data stream and the byte count of the leakage data stream is smaller than a third threshold.
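The two threshold conditions of claim 4 can be sketched as a simple filter. The `Stream` record and the name `select_associated` are hypothetical; acquisition times are assumed to be plain numbers (for example seconds) so that they can be subtracted.

```python
from typing import NamedTuple


class Stream(NamedTuple):
    """Hypothetical stream record holding the two flow characteristics."""
    src: str
    dst: str
    acquired_at: float  # acquisition time, assumed to be in seconds
    byte_count: int


def select_associated(leak, candidates, second_threshold, third_threshold):
    """Keep the candidate streams that are close to the leakage stream in
    both acquisition time and byte count."""
    return [
        c for c in candidates
        if abs(c.acquired_at - leak.acquired_at) < second_threshold
        and abs(c.byte_count - leak.byte_count) < third_threshold
    ]
```

Both conditions must hold at once: a candidate with a matching byte count but a distant acquisition time is rejected, and vice versa.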
5. The method of claim 4, wherein
the selecting the associated data stream corresponding to the leakage data stream from all the candidate data streams based on the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream comprises the following steps:
calculating the similarity between the flow characteristics of the leakage data flow and the flow characteristics of each candidate data flow;
based on the similarity, determining a cluster set taking the leakage data stream as a cluster center by adopting a clustering algorithm, wherein the cluster set comprises the associated data streams corresponding to the leakage data stream, and the similarity between the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream meets a similarity condition;
wherein, if the similarity is a Euclidean distance, satisfying the similarity condition means: the Euclidean distance between the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream is less than a preset distance threshold.
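The Euclidean-distance condition of claim 5 can be sketched as follows. The claim does not name a particular clustering algorithm, so this sketch substitutes the simplest possible stand-in: a fixed-center radius query around the leakage stream. The function name `cluster_around_leak` and the assumption that feature vectors are already scaled to comparable units (acquisition time and byte count differ by orders of magnitude) are hypothetical.

```python
import math


def cluster_around_leak(leak_features, candidate_features, distance_threshold):
    """Return the indices of candidates whose feature vector lies within
    distance_threshold (Euclidean) of the leakage stream's feature vector.

    Feature vectors are assumed to be pre-normalized sequences of floats,
    e.g. (scaled acquisition time, scaled byte count).
    """
    members = []
    for index, features in enumerate(candidate_features):
        if math.dist(leak_features, features) < distance_threshold:
            members.append(index)
    return members
```

A full clustering algorithm would additionally decide how competing leakage streams share candidates; the sketch only shows the membership test against one cluster center.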
6. The method of claim 1, wherein
the determining the target leakage path corresponding to the leakage data flow comprises the following steps:
selecting a target associated data stream from all associated data streams corresponding to the leakage data stream; wherein the acquisition time of the target associated data stream is earlier than the acquisition time of the leakage data stream;
determining a data stream arrangement sequence based on the source address of the leakage data stream and the source address and destination address of each target associated data stream; in the sequence, the destination address of the previous target associated data stream is the source address of the next target associated data stream, and the destination address of the last target associated data stream is the source address of the leakage data stream;
and determining a target leakage path corresponding to the leakage data flow based on the data flow arrangement sequence.
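The chaining rule of claim 6 can be sketched by walking backwards from the source of the leakage stream. The `Stream` record and the name `build_leak_path` are hypothetical, and two choices are assumptions of this sketch rather than part of the claim: when several earlier streams share a destination, the most recent one is used, and the walk stops if it would revisit a host.

```python
from typing import NamedTuple


class Stream(NamedTuple):
    """Hypothetical stream record with addresses and acquisition time."""
    src: str
    dst: str
    acquired_at: float


def build_leak_path(leak, associated):
    """Order the target associated streams so that each destination is the
    next stream's source and the last destination is the leak's source.

    Returns (ordered_streams, hosts_on_path).
    """
    # Target associated streams: acquired earlier than the leakage stream.
    by_dst = {}
    for s in associated:
        if s.acquired_at >= leak.acquired_at:
            continue
        # Assumption: keep the most recent stream per destination.
        if s.dst not in by_dst or s.acquired_at > by_dst[s.dst].acquired_at:
            by_dst[s.dst] = s

    ordered = []
    current = leak.src
    visited = {current}
    while current in by_dst:
        s = by_dst[current]
        if s.src in visited:  # assumption: stop instead of looping forever
            break
        ordered.append(s)
        visited.add(s.src)
        current = s.src

    ordered.reverse()  # the walk ran from the leak backwards
    hosts = [s.src for s in ordered] + [leak.src]
    return ordered, hosts
```

The returned host list corresponds to the candidate hosts on the target leakage path, ending with the intranet host that sent the data to the external network.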
7. The method of claim 1, wherein inputting the data flow map to a trained map anomaly detection model to obtain anomaly scores corresponding to each candidate host comprises:
if the map anomaly detection model comprises an encoder and a decoder, inputting the data flow map to the encoder, and converting the data flow map into hidden layer features through the encoder;
inputting the hidden layer features to the decoder, and reconstructing, by the decoder, an adjacency matrix and a feature matrix of the data flow map based on the hidden layer features to obtain an adjacency reconstruction matrix and a feature reconstruction matrix; wherein the adjacency matrix represents the topological structure of the data flow map, and the feature matrix represents the node attributes of each candidate host in the data flow map;
and determining the anomaly score corresponding to each candidate host based on the difference between the adjacency reconstruction matrix and the adjacency matrix and the difference between the feature reconstruction matrix and the feature matrix.
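The final scoring step of claim 7 can be sketched under stated assumptions. The encoder and decoder themselves are not shown; the sketch takes the reconstructed matrices as given. The claim only says the score is based on the two differences, so the row-wise Euclidean norm and the weighting parameter `alpha` used here are illustrative choices, not the patented formula, and the name `anomaly_scores` is hypothetical.

```python
import math


def anomaly_scores(adj, adj_rec, feat, feat_rec, alpha=0.5):
    """Compute one score per node from the reconstruction errors.

    adj / adj_rec   : adjacency matrix and its reconstruction (lists of rows)
    feat / feat_rec : feature matrix and its reconstruction (lists of rows)
    alpha           : assumed weight between structure and attribute error
    """
    scores = []
    for a_row, a_hat, x_row, x_hat in zip(adj, adj_rec, feat, feat_rec):
        structure_error = math.dist(a_row, a_hat)  # row of A vs. row of A'
        attribute_error = math.dist(x_row, x_hat)  # row of X vs. row of X'
        scores.append(alpha * structure_error + (1 - alpha) * attribute_error)
    return scores
```

One plausible way to pick the abnormal host, again an assumption, is to take the candidate host with the highest score or every host whose score exceeds a configured threshold.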
8. A multimedia data leakage trace-source detection device, the device comprising:
the determining module is used for determining a leakage data stream and an associated data stream corresponding to the leakage data stream from all the multimedia data streams; the leakage data stream is a data stream from an intranet host to an external network, the associated data stream is a data stream from one intranet host to another intranet host, and the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream meet a similarity condition;
the generation module is used for determining a target leakage path corresponding to the leakage data flow and generating a data flow map based on a background flow map and the target leakage path; the background flow map comprises a topological relation among hosts in an intranet, and the target leakage path comprises a plurality of candidate hosts through which the associated data flow and the leakage data flow pass; marking the plurality of candidate hosts on the background flow map, and marking node attributes corresponding to each candidate host on the background flow map to obtain the data flow map;
The processing module is used for inputting the data flow map to the trained map anomaly detection model to obtain anomaly scores corresponding to each candidate host, and determining the anomaly hosts based on the anomaly scores corresponding to each candidate host.
9. The apparatus of claim 8, wherein
the determining module is further configured to determine, for each acquired first network traffic, a second network traffic corresponding to the first network traffic, where a destination address of the second network traffic is the same as a source address of the first network traffic, and a source address of the second network traffic is the same as a destination address of the first network traffic; if the absolute value of the difference value between the byte number corresponding to the first network flow and the byte number corresponding to the second network flow is larger than a first threshold value, determining the first network flow as a multimedia data flow;
the determining module is specifically configured to, when determining a leakage data stream from all multimedia data streams: for each multimedia data stream, if the source address of the multimedia data stream is located in a configured address segment, the address segment comprises the addresses of all hosts in the intranet, and the destination address of the multimedia data stream is not located in the address segment, determining that the multimedia data stream is a leakage data stream; if the source address of the multimedia data stream is located in the address segment and the destination address of the multimedia data stream is located in the address segment, determining that the multimedia data stream is a candidate data stream;
The determining module is specifically configured to, when determining an associated data stream corresponding to the leakage data stream from all the multimedia data streams: for each leakage data stream, selecting an associated data stream corresponding to the leakage data stream from all candidate data streams based on the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream; wherein the traffic characteristics include acquisition time and byte count; the absolute value of the difference between the acquisition time of the associated data stream and the acquisition time of the leakage data stream is smaller than a second threshold value, and the absolute value of the difference between the byte number of the associated data stream and the byte number of the leakage data stream is smaller than a third threshold value;
the determining module is specifically configured to, when selecting an associated data stream corresponding to the leakage data stream from all candidate data streams based on the flow characteristics of the leakage data stream and the flow characteristics of each candidate data stream: calculating the similarity between the flow characteristics of the leakage data flow and the flow characteristics of each candidate data flow; based on the similarity, determining a clustering set taking the leakage data stream as a clustering center by adopting a clustering algorithm, wherein the clustering set comprises associated data streams corresponding to the leakage data stream, and the similarity between the flow characteristics of the leakage data stream and the flow characteristics of the associated data stream meets a similarity condition; wherein, if the similarity is a euclidean distance, satisfying the similarity condition means: the Euclidean distance between the flow characteristics of the leakage data flow and the flow characteristics of the associated data flow is smaller than a preset distance threshold;
the generation module is specifically configured to, when determining the target leakage path corresponding to the leakage data stream: select a target associated data stream from all associated data streams corresponding to the leakage data stream, wherein the acquisition time of the target associated data stream is earlier than the acquisition time of the leakage data stream; determine a data stream arrangement sequence based on the source address of the leakage data stream and the source address and destination address of each target associated data stream, wherein the destination address of the previous target associated data stream is the source address of the next target associated data stream, and the destination address of the last target associated data stream is the source address of the leakage data stream; and determine the target leakage path corresponding to the leakage data stream based on the data stream arrangement sequence;
the processing module is specifically configured to, when inputting the data flow map to the trained map anomaly detection model to obtain the anomaly scores corresponding to each candidate host: if the map anomaly detection model comprises an encoder and a decoder, input the data flow map to the encoder, and convert the data flow map into hidden layer features through the encoder; input the hidden layer features to the decoder, and reconstruct, by the decoder, an adjacency matrix and a feature matrix of the data flow map based on the hidden layer features to obtain an adjacency reconstruction matrix and a feature reconstruction matrix, wherein the adjacency matrix represents the topological structure of the data flow map, and the feature matrix represents the node attributes of each candidate host in the data flow map; and determine the anomaly score corresponding to each candidate host based on the difference between the adjacency reconstruction matrix and the adjacency matrix and the difference between the feature reconstruction matrix and the feature matrix.
10. An electronic device, comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is configured to execute machine executable instructions to implement the method of any of claims 1-7.
CN202310728716.2A 2023-06-19 2023-06-19 Multimedia data leakage tracing detection method, device and equipment Active CN116488943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310728716.2A CN116488943B (en) 2023-06-19 2023-06-19 Multimedia data leakage tracing detection method, device and equipment

Publications (2)

Publication Number Publication Date
CN116488943A CN116488943A (en) 2023-07-25
CN116488943B true CN116488943B (en) 2023-08-25

Family

ID=87218130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310728716.2A Active CN116488943B (en) 2023-06-19 2023-06-19 Multimedia data leakage tracing detection method, device and equipment

Country Status (1)

Country Link
CN (1) CN116488943B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130050865A (en) * 2011-11-08 2013-05-16 주식회사 제이컴정보 Caused by the use of smart device internal confidential data leakage prevention & trace system and method
CN110909384A (en) * 2019-11-19 2020-03-24 支付宝(杭州)信息技术有限公司 Method and device for determining business party revealing user information
US10826927B1 (en) * 2020-03-05 2020-11-03 Fmr Llc Systems and methods for data exfiltration detection
WO2022135308A1 (en) * 2020-12-21 2022-06-30 华为云计算技术有限公司 Method and apparatus for detecting media data
CN114760104A (en) * 2022-03-21 2022-07-15 上海电力大学 Distributed abnormal flow detection method in Internet of things environment
CN115146304A (en) * 2021-03-31 2022-10-04 奇安信科技集团股份有限公司 Method and device for detecting file leakage behavior
CN115906135A (en) * 2022-12-28 2023-04-04 深圳乐信软件技术有限公司 Tracing method and device for target data leakage path, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10404727B2 (en) * 2016-03-25 2019-09-03 Cisco Technology, Inc. Self organizing learning topologies

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
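Research on traceable anomaly detection technology in mobile terminal communication systems; Wang Qi; Information & Communications (No. 10); full text *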

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant