WO2020103154A1 - Method, apparatus and system for data analysis - Google Patents

Method, apparatus and system for data analysis

Info

Publication number
WO2020103154A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data analysis
sensitive
monitored network
piece
Prior art date
Application number
PCT/CN2018/117283
Other languages
French (fr)
Inventor
Dai Fei Guo
Original Assignee
Siemens Aktiengesellschaft
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Aktiengesellschaft filed Critical Siemens Aktiengesellschaft
Priority to CN201880099783.XA priority Critical patent/CN113168460A/en
Priority to PCT/CN2018/117283 priority patent/WO2020103154A1/en
Publication of WO2020103154A1 publication Critical patent/WO2020103154A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/20Network architectures or network communication protocols for network security for managing network security; network security policies in general

Definitions

  • the present invention relates to techniques of data analysis, and more particularly to a method, apparatus, system and computer-readable storage media and a computer program for data analysis.
  • a data collecting device is usually deployed at a customer’s side to collect information from the monitored network.
  • a network traffic monitoring device can be deployed to monitor security situation. It can capture network traffic data from node (s) in the monitored network, and have the captured data checked based on multiple predefined rules.
  • network traffic data is collected and can be transferred to a remote monitoring center (optionally, a cloud based center) for deeper data analysis.
  • a sensitive data masking device can be deployed at a monitored network to mask sensitive data before it is sent out. But this solution cannot fully eliminate concerns about sensitive data leakage: the masked sensitive data still has to be sent out, and once it is unmasked, leakage remains possible.
  • Non-sensitive data (that is, data not considered sensitive) can be extracted (optionally based on a white list mechanism) and sent out to an external data analysis center, with sensitive data left to be analyzed at the monitored network.
  • a final result of data analysis can be obtained, optionally by correlating the results of local and external analysis.
  • a method for data analysis at a monitored network includes:
  • a data analysis task where the data analysis task is generated based on a first data analysis result on the non-sensitive data and indicates to make further data analysis on sensitive data of the piece of data;
  • a method for data analysis at a data analysis center includes:
  • an apparatus for data analysis at a monitored network includes:
  • At least one processor coupled to the at least one memory, and upon execution of the executable instructions, configured to:
  • the data analysis task is generated based on a first data analysis result on the non-sensitive data and indicates to make further data analysis on sensitive data of the piece of data;
  • an apparatus for data analysis at a data analysis center includes:
  • At least one processor coupled to the at least one memory and upon execution of the executable instructions, configured to:
  • a computer-readable medium storing executable instructions is presented; upon execution by a computer, the instructions enable the computer to execute the method according to the first or second aspect of the present disclosure.
  • a system for data analysis includes:
  • sensitive data can be filtered, so that only non-sensitive data is sent out of the monitored network for data analysis.
  • the non-sensitive data will be analyzed, and if necessary, a data analysis task indicating further data analysis on the sensitive data will be generated and sent to the monitored network.
  • once the monitored network receives the data analysis task, it will make further analysis based on the sensitive data to get a final result of data analysis. So in the present disclosure, sensitive data will not be sent out of the monitored network, which prevents data leakage effectively. What’s more, with data analysis distributed across the monitored network and the data analysis center, and sensitive data analyzed at the monitored network, deep data analysis can be made on the whole set of data without data leakage.
  • the non-sensitive data is extracted from the piece of data based on a white list mechanism.
  • with a white list mechanism, a clear definition can be made of non-sensitive data, which prevents suspicious data from being treated as non-sensitive and sent out of the monitored network.
  • at the data analysis center, it is determined, based on the first data analysis result, whether to generate a data analysis task.
  • if the first data analysis result indicates that the sensitive data of the piece of data also needs to be analyzed, then it is determined that the data analysis task is to be generated.
  • the data analysis task is executed with the following steps:
  • the data analysis task is executed with the following steps:
  • a first mark is made on the non-sensitive data and the non-sensitive data with the first mark is sent out of the monitored network, where the first mark indicates that the non-sensitive data is part of the piece of data.
  • a second mark is made on the sensitive data and the sensitive data is stored with the second mark, where the second mark indicates that the sensitive data is part of the piece of data.
  • FIG. 1 depicts a system for data analysis of the present disclosure.
  • FIG. 2 depicts a flow chart for data analysis of the present disclosure.
  • FIG. 3 depicts a flow chart for data analysis at a monitored network of the present disclosure.
  • FIG. 4A and FIG. 4B depict 2 options for a step of executing a data analysis task at a monitored network of the present disclosure.
  • FIG. 5 depicts a flow chart for data analysis at a data analysis center of the present disclosure.
  • FIG. 6 depicts a block diagram displaying an exemplary embodiment of an apparatus for data analysis at a monitored network of the present disclosure.
  • FIG. 7 depicts a block diagram displaying an exemplary embodiment of an apparatus for data analysis at a data analysis center of the present disclosure.
  • FIG. 8 depicts a block diagram displaying an exemplary embodiment of a system for data analysis of the present disclosure.
  • the articles “a” , “an” , “the” and “said” are intended to mean that there are one or more of the elements.
  • the terms “comprising” , “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
  • FIG. 1 depicts a system 100 for data analysis of the present disclosure.
  • the system 100 can include: a monitored network 10 and a data analysis center 20.
  • the monitored network 10 can be an industrial network, such as a network deployed in a factory, a traditional IT network, or any other kind of network deployed at a customer’s side.
  • the data analysis center 20 can be a network/server outside of the monitored network 10, configured to make data analysis on data in the monitored network 10 and/or other monitored networks.
  • the expression “data transmitted in the monitored network 10” includes, but is not limited to:
  • network traffic can be captured, and its network protocol identified, in an industrial control network.
  • a Syslog server can be applied to collect system logs via Syslog protocol.
  • a configuration collecting module can automatically log on to a target device to collect configuration information about system, networks and security.
  • data in a monitored network usually will be sent out to a data analysis center for deeper analysis, which as mentioned above may bring risks of data leakage.
  • the method 200 can include following steps:
  • - S201 acquiring, at the monitored network 10, a piece of data 30 in the monitored network 10.
  • the piece of data 30 can be an application layer PDU (Protocol Data Unit), a transport layer PDU or a network layer PDU, or several PDUs. In certain circumstances, some PDUs can be reorganized as the piece of data 30 for analysis. There is usually sensitive data inside, such as an end user’s name, password, private family address, IP address, etc. A customer does not want these kinds of sensitive data to leak. For network traffic, the piece of data 30 may include MAC address, IP PDU, TCP/UDP PDU, application PDU (HTTP/FTP/S7/ModBus), etc.
  • the non-sensitive data 30a can be extracted from the piece of data 30 based on a white list mechanism.
  • a non-sensitive data white list can be defined to extract non-sensitive data from raw data and send it to the data analysis center 20.
  • the white list can include specific network data packets and some defined network fields.
  • the white list can define all ARP (Address Resolution Protocol) packets and TCP (Transmission Control Protocol) SYN/ACK connection packets as non-sensitive data. It can also define some packets of industrial control protocols, such as OPC UA (OPC Unified Architecture) AE (Alarms & Events) data and Modbus DIAGNOSIS, as non-sensitive data.
  • the white list can be defined as any one or a combination of the following fields:
  • Protocol e.g. ICMP (Internet Control Message Protocol) , ARP, etc.
  • sensitive data 30b can be filtered and saved for possible further analysis.
  • when the data analysis center 20 receives the non-sensitive data 30a, it analyzes the data and generates the first data analysis result 51.
  • - S206 determining, at the data analysis center 20, based on the first data analysis result 51, whether to generate a data analysis task 40 indicating to make further data analysis on the sensitive data 30b of the piece of data 30. For example, if the first data analysis result 51 indicates that the sensitive data 30b also needs to be analyzed, then it can be determined at the data analysis center 20 that the data analysis task 40 is to be generated; otherwise, the first data analysis result 51 can be considered the final data analysis result on the piece of data 30.
  • Following are 2 embodiments of step S209:
  • the step S209 may include following sub-steps:
  • the step S209 may include following sub-steps:
  • a first mark 61 can be made on the non-sensitive data 30a at the monitored network 10, which indicates that the non-sensitive data 30a is part of the piece of data 30, and in the step S204, the first mark 61 can be sent with the non-sensitive data 30a (they can be sent in one message or in separated related messages) .
  • a second mark 62 can be made on the sensitive data 30b and stored with the sensitive data 30b, where the second mark 62 indicates that the sensitive data 30b is part of the piece of data 30.
  • the first mark 61 can be sent with the data analysis task 40 from the data analysis center 20 to the monitored network 10 (they can be sent in one message or in separate related messages), so that when the first mark 61 is received at the monitored network 10 with the data analysis task, the sensitive data 30b can be identified based on the first mark 61 and the second mark 62. Then further analysis can be made on the sensitive data 30b.
  • the first mark 61 can be a sequence number of the related non-sensitive data 30a
  • the second mark 62 can be used to indicate the sensitive data 30b, e.g. the sequence of the sensitive data 30b, and can include, but is not limited to: data type, data format, data sequence.
  • the marks can form a data link
  • the sensitive data 30b and non-sensitive data 30a are connected by a data link with the following fields: data ID, data packet load, next data ID, etc.
  • a two-tier analysis mechanism is used.
  • data analysis is first made on the non-sensitive data at the data analysis center 20 (optionally, data analysis is also made on the sensitive data at the monitored network 10), and then the first data analysis result 51 can be generated and sent to the monitored network 10.
  • the results of both sides will be correlated in a second tier of analysis.
  • the final data analysis result 52 can be obtained and optionally sent to the data analysis center 20.
  • when data analysis is made on both sides, information such as volume, frequency, protocol type, port, protocol command and attack packet content of the abnormal traffic can be analyzed by both sides.
  • sensitive data, such as user names/passwords and production data exchanges, can be analyzed at the monitored network 10 to avoid sensitive data leakage.
  • the first data analysis result 51 made by the data analysis center will be sent to the monitored network.
  • the first data analysis result 51 may contain IP address, port, or protocol type, etc., which are discovered at the data analysis center 20.
  • some abnormal login behavior may be found, such as production data upload/download, configuration modification from the sensitive data 30b of the network traffic or system log, etc.
  • a correlation analysis will be made at the monitored network 10, to find possible abnormal behavior (s) , based on results from both sides. For example, IP address, port, protocol type can be combined with login behavior, to find possible abnormal attack behavior on a critical control system.
  • FIG. 3 depicts a flow chart for data analysis at a monitored network of the present disclosure.
  • the method 300 may include the following steps:
  • for other embodiments of the method 300, refer to FIG. 2 and the corresponding description of method 200 on the monitored network 10.
  • FIG. 5 depicts a flow chart for data analysis at a data analysis center of the present disclosure.
  • the method 500 may include the following steps:
  • for other embodiments of the method 500, refer to FIG. 2 and the corresponding description of method 200 on the data analysis center 20.
  • FIG. 6 depicts a block diagram displaying an exemplary embodiment of an apparatus for data analysis at a monitored network of the present disclosure.
  • the apparatus 600 can include:
  • At least one memory 601 configured to store instructions
  • At least one processor 602 coupled to the at least one memory 601, and upon execution of the executable instructions, configured to:
  • - receive a data analysis task 40, where the data analysis task is generated based on a first data analysis result 51 on the non-sensitive data 30a and indicates to make further data analysis on sensitive data 30b of the piece of data 30;
  • the at least one processor 602 is further, upon execution of the executable instructions, configured to extract the non-sensitive data 30a from the piece of data 30 based on a white list mechanism.
  • the at least one processor 602 is further, upon execution of the executable instructions, configured to:
  • the at least one processor 602 is further, upon execution of the executable instructions, configured to :
  • At least one processor 602 is further, upon execution of the executable instructions, configured to:
  • when executing the data analysis task 40, get the sensitive data 30b according to the first mark 61 and the second mark 62.
  • the apparatus 600 may also include a communication module 603, configured to transmit data, indications etc. to the data analysis center 20 and/or receive data, indications from the data analysis center 20.
  • the at least one processor 602, the at least one memory 601 and the communication module 603 can be connected via a bus, or connected directly to each other.
  • FIG. 7 depicts a block diagram displaying an exemplary embodiment of an apparatus for data analysis at a data analysis center of the present disclosure.
  • the apparatus 700 may include:
  • At least one processor 702 coupled to the at least one memory 701 and upon execution of the executable instructions, configured to:
  • the at least one processor 702 is further, upon execution of the executable instructions, configured to:
  • if the first data analysis result 51 indicates that the sensitive data 30b of the piece of data 30 also needs to be analyzed, determine to generate the data analysis task 40.
  • the at least one processor 702 is further, upon execution of the executable instructions, configured to:
  • when receiving non-sensitive data 30a of a piece of data 30 in the monitored network 10, receive the non-sensitive data 30a with a first mark 61, where the first mark 61 indicates that the non-sensitive data 30a is part of the piece of data 30;
  • the apparatus 700 may also include a communication module 703, configured to transmit data, indications etc. to the monitored network 10 and/or receive data, indications from the monitored network 10.
  • the at least one processor 702, the at least one memory 701 and the communication module 703 can be connected via a bus, or connected directly to each other.
  • FIG. 8 depicts a block diagram displaying an exemplary embodiment of a system for data analysis of the present disclosure, where:
  • the data collecting and sensitive data filtering subsystem 101 can be deployed at the monitored network 10, which helps to collect data (such as a piece of data 30 by a data collecting module 1011) for detection of possible attacks over the network traffic or system log data.
  • the sensitive data 30b can be filtered before the piece of data 30 is sent to the data analysis center 20 by the sensitive data filtering module 1012.
  • the non-sensitive data 30a can be sent to the data analysis center 20 via a security communication module 103 and the sensitive data 30b can be stored in the sensitive data DB 1026 at the local monitored network 10.
  • Data can be collected from a switch in the monitored network 10. A port of the switch can be configured in mirror mode, so that the network traffic is mapped to this port.
  • the data collecting module 1011 can be attached to this port and captures data transmitted in the monitored network 10.
  • the sensitive data filtering module 1012 can perform a basic network security scan. Based on the above mentioned white list mechanism, the sensitive data 30b can be filtered out before the piece of data 30 is sent to the data analysis center. Only the non-sensitive data 30a will be sent out, and the sensitive data 30b will be stored in the sensitive data DB 1026.
  • the non-sensitive data 30a can be received via a security communication module 201 and stored in the big data DB 204.
  • a data analysis module 202 can be used to train an analysis engine so that rule (s) can be generated and stored in the data analysis rule DB 207.
  • normal traffic can be used to train normal behavior (s)
  • malicious behavior mode can be identified to generate abnormal behavior detection rule (s) .
  • a distributed analysis division module 205 can generate 2 data analysis tasks, one indicating analysis to be made on the non-sensitive data 30a at the data analysis center 20, the other indicating analysis to be made on the sensitive data 30b (which can be the data analysis task 40 mentioned above) .
  • the distributed analysis division module 205 can generate task parameter (s) and targeted sensitive data 30b.
  • the task parameters can include an analysis type and an analysis rule, e.g. for an analysis of brute force password guessing, the rule can be a repeat count greater than 5 within one second.
  • the distributed analysis division module 205 indicates the sensitive data analysis task generator 203 to generate the analysis task 40 on the sensitive data 30b, and the sensitive data analysis task generator 203 sends the generated analysis task 40 to the monitored network 10 via the security communication module 201.
  • the data analysis task 40 on the sensitive data 30b will be sent to the sensitive data analysis subsystem 102 deployed at the monitored network 10 via the security communication module 201 and the security communication module 103. These two security communication modules ensure secure communications between the monitored network 10 and the data analysis center 20.
  • a sensitive data analysis module 1024 in the sensitive data analysis subsystem 102 can make analysis on the sensitive data DB 1026, based on rule (s) provided by a sensitive data analysis rule DB 1023, and generate a third data analysis result 53 (S2091) .
  • the first data analysis result 51 by the data analysis center 20 can be sent to a correlation analysis module 1025 in the sensitive data analysis subsystem 102.
  • the correlation analysis module 1025 can make correlation analysis based on the correlation rule (s) provided by a correlation rule DB 1027 to generate a final data analysis result, that is, the above mentioned second data analysis result 52 .
  • the final data analysis result can be sent to the data analysis center for further analysis.
  • the data analysis task 40 is received by the sensitive data analysis module 1024.
  • the sensitive data analysis module 1024 decides what kind of sensitive data is needed for a correlation analysis, retrieves the needed sensitive data 30b and sends it to the correlation analysis module 1025.
  • the correlation analysis module 1025 makes a correlation analysis on the received first result 51 and the received sensitive data 30b to generate a final data analysis result (second result 52), likewise based on rule (s) provided by the correlation rule DB 1027.
  • the distributed analysis division module 205 at the data analysis center can generate two tasks: one indicates analysis on the non-sensitive data 30a at the data analysis center 20, for example, analysis on the ICMP (Internet Control Message Protocol) ping traffic or TCP port scan traffic to detect scanning abnormal behavior.
  • the detection result i.e. the first data analysis result 51
  • the other task i.e. the data analysis task 40
  • On receiving the data analysis task 40, the sensitive data analysis module 1024 will analyze the user/name cracking behavior based on the sensitive data 30b, and the correlation analysis module 1025 will get a final data analysis result 52 of the attack by combining the data analysis result 51 of ICMP ping traffic or TCP port scan traffic with the user/name cracking behavior. Furthermore, the correlation analysis module 1025 can analyze non-permitted configuration operations on the PLC to find further attack behavior. The final data analysis result 52 can be sent to the data analysis center 20.
  • the non-sensitive data 30a can be extracted from the piece of data 30 based on a white list mechanism.
  • a white list generator 1013 can be used to automatically generate a white list based on generation rule (s) provided by a white list generator configuration DB 1014.
  • a Modbus or OPC command packet can be considered non-sensitive data if the command has not been used within a predefined period, because commands that are seldom used in normal operation may be employed in attacks.
  • Those commands can be configured in a data white list configuration DB 1015, which can be used as rule (s) to extract the non-sensitive data 30a from the piece of data 30.
  • rules which can be used by the white list generator 1013:
  • a method, apparatus and system for data analysis are provided in this disclosure.
  • sensitive data can be filtered, so that only non-sensitive data is sent out of the monitored network for data analysis.
  • the non-sensitive data will be analyzed, and if necessary, a data analysis task indicating further data analysis on the sensitive data will be generated and sent to the monitored network.
  • once the monitored network receives the data analysis task, it will make further analysis based on the sensitive data to get a final result of data analysis. So in the present disclosure, sensitive data will not be sent out of the monitored network, which prevents data leakage effectively. What’s more, with data analysis distributed across the monitored network and the data analysis center, and sensitive data analyzed at the monitored network, deep data analysis can be made on the whole set of data without data leakage.
  • a local monitored network generally has limited storage and computing capability. With non-sensitive data being sent to an external data analysis center, the performance and computing efficiency required of local devices can be reduced.
  • an analysis task can be divided into different sub-tasks and can be computed in a data analysis center and a local monitored network, which can reduce the whole cost of monitoring devices .
  • a computer-readable medium is also provided in the present disclosure, storing executable instructions which, upon execution by a computer, enable the computer to execute any of the methods presented in this disclosure.
  • a computer program is also provided which, when executed by at least one processor, performs any of the methods presented in this disclosure.
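The task parameter example given earlier (brute force password guessing, with the rule "repeat time > 5 in one second") and the correlation of local and external results can be illustrated with a minimal Python sketch. It assumes that the first data analysis result is reduced to a set of suspect source addresses and that the locally stored sensitive data is reduced to failed-login timestamps per source; the function names `exceeds_repeat_rule` and `correlate` and both data shapes are hypothetical and do not come from the disclosure.

```python
from typing import Dict, List, Set


def exceeds_repeat_rule(timestamps: List[float], max_repeats: int = 5,
                        window: float = 1.0) -> bool:
    """True if more than max_repeats events fall within a span shorter than `window` seconds."""
    ordered = sorted(timestamps)
    start = 0
    for end, t in enumerate(ordered):
        # Move the left edge until the span covered is shorter than `window`.
        while t - ordered[start] >= window:
            start += 1
        if end - start + 1 > max_repeats:
            return True
    return False


def correlate(first_result_sources: Set[str],
              login_failures: Dict[str, List[float]]) -> List[str]:
    """Sources flagged by the center that also break the local repeat rule."""
    return sorted(
        src for src in first_result_sources
        if exceeds_repeat_rule(login_failures.get(src, []))
    )


# Hypothetical sensitive data kept at the monitored network: failed-login times per source.
login_failures = {
    "10.0.0.5": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "10.0.0.9": [0.0, 2.0, 4.0],
}
# Hypothetical first data analysis result received from the data analysis center.
final_result = correlate({"10.0.0.5", "10.0.0.7"}, login_failures)
```

Only the timestamps stay local in this sketch; the center contributes nothing but the list of suspect sources, which mirrors the split of work between the two sides.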

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method, apparatus and system for data analysis are proposed, with which sensitive data does not need to be transferred out of a monitored network. A method (300) for data analysis at a monitored network (10) includes: acquiring (S201) a piece of data (30) in the monitored network (10); extracting (S202) non-sensitive data (30a) from the piece of data (30); sending (S204) the non-sensitive data (30a) out of the monitored network (10); receiving (S208) a data analysis task (40), where the data analysis task is generated based on a first data analysis result (51) on the non-sensitive data (30a) and indicates to make further data analysis on sensitive data (30b) of the piece of data (30); executing (S209), based on the sensitive data (30b), the data analysis task (40) to generate a second data analysis result (52) on the piece of data (30).

Description

Method, apparatus and system for data analysis
Technical Field
The present invention relates to techniques of data analysis, and more particularly to a method, apparatus, system and computer-readable storage media and a computer program for data analysis.
Background Art
For cloud based monitoring, such as device status monitoring, or cyber security monitoring, a data collecting device is usually deployed at a customer’s side to collect information from the monitored network. For example, a network traffic monitoring device can be deployed to monitor security situation. It can capture network traffic data from node (s) in the monitored network, and have the captured data checked based on multiple predefined rules.
On one hand, network traffic data is collected and can be transferred to a remote monitoring center (optionally, a cloud based center) for deeper data analysis.
On the other hand, a primary concern is that the content of captured data might contain personal information, commercially sensitive data or other predefined important information. We call these kinds of data “sensitive data”.
Furthermore, in some countries or regions, according to local cyber security related laws and/or regulations, some or all of these kinds of sensitive data may not be transferred across borders without a security assessment, which may impact data analysis for the purpose of further study, test, tracing, and performing correlation analysis. Even without restrictions on data transfer out of a monitored network, there are always risks of sensitive data leakage.
Summary of the Invention
In one solution to the problem of sensitive data leakage, a sensitive data masking device can be deployed at a monitored network to mask sensitive data before it is sent out. But this solution cannot fully eliminate concerns about sensitive data leakage: the masked sensitive data still has to be sent out, and once it is unmasked, leakage remains possible.
A method, apparatus and system for data analysis are proposed in this invention, with which sensitive data does not need to be transferred out of a monitored network. Non-sensitive data (that is, data not considered sensitive) can be extracted (optionally based on a white list mechanism) and sent out to an external data analysis center, with sensitive data left to be analyzed at the monitored network. A final result of data analysis can be obtained, optionally by correlating the results of local and external analysis.
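The white list based split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are modelled as dictionaries with a `protocol` key, and the white list contains protocol names only. The names `WHITE_LIST_PROTOCOLS` and `split_by_white_list` are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical white list: protocols whose packets are treated as non-sensitive.
WHITE_LIST_PROTOCOLS = {"ARP", "ICMP"}


def split_by_white_list(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (non_sensitive, sensitive); anything not white-listed stays local."""
    non_sensitive: List[Record] = []
    sensitive: List[Record] = []
    for record in records:
        if record.get("protocol") in WHITE_LIST_PROTOCOLS:
            non_sensitive.append(record)
        else:
            sensitive.append(record)
    return non_sensitive, sensitive


# Hypothetical piece of data made of three captured records.
piece_of_data = [
    {"protocol": "ARP", "payload": "who-has 10.0.0.1"},
    {"protocol": "HTTP", "payload": "user=alice&password=secret"},
    {"protocol": "ICMP", "payload": "echo request"},
]
non_sensitive, sensitive = split_by_white_list(piece_of_data)
```

Treating everything that is not explicitly white-listed as sensitive is the conservative default here: an unknown record type is kept inside the monitored network rather than sent out.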
According to a first aspect of the present disclosure, a method for data analysis at a monitored network is presented. It includes:
- acquiring a piece of data in the monitored network;
- extracting non-sensitive data from the piece of data;
- sending the non-sensitive data out of the monitored network for data analysis;
- receiving a data analysis task, where the data analysis task is generated based on a first data analysis result on the non-sensitive data and indicates to make further data analysis on sensitive data of the piece of data;
- executing, based on the sensitive data, the data analysis task to generate a second data analysis result.
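The steps of the first aspect can be sketched from the point of view of the monitored network. This is a minimal illustration, not the claimed implementation: the class `MonitoredNetworkAgent`, the message and task layouts, and the `login_failed` event field are all assumptions made for the example, and transport to and from the data analysis center is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MonitoredNetworkAgent:
    """Hypothetical local agent; sensitive records never leave this object."""
    white_list: set
    sensitive_store: Dict[int, List[dict]] = field(default_factory=dict)

    def process_piece(self, piece_id: int, records: List[dict]) -> dict:
        """Acquire a piece of data, extract non-sensitive data, build the outbound message."""
        non_sensitive = [r for r in records if r["protocol"] in self.white_list]
        # Everything else is kept locally for possible further analysis.
        self.sensitive_store[piece_id] = [
            r for r in records if r["protocol"] not in self.white_list
        ]
        return {"piece_id": piece_id, "non_sensitive": non_sensitive}

    def execute_task(self, task: dict) -> dict:
        """Run the requested analysis on the locally stored sensitive data."""
        records = self.sensitive_store.get(task["piece_id"], [])
        failed = sum(1 for r in records if r.get("event") == "login_failed")
        return {
            "piece_id": task["piece_id"],
            "failed_logins": failed,
            "suspicious": failed > task["max_failed_logins"],
        }


agent = MonitoredNetworkAgent(white_list={"ARP", "ICMP"})
outbound = agent.process_piece(1, [
    {"protocol": "ICMP", "event": "echo"},
    {"protocol": "FTP", "event": "login_failed"},
    {"protocol": "FTP", "event": "login_failed"},
    {"protocol": "FTP", "event": "login_failed"},
])
# Hypothetical task, as it might be received from the data analysis center.
task = {"piece_id": 1, "max_failed_logins": 2}
second_result = agent.execute_task(task)
```

The shared `piece_id` plays the role of linking the outbound non-sensitive data to the sensitive data retained locally, so that a later task can be matched to the right records.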
According to a second aspect of the present disclosure, a method for data analysis at a data analysis center is presented, it includes:
- receiving, from a monitored network, non-sensitive data of a piece of data in the monitored network;
- making analysis on the non-sensitive data to generate a first data analysis result;
- determining, based on the first data analysis result, whether to generate a data analysis task indicating to make further data analysis on sensitive data of the piece of data;
- if determining to generate the data analysis task, generating and sending the data analysis task to the monitored network.
According to a third aspect of the present disclosure, an apparatus for data analysis at a monitored network is presented. It includes:
- at least one memory, configured to store executable instructions;
- at least one processor, coupled to the at least one memory, and upon execution of the executable instructions, configured to:
- acquire a piece of data in the monitored network;
- extract non-sensitive data from the piece of data;
- send the non-sensitive data out of the monitored network for data analysis;
- receive a data analysis task, where the data analysis task is generated based on a first data analysis result on the non-sensitive data and indicates to make further data analysis on sensitive data of the piece of data;
- execute, based on the sensitive data, the data analysis task to generate a second data analysis result on the piece of data.
According to a fourth aspect of the present disclosure, an apparatus for data analysis at a data analysis center is presented. It includes:
- at least one memory, configured to store executable instructions;
- at least one processor, coupled to the at least one memory and upon execution of the executable instructions, configured to:
- receive, from a monitored network, non-sensitive data of a piece of data in the monitored network;
- make analysis on the non-sensitive data to generate a first data analysis result;
- determine, based on the first data analysis result, whether to generate a data analysis task indicating to make further data analysis on sensitive data of the piece of data;
- if determining to generate the data analysis task, generate and send the data analysis task to the monitored network.
According to a fifth aspect of the present disclosure, a computer-readable medium storing executable instructions is presented. Upon execution by a computer, the executable instructions enable the computer to execute the method according to the first or second aspect of the present disclosure.
According to a sixth aspect of the present disclosure, a system for data analysis is presented. It includes:
- a monitored network, configured to
- acquire a piece of data in the monitored network,
- extract non-sensitive data from the piece of data, and
- send the non-sensitive data out of the monitored network to a data analysis center;
- the data analysis center, configured to
- receive, from the monitored network, non-sensitive data of a piece of data transmitted in the monitored network,
- make analysis on the non-sensitive data to generate a first data analysis result,
- determine, based on the first data analysis result, whether to generate a data analysis task indicating to make further data analysis on sensitive data of the piece of data, and
- if determining to generate the data analysis task, generate and send the data analysis task to the monitored network;
- the monitored network, further configured to
- receive the data analysis task, and
- execute, based on the sensitive data, the data analysis task to generate a second data analysis result on the piece of data.
With the solutions provided, sensitive data can be filtered out so that only non-sensitive data is sent out of the monitored network for data analysis. At the data analysis center, the non-sensitive data is analyzed and, if necessary, a data analysis task indicating further data analysis on the sensitive data is generated and sent to the monitored network. Once the monitored network receives the data analysis task, it makes further analysis based on the sensitive data to get a final result of data analysis. In the present disclosure, therefore, sensitive data is not sent out of the monitored network, which effectively prevents leakage of the sensitive data. Moreover, because data analysis is distributed between the monitored network and the data analysis center, with sensitive data analyzed at the monitored network, deep data analysis can be made on the whole set of data without data leakage.
In an embodiment of the present disclosure, at the monitored network, the non-sensitive data is extracted from the piece of data based on a white list mechanism. With a white list mechanism, non-sensitive data can be clearly defined, which can prevent suspicious data from being treated as non-sensitive data and sent out of the monitored network.
In an embodiment of the present disclosure, at the data analysis center, it is determined, based on the first data analysis result, whether to generate a data analysis task. Optionally, if the first data analysis result indicates that the sensitive data of the piece of data also needs to be analyzed, it is determined that the data analysis task is to be generated.
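The overall flow summarized above can be sketched as follows. This is a minimal illustration under assumptions that are not part of the disclosure: a piece of data is modeled as a flat dictionary, the white list is a set of field names, and the functions `split_piece`, `center_analyze` and `local_execute`, together with their placeholder rules, are hypothetical.

```python
# Illustrative only: field names, rules and function names are assumptions.
NON_SENSITIVE_FIELDS = {"src_ip", "dst_port", "protocol"}  # hypothetical white list


def split_piece(piece):
    """Split a piece of data into (non_sensitive, sensitive) parts."""
    non_sensitive = {k: v for k, v in piece.items() if k in NON_SENSITIVE_FIELDS}
    sensitive = {k: v for k, v in piece.items() if k not in NON_SENSITIVE_FIELDS}
    return non_sensitive, sensitive


def center_analyze(non_sensitive):
    """Data analysis center: return a first result and an optional task."""
    suspicious = non_sensitive.get("dst_port") == 80  # placeholder rule
    first_result = {"suspicious": suspicious, "src_ip": non_sensitive.get("src_ip")}
    task = {"analysis_type": "login_check"} if suspicious else None
    return first_result, task


def local_execute(task, first_result, sensitive):
    """Monitored network: second result computed on locally kept sensitive data."""
    weak = sensitive.get("password") in {"admin", "123456"}  # placeholder rule
    return {"src_ip": first_result["src_ip"],
            "attack_suspected": weak,
            "task": task["analysis_type"]}


piece = {"src_ip": "10.0.0.5", "dst_port": 80, "protocol": "TCP",
         "user": "admin", "password": "admin"}
outbound, kept_local = split_piece(piece)   # only `outbound` leaves the network
first_result, task = center_analyze(outbound)
final = local_execute(task, first_result, kept_local) if task else first_result
```

In this sketch the password never appears in `outbound`; the data analysis center only sees the white-listed fields and sends back a task description, not a request for the raw sensitive data.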
In an embodiment of the present disclosure, at the monitored network, the data analysis task is executed with the following steps:
- making data analysis on the sensitive data to generate a third data analysis result;
- receiving the first data analysis result;
- generating the second data analysis result based on the first data analysis result and the third data analysis result.
In an embodiment of the present disclosure, at the monitored network, the data analysis task is executed with the following steps:
- receiving the first data analysis result;
- generating the second data analysis result based on the first data analysis result and the sensitive data.
In an embodiment of the present disclosure, at the monitored network, a first mark is made on the non-sensitive data and the non-sensitive data with the first mark is sent out of the monitored network, where the first mark indicates that the non-sensitive data is part of the piece of data. Also at the monitored network, a second mark is made on the sensitive data and the sensitive data is stored with the second mark, where the second mark indicates that the sensitive data is part of the piece of data. Then, at the data analysis center, the non-sensitive data with the first mark is received, and if the data analysis task is generated, the data analysis task is sent by the data analysis center to the monitored network with the first mark. Next, at the monitored network, the data analysis task with the first mark is received and the stored sensitive data is retrieved according to the first mark and the second mark. With the first mark and the second mark, the sensitive data, the non-sensitive data and the corresponding data analysis results can be linked.
Brief Description of the Drawings
The above mentioned attributes and other features and advantages of the present technique and the manner of attaining them will become more apparent and the present technique itself will be better understood by reference to the following description of embodiments of the present technique taken in conjunction with the accompanying drawings, wherein:
FIG. 1 depicts a system for data analysis of the present disclosure.
FIG. 2 depicts a flow chart for data analysis of the present disclosure.
FIG. 3 depicts a flow chart for data analysis at a monitored network of the present disclosure.
FIG. 4A and FIG. 4B depict 2 options for a step of executing a data analysis task at a monitored network of the present disclosure.
FIG. 5 depicts a flow chart for data analysis at a data analysis center of the present disclosure.
FIG. 6 depicts a block diagram displaying an exemplary embodiment of an apparatus for data analysis at a monitored network of the present disclosure.
FIG. 7 depicts a block diagram displaying an exemplary embodiment of an apparatus for data analysis at a data analysis center of the present disclosure.
FIG. 8 depicts a block diagram displaying an exemplary embodiment of a system for data analysis of the present disclosure.
Reference Numbers:
100, a system for data analysis
10, a monitored network
20, a data analysis center
30, a piece of data transmitted in the monitored network 10
30a, non-sensitive data of the piece of data 30
30b, sensitive data of the piece of data 30
40, a data analysis task
51, a first result of data analysis
52, a second result of data analysis
53, a third result of data analysis
61, a first mark on the non-sensitive data 30a, indicating that the non-sensitive data 30a is part of the piece of data 30
62, a second mark on the sensitive data 30b, indicating that the sensitive data 30b is part of the piece of data 30
200, 300, 500, a method for data analysis
S201, acquiring a piece of data 30 in the monitored network 10
S202, extracting non-sensitive data 30a from the piece of data 30
S203, storing sensitive data 30b of the piece of data 30
S204, sending from the monitored network 10 the non-sensitive data 30a out of the monitored network 10 for data analysis, and receiving the non-sensitive data 30a at the data analysis center 20
S205, making analysis on the non-sensitive data 30a to generate a first data analysis result 51
S206, determining based on the first data analysis result 51 whether to generate a data analysis task 40 indicating to make further data analysis on the sensitive data 30b of the piece of data 30
S207, generating the data analysis task 40
S208, sending the data analysis task 40 from the data analysis center 20 to the monitored network 10, and receiving the data analysis task 40 at the monitored network 10
S209, executing based on the sensitive data 30b the data analysis task 40 to generate a second data analysis result 52 on the piece of data 30.
S2091, making data analysis on the sensitive data 30b to generate a third data analysis result 53
S2092, receiving the first data analysis result 51
S2093, generating the second data analysis result 52 based on the first data analysis result 51 and the third data analysis result 53
S2091’, receiving the first data analysis result 51
S2092’, generating the second data analysis result 52 based on the first data analysis result 51 and the sensitive data 30b.
600, an apparatus for data analysis at a monitored network 10
601, memory
602, processor
603, communication module
700, an apparatus for data analysis at a data analysis center 20
701, memory
702, processor
703, communication module
101, data collecting and sensitive data filtering subsystem
1011, data collecting module
1012, sensitive data filtering module
1013, white list generator
1014, white list generator configuration DB
1015, data white list configuration DB
103, security communication module
102, sensitive data analysis subsystem
1021, training configuration DB
1022, sensitive data training DB
1023, sensitive data analysis rule DB
1024, sensitive data analysis module
1025, correlation analysis module
1026, sensitive data DB
1027, correlation rule DB
201, security communication module
202, data analysis module
203, sensitive data analysis task generator
204, big data DB
205, distributed analysis division module
206, data analysis task module
207, data analysis rule DB
Detailed Description of Example Embodiments
Hereinafter, the above-mentioned and other features of the present technique are described in detail. Various embodiments are described with reference to the drawings, where like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be noted that the illustrated embodiments are intended to explain, and not to limit, the invention. It may be evident that such embodiments may be practiced without these specific details.
When introducing elements of various embodiments of the present disclosure, the articles “a”, “an”, “the” and “said” are intended to mean that there are one or more of the elements. The terms “comprising”, “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
The present technique is described hereinafter in detail by referring to FIGs. 1 to 8.
By way of introduction, FIG. 1 depicts a system 100 for data analysis of the present disclosure. The system 100 can include: a monitored network 10 and a data analysis center 20. The monitored network 10 can be an industrial network, such as a network deployed in a factory, a traditional IT network, or any other kind of network deployed at a customer’s side. The data analysis center 20 can be a network/server outside of the monitored network 10, configured to make data analysis on data in the monitored network 10 and/or other monitored networks. The expression “data transmitted in the monitored network 10” includes, but is not limited to:
1) data transmitted in the monitored network 10, which includes network traffic transmitted between network elements in the monitored network 10, data transmitted out of the monitored network 10, and/or data received in the monitored network 10 from an external network;
2) system log;
3) configuration data of a device or of the network 10;
4) any other data on which analysis can be made.
For example, in an industrial control network, network traffic can be captured and its network protocol identified. A Syslog server can be used to collect system logs via the Syslog protocol. A configuration collecting module can automatically log on to a target device to collect configuration information about the system, networks and security. In a traditional data analysis scenario, data in a monitored network is usually sent out to a data analysis center for deeper analysis, which, as mentioned above, may bring risks of data leakage.
Now referring to FIG. 2, a flow chart for data analysis by the system 100 of the present disclosure is depicted. The method 200 can include the following steps:
- S201: acquiring, at the monitored network 10, a piece of data 30 in the monitored network 10.
The piece of data 30 can be an application layer PDU (Protocol Data Unit), a transport layer PDU or a network layer PDU, or several PDUs. In certain circumstances, some PDUs can be reorganized as the piece of data 30 for analysis. There is usually sensitive data inside, such as an end user’s name, password, private family address, IP address, etc. A customer does not expect leakage of these kinds of sensitive data. For network traffic, the piece of data 30 may include a MAC address, an IP PDU, a TCP/UDP PDU, an application PDU (HTTP/FTP/S7/ModBus), etc.
- S202: extracting, at the monitored network 10, non-sensitive data 30a from the piece of data 30.
In this step, the non-sensitive data 30a can be extracted from the piece of data 30 based on a white list mechanism.
With a white list mechanism, non-sensitive data can be clearly defined, which can prevent suspicious data from being treated as non-sensitive data and sent out of the monitored network.
Take an industrial network as an example. Data in such a network is usually relatively constant, so a non-sensitive data white list can be defined to extract non-sensitive data from raw data, which is then sent to the data analysis center 20. The white list can include specific network data packets and some defined network fields. For example, the white list can define all ARP (Address Resolution Protocol) packets and TCP (Transmission Control Protocol) SYN/ACK connection packets as non-sensitive data. It can also define some packets of industrial control protocols, such as OPC UA (OLE for Process Control Unified Architecture) AE (Alarms & Events) data and Modbus DIAGNOSIS, as non-sensitive data.
The white list can be defined by any one or a combination of the following fields:
(1) IP address
(2) TCP/UDP (User Datagram Protocol) Port
(3) Protocol, e.g. ICMP (Internet Control Message Protocol) , ARP, etc.
(4) Protocol command
(5) Other field (s)
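A white list built from combinations of the fields listed above could be sketched as follows. The rule structure, the class name `WhiteListRule`, the sample rules and the packet dictionaries are assumptions made for illustration; the disclosure does not prescribe a particular representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WhiteListRule:
    """A rule matches when every field that is set equals the packet's value.

    Note: a rule with no field set would match every packet, so real rules
    should always set at least one field.
    """
    ip: Optional[str] = None
    port: Optional[int] = None
    protocol: Optional[str] = None
    command: Optional[str] = None

    def matches(self, packet: dict) -> bool:
        checks = (("ip", self.ip), ("port", self.port),
                  ("protocol", self.protocol), ("command", self.command))
        return all(packet.get(name) == value
                   for name, value in checks if value is not None)


# Hypothetical rules: all ARP traffic, and Modbus diagnosis on port 502.
WHITE_LIST = [
    WhiteListRule(protocol="ARP"),
    WhiteListRule(port=502, protocol="Modbus", command="DIAGNOSIS"),
]


def is_non_sensitive(packet: dict) -> bool:
    """Anything not explicitly white-listed is treated as sensitive."""
    return any(rule.matches(packet) for rule in WHITE_LIST)


packets = [
    {"ip": "192.168.0.10", "protocol": "ARP"},
    {"ip": "192.168.0.11", "port": 502, "protocol": "Modbus", "command": "DIAGNOSIS"},
    {"ip": "192.168.0.12", "port": 80, "protocol": "HTTP", "command": "POST"},
]
non_sensitive = [p for p in packets if is_non_sensitive(p)]
sensitive = [p for p in packets if not is_non_sensitive(p)]
```

The default-deny choice (unmatched data stays local) follows the stated purpose of the white list, which is to keep suspicious or unknown data from leaving the monitored network.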
- S203: storing, at the monitored network 10, sensitive data 30b of the piece of data 30.
With the determined non-sensitive data 30a, sensitive data 30b can be filtered and saved for possible further analysis.
- S204: sending, from the monitored network 10, the non-sensitive data 30a out of the monitored network 10 for data analysis.
- S205: making analysis, at the data analysis center 20, on the non-sensitive data 30a to generate a first data analysis result 51.
Once the data analysis center 20 receives the non-sensitive data 30a, it makes analysis on it and generates the first data analysis result 51.
- S206: determining, at the data analysis center 20, based on the first data analysis result 51, whether to generate a data analysis task 40 indicating to make further data analysis on the sensitive data 30b of the piece of data 30. For example, if the first data analysis result 51 indicates that the sensitive data 30b also needs to be analyzed, it can be determined at the data analysis center 20 that the data analysis task 40 is to be generated; otherwise, the first data analysis result 51 can be considered the final data analysis result on the piece of data 30.
- S207: generating, at the data analysis center 20, the data analysis task 40.
- S208: sending the data analysis task 40 from the data analysis center 20 to the monitored network 10, and receiving the data analysis task 40 at the monitored network 10.
- S209: executing, at the monitored network 10 and based on the sensitive data 30b, the data analysis task 40 to generate a second data analysis result 52 on the piece of data 30.
The following are two embodiments of step S209:
In the first embodiment (referring to FIG. 4A), the step S209 may include the following sub-steps:
- S2091: making data analysis on the sensitive data 30b to generate a third data analysis result 53.
- S2092: receiving the first data analysis result 51.
- S2093: generating the second data analysis result 52 based on the first data analysis result 51 and the third data analysis result 53.
In the second embodiment (referring to FIG. 4B), the step S209 may include the following sub-steps:
- S2091’: receiving the first data analysis result 51.
- S2092’: generating the second data analysis result 52 based on the first data analysis result 51 and the sensitive data 30b.
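The difference between the two embodiments of step S209 can be illustrated with a small sketch. The record layout (`src_ip`, `login_failed`), the shape of the first result (`suspicious_ips`) and both function names are hypothetical; the point is only that option A first derives a third result from the sensitive data, while option B applies the first result to the sensitive data directly.

```python
def execute_option_a(first_result, sensitive_records):
    """FIG. 4A style: analyze locally first, then combine the two results."""
    # S2091: local analysis on the sensitive data yields a third result.
    third_result = {r["src_ip"] for r in sensitive_records if r["login_failed"]}
    # S2092/S2093: combine the received first result with the third result.
    return sorted(set(first_result["suspicious_ips"]) & third_result)


def execute_option_b(first_result, sensitive_records):
    """FIG. 4B style: apply the first result directly to the sensitive data."""
    suspicious = set(first_result["suspicious_ips"])
    return sorted({r["src_ip"] for r in sensitive_records
                   if r["src_ip"] in suspicious and r["login_failed"]})


first_result = {"suspicious_ips": ["10.0.0.5", "10.0.0.9"]}
sensitive_records = [
    {"src_ip": "10.0.0.5", "user": "admin", "login_failed": True},
    {"src_ip": "10.0.0.7", "user": "op", "login_failed": True},
    {"src_ip": "10.0.0.9", "user": "eng", "login_failed": False},
]
second_a = execute_option_a(first_result, sensitive_records)
second_b = execute_option_b(first_result, sensitive_records)
```

With this toy rule both options reach the same second result; in practice option A allows the local analysis to run before the first result arrives, whereas option B lets the first result narrow down which sensitive data is examined.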
Optionally, to enable possible further analysis on the sensitive data 30b, before the step S204 of sending the non-sensitive data 30a out of the monitored network 10 for data analysis, a first mark 61 can be made on the non-sensitive data 30a at the monitored network 10, which indicates that the non-sensitive data 30a is part of the piece of data 30. In the step S204, the first mark 61 can be sent with the non-sensitive data 30a (they can be sent in one message or in separate, related messages). At the monitored network 10, a second mark 62 can be made on the sensitive data 30b and stored with the sensitive data 30b, where the second mark 62 indicates that the sensitive data 30b is part of the piece of data 30. In the step S208, the first mark 61 can be sent with the data analysis task 40 from the data analysis center 20 to the monitored network 10 (they can be sent in one message or in separate, related messages), so that when the first mark 61 is received at the monitored network 10 with the data analysis task 40, the sensitive data 30b can be identified based on the first mark 61 and the second mark 62. Then further analysis can be made on the sensitive data 30b. Optionally, the first mark 61 can be a sequence number of the related non-sensitive data 30a, and the second mark 62 can be used to indicate the sensitive data 30b, e.g. the sequence of the sensitive data 30b, and can include but is not limited to: data type, data format, data sequence.
Another example of the marks is a data link: the sensitive data 30b and the non-sensitive data 30a are connected by a data link with fields such as data ID, data packet load, next data ID, etc.
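The marking scheme can be sketched as follows, assuming a sequence number is used as the first mark 61 and a key derived from the same sequence number serves as the second mark 62. The in-memory `sensitive_store`, the message layout and the function names are hypothetical stand-ins for the sensitive data storage and the messages exchanged with the data analysis center.

```python
import itertools

_sequence = itertools.count(1)
sensitive_store = {}  # second mark -> locally stored sensitive data


def mark_and_split(non_sensitive, sensitive):
    """Attach a first mark to the outbound part; store the rest under a second mark."""
    seq = next(_sequence)
    first_mark = seq              # sequence number sent out with the non-sensitive data
    second_mark = ("seq", seq)    # hypothetical local key for the sensitive data
    sensitive_store[second_mark] = sensitive
    return {"mark": first_mark, "data": non_sensitive}


def lookup_sensitive(task):
    """Resolve the first mark carried by a task to the stored sensitive data."""
    return sensitive_store.get(("seq", task["mark"]))


message = mark_and_split({"protocol": "TCP"}, {"user": "admin"})
# The data analysis center echoes the first mark back with the task.
task = {"mark": message["mark"], "analysis_type": "login_check"}
```

Only the mark and the non-sensitive part travel in `message`; the mapping from mark to sensitive data exists solely inside the monitored network.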
In the above data analysis procedure, a two-tier analysis mechanism is used. In a first tier of analysis, data analysis is first made on the non-sensitive data at the data analysis center 20 (optionally, data analysis is also made on the sensitive data at the monitored network 10), and the first data analysis result 51 is then generated and sent to the monitored network 10. In a second tier of analysis, the results of both sides are correlated. The final data analysis result 52 can then be obtained and optionally sent to the data analysis center 20.
If, in the first tier of data analysis, data analysis is made on both sides, information about abnormal traffic, such as volume, frequency, protocol type, port, protocol command and attack packet content, can be analyzed by both sides. Sensitive data, such as user names/passwords and production data exchanges, can be analyzed at the monitored network 10 to avoid sensitive data leakage.
The first data analysis result 51 made by the data analysis center 20 will be sent to the monitored network 10. For example, the first data analysis result 51 may contain an IP address, port, or protocol type, etc., discovered at the data analysis center 20.
At the monitored network 10, taking a security analysis as an example, if data analysis is also made on the sensitive data, some abnormal behavior(s), such as abnormal login, production data upload/download, or configuration modification, may be found from the sensitive data 30b of the network traffic or system log.
In the second tier of analysis, taking security analysis as an example, a correlation analysis will be made at the monitored network 10, to find possible abnormal behavior (s) , based on results from both sides. For example, IP address, port, protocol type can be combined with login behavior, to find possible abnormal attack behavior on a critical control system.
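As one possible reading of the second-tier correlation, the sketch below combines findings reported by the data analysis center (IP address, port, protocol type) with login events observed locally, keeping only logins from a reported source that occur shortly after the reported activity. The time-window rule, the 60-second default, the field names and the function name `correlate` are all assumptions for illustration.

```python
def correlate(first_result, login_events, window_seconds=60):
    """Pair each reported finding with later logins from the same source."""
    findings = []
    for finding in first_result:
        for event in login_events:
            same_source = event["src_ip"] == finding["src_ip"]
            delay = event["time"] - finding["time"]
            # Keep logins that follow the reported activity within the window.
            if same_source and 0 <= delay <= window_seconds:
                findings.append({"src_ip": finding["src_ip"],
                                 "port": finding["port"],
                                 "user": event["user"]})
    return findings


first_result = [{"src_ip": "10.0.0.5", "port": 80, "protocol": "TCP", "time": 100.0}]
login_events = [
    {"src_ip": "10.0.0.5", "user": "admin", "time": 130.0},
    {"src_ip": "10.0.0.5", "user": "admin", "time": 500.0},
    {"src_ip": "10.0.0.8", "user": "op", "time": 120.0},
]
correlated = correlate(first_result, login_events)
```

The user names stay inside the monitored network; only the correlated finding, if the operator chooses, would be reported back.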
FIG. 3 depicts a flow chart for data analysis at a monitored network of the present disclosure. The method 300 may include the following steps:
- S201: acquiring a piece of data 30 in the monitored network 10.
- S202: extracting non-sensitive data 30a from the piece of data 30.
- S204: sending from the monitored network 10 the non-sensitive data 30a out of the monitored network 10 for data analysis, and receiving the non-sensitive data 30a at the data analysis center 20.
- S208: sending the data analysis task 40 from the data analysis center 20 to the monitored network 10, and receiving the data analysis task 40 at the monitored network 10.
- S209: executing based on the sensitive data 30b the data analysis task 40 to generate a second data analysis result 52 on the piece of data 30.
For other embodiments of the method 300, reference can be made to FIG. 2 and the corresponding description of the method 200 with respect to the monitored network 10.
FIG. 5 depicts a flow chart for data analysis at a data analysis center of the present disclosure. The method 500 may include the following steps:
- S204: receiving, from the monitored network 10, the non-sensitive data 30a sent out of the monitored network 10 for data analysis.
- S205: making analysis on the non-sensitive data 30a to generate a first data analysis result 51.
- S206: determining based on the first data analysis result 51 whether to generate a data analysis task 40 indicating to make further data analysis on the sensitive data 30b of the piece of data 30.
- S207: generating the data analysis task 40.
- S208: sending the data analysis task 40 from the data analysis center 20 to the monitored network 10, and receiving the data analysis task 40 at the monitored network 10.
For other embodiments of the method 500, reference can be made to FIG. 2 and the corresponding description of the method 200 with respect to the data analysis center 20.
FIG. 6 depicts a block diagram displaying an exemplary embodiment of an apparatus for data analysis at a monitored network of the present disclosure. Referring to FIG. 6, the apparatus 600 can include:
- at least one memory 601, configured to store executable instructions;
- at least one processor 602, coupled to the at least one memory 601, and upon execution of the executable instructions, configured to:
- acquire a piece of data 30 in the monitored network 10;
- extract non-sensitive data 30a from the piece of data 30;
- send the non-sensitive data 30a out of the monitored network 10 for data analysis;
- receive a data analysis task 40, where the data analysis task is generated based on a first data analysis result 51 on the non-sensitive data 30a and indicates to make further data analysis on sensitive data 30b of the piece of data 30;
- execute, based on the sensitive data 30b, the data analysis task 40 to generate a second data analysis result 52 on the piece of data 30.
Optionally, the at least one processor 602 is further, upon execution of the executable instructions, configured to extract, based on a white list mechanism, the non-sensitive data 30a from the piece of data 30 when extracting non-sensitive data 30a from the piece of data 30.
Optionally, when executing the data analysis task 40, the at least one processor 602 is further, upon execution of the executable instructions, configured to:
- make data analysis on the sensitive data 30b to generate a third data analysis result 53;
- receive the first data analysis result 51;
- generate the second data analysis result 52 based on the first data analysis result 51 and the third data analysis result 53.
Optionally, when executing the data analysis task 40, the at least one processor 602 is further, upon execution of the executable instructions, configured to:
- receive the first data analysis result 51;
- generate the second data analysis result 52 based on the first data analysis result 51 and the sensitive data 30b.
Optionally, the at least one processor 602 is further, upon execution of the executable instructions, configured to:
- before sending the non-sensitive data 30a out of the monitored network 10 for data analysis, make a first mark 61 on the non-sensitive data 30a, where the first mark 61 indicates that the non-sensitive data 30a is part of the piece of data 30;
- when sending the non-sensitive data 30a out of the monitored network 10 for data analysis, send the non-sensitive data 30a with the first mark 61;
- before receiving a data analysis task 40, make a second mark 62 on the sensitive data 30b and store the sensitive data 30b with the second mark 62, where the second mark 62 indicates that the sensitive data 30b is part of the piece of data 30;
- when receiving a data analysis task 40, receive the data analysis task 40 with the first mark 61;
- before executing, in the monitored network 10, the data analysis task 40, get the sensitive data 30b according to the first mark 61 and the second mark 62.
Optionally, the apparatus 600 may also include a communication module 603, configured to transmit data, indications etc. to the data analysis center 20 and/or receive data, indications from the data analysis center 20. The at least one processor 602, the at least one memory 601 and the communication module 603 can be connected via a bus, or connected directly to each other.
FIG. 7 depicts a block diagram displaying an exemplary embodiment of an apparatus for data analysis at a data analysis center of the present disclosure. Referring to FIG. 7, the apparatus 700 may include:
- at least one memory 701, configured to store executable instructions;
- at least one processor 702, coupled to the at least one memory 701 and upon execution of the executable instructions, configured to:
- receive, from a monitored network 10, non-sensitive data 30a of a piece of data 30 in the monitored network 10;
- make analysis on the non-sensitive data 30a to generate a first data analysis result 51;
- determine, based on the first data analysis result 51, whether to generate a data analysis task 40 indicating to make further data analysis on sensitive data 30b of the piece of data 30;
- if determining to generate the data analysis task 40, generate and send the data analysis task 40 to the monitored network 10.
Optionally, when determining, based on the first data analysis result 51, whether to generate a data analysis task 40, the at least one processor 702 is further, upon execution of the executable instructions, configured to:
- if the first data analysis result 51 indicates that the sensitive data 30b of the piece of data 30 also needs to be analyzed, determine to generate the data analysis task 40.
Optionally, the at least one processor 702 is further, upon execution of the executable instructions, configured to:
- when receiving, from a monitored network 10, non-sensitive data 30a of a piece of data 30 in the monitored network 10, receive the non-sensitive data 30a with a first mark 61, where the first mark 61 indicates that the non-sensitive data 30a is part of the piece of data 30;
- when sending the data analysis task 40 to the monitored network 10, send the data analysis task 40 with the first mark 61.
Optionally, the apparatus 700 may also include a communication module 703, configured to transmit data, indications etc. to the monitored network 10 and/or receive data, indications from the monitored network 10. The at least one processor 702, the at least one memory 701 and the communication module 703 can be connected via a bus, or connected directly to each other.
FIG. 8 depicts a block diagram displaying an exemplary embodiment of a system for data analysis of the present disclosure, where:
The data collecting and sensitive data filtering subsystem 101 can be deployed at the monitored network 10. It helps to collect data (such as a piece of data 30, by a data collecting module 1011) for detection of possible attacks in the network traffic or system log data. The sensitive data 30b can be filtered out by the sensitive data filtering module 1012 before data is sent to the data analysis center 20. The non-sensitive data 30a can be sent to the data analysis center 20 via a security communication module 103, and the sensitive data 30b can be stored in the sensitive data DB 1026 at the local monitored network 10.
Data can be collected from a switch in the monitored network 10. A port of the switch can be configured in mirror mode, so that the network traffic is mapped to this port. The data collecting module 1011 can be attached to this port to capture data transmitted in the monitored network 10.
Taking a security analysis as an example, the sensitive data filtering module 1012 can perform basic network security scanning, and the sensitive data 30b can be filtered out, based on the above-mentioned white list mechanism, before data is sent to the data analysis center 20. Only the non-sensitive data 30a will be sent out, and the sensitive data 30b will be stored in the sensitive data DB 1026.
At the data analysis center 20, the non-sensitive data 30a can be received via a security communication module 201 and stored in the big data DB 204. For history data, a data analysis module 202 can be used to train an analysis engine so that rule(s) can be generated and stored in the data analysis rule DB 207. For example, normal traffic can be used to train normal behavior(s), and malicious behavior patterns can then be identified to generate abnormal behavior detection rule(s). A distributed analysis division module 205 can generate two data analysis tasks, one indicating analysis to be made on the non-sensitive data 30a at the data analysis center 20, the other indicating analysis to be made on the sensitive data 30b (which can be the data analysis task 40 mentioned above). For example, the distributed analysis division module 205 can generate task parameter(s) and the targeted sensitive data 30b. The task parameters can include an analysis type and an analysis rule, e.g. for an analysis of brute-force password guessing, the rule can be a repeat count greater than 5 within one second. The distributed analysis division module 205 instructs the sensitive data analysis task generator 203 to generate the analysis task 40 on the sensitive data 30b, and the sensitive data analysis task generator 203 sends the generated analysis task 40 to the monitored network 10 via the security communication module 201.
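The example task parameters (analysis type plus a rule of more than 5 repeats within one second) could be evaluated at the monitored network with a sliding window, as sketched below. The task dictionary layout, the `(timestamp, source)` attempt format and the function name `detect_brute_force` are assumptions; only the threshold and the window come from the example above.

```python
from collections import defaultdict, deque


def detect_brute_force(attempts, max_repeats=5, window=1.0):
    """Return sources with more than `max_repeats` failed logins within `window` seconds."""
    recent = defaultdict(deque)
    flagged = set()
    for timestamp, source in sorted(attempts):
        times = recent[source]
        times.append(timestamp)
        # Drop attempts that are `window` seconds old or older.
        while timestamp - times[0] >= window:
            times.popleft()
        if len(times) > max_repeats:
            flagged.add(source)
    return flagged


# Hypothetical task as it might arrive from the data analysis center.
task = {"analysis_type": "brute_force_password_guessing",
        "rule": {"max_repeats": 5, "window": 1.0}}

# Failed login attempts as (timestamp in seconds, source address) pairs.
attempts = [(0.1 * i, "10.0.0.5") for i in range(6)]     # six tries in 0.5 s
attempts += [(float(i), "10.0.0.7") for i in range(6)]   # one try per second
flagged = detect_brute_force(attempts, **task["rule"])
```

Because the task carries only the analysis type and rule, the credentials being guessed are evaluated locally and never need to accompany the task or its result.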
The data analysis task 40 on the sensitive data 30b will be sent to the sensitive data analysis subsystem 102 deployed at the monitored network 10 via the security communication module 201 and the security communication module 103. These two security communication modules ensure secure communications between the monitored network 10 and the data analysis center 20.
Corresponding to the option shown in FIG. 4A, on one hand, a sensitive data analysis module 1024 in the sensitive data analysis subsystem 102 can make analysis on the sensitive data stored in the sensitive data DB 1026, based on rule(s) provided by a sensitive data analysis rule DB 1023, and generate a third data analysis result 53 (S2091). On the other hand, the first data analysis result 51 generated by the data analysis center 20 can be sent to a correlation analysis module 1025 in the sensitive data analysis subsystem 102. The correlation analysis module 1025 can make correlation analysis based on the correlation rule(s) provided by a correlation rule DB 1027 to generate a final data analysis result, that is, the above-mentioned second data analysis result 52. Optionally, the final data analysis result can be sent to the data analysis center 20 for further analysis.
Corresponding to the option shown in FIG. 4B, the data analysis task 40 is received by the sensitive data analysis module 1024. Based on the task 40, the sensitive data analysis module 1024 decides what kind of sensitive data is needed to make a correlation analysis, retrieves the needed sensitive data 30b and sends it to the correlation analysis module 1025. Then the correlation analysis module 1025 makes a correlation analysis on the received first result 51 and the received sensitive data 30b to generate a final data analysis result (second result 52), likewise based on rule(s) provided by the correlation rule DB 1027.
Taking a kill chain as an example, an attacker scans a PLC web port, tries a weak user name and password on the web management interface of the PLC, and makes some configuration changes on the PLC. For the data analysis tasks, the distributed analysis division module 205 at the data analysis center 20 can generate two tasks. One indicates analysis on the non-sensitive data 30a at the data analysis center 20, for example, analysis on the ICMP (Internet Control Message Protocol) ping traffic or TCP port scan traffic to detect abnormal scanning behavior. The detection result (i.e. the first data analysis result 51) and the other task (i.e. the data analysis task 40) will be sent to the monitored network 10. On receiving the data analysis task 40, the sensitive data analysis module 1024 will analyze the user name and password cracking behavior based on the sensitive data 30b, and the correlation analysis module 1025 will get a final data analysis result 52 of the attack by combining the data analysis result 51 on the ICMP ping traffic or TCP port scan traffic with the user name and password cracking behavior. Furthermore, the correlation analysis module 1025 can analyze unauthorized configuration operations on the PLC to find further attack behavior. The final data analysis result 52 can be sent to the data analysis center 20.
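The correlation step of this option can be illustrated by the following non-limiting sketch. It assumes that each data analysis result is reduced to a set of finding labels and that a correlation rule names the findings it requires from each result; `CorrelationRule`, `correlate` and the label strings are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class CorrelationRule:
    name: str
    required_first: FrozenSet[str]   # findings expected in the first result (51)
    required_third: FrozenSet[str]   # findings expected in the third result (53)


def correlate(first: Set[str], third: Set[str], rules: List[CorrelationRule]) -> List[str]:
    """Return the names of the rules whose required findings appear in both results."""
    return [
        rule.name
        for rule in rules
        if rule.required_first <= first and rule.required_third <= third
    ]


rules = [
    CorrelationRule(
        name="plc_web_intrusion",
        required_first=frozenset({"port_scan"}),
        required_third=frozenset({"password_guessing"}),
    )
]
first_result_51 = {"port_scan"}            # produced at the data analysis center
third_result_53 = {"password_guessing"}    # produced locally from sensitive data
second_result_52 = correlate(first_result_51, third_result_53, rules)
```

A rule fires in this sketch only when both sides contribute their required findings, which mirrors the idea that neither result alone yields the final data analysis result.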
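The kill chain example can be sketched, in a non-limiting way, as an ordered match of attack stages per source address. The `Event` tuple, the stage labels and the function `completed_stages` are assumptions made for illustration; in particular, the disclosure does not specify how events from the first data analysis result 51 and from the local analysis are represented.

```python
from typing import Iterable, NamedTuple, Sequence


class Event(NamedTuple):
    timestamp: float
    source_ip: str
    stage: str


# Hypothetical stage labels for the scan, credential and configuration steps.
KILL_CHAIN = ("port_scan", "credential_guessing", "unauthorized_config")


def completed_stages(
    events: Iterable[Event], source_ip: str, chain: Sequence[str] = KILL_CHAIN
) -> int:
    """Count how many chain stages the given source completed in chronological order."""
    next_stage = 0
    for event in sorted(events, key=lambda e: e.timestamp):
        if next_stage == len(chain):
            break
        if event.source_ip == source_ip and event.stage == chain[next_stage]:
            next_stage += 1
    return next_stage


events = [
    Event(1.0, "192.0.2.7", "port_scan"),            # from the first result (51)
    Event(2.0, "192.0.2.9", "port_scan"),            # unrelated scanner
    Event(3.0, "192.0.2.7", "credential_guessing"),  # from local sensitive data
    Event(4.0, "192.0.2.7", "unauthorized_config"),  # from local sensitive data
]
attacker_progress = completed_stages(events, "192.0.2.7")
scanner_progress = completed_stages(events, "192.0.2.9")
```

A source that has completed every stage in order would correspond to the final data analysis result 52 of an attack, whereas a source that only scanned remains a lower-severity finding.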
As mentioned above, the non-sensitive data 30a can be extracted from the piece of data 30 based on a white list mechanism. In this exemplary embodiment, in the data collecting and sensitive data filtering subsystem 101 at the monitored network 10, a white list generator 1013 can be used to automatically generate a white list based on generation rule(s) provided by a white list generator configuration DB 1014. For example, a Modbus or OPC command packet can be considered as non-sensitive data if the command is never used within a predefined period, because commands that are seldom used in normal operation may be employed in attacks. Those commands can be configured in a data white list configuration DB 1015, which can be used as rule(s) to extract the non-sensitive data 30a from the piece of data 30. The following are some rules which can be used by the white list generator 1013:
(1) if a protocol never appears in the network traffic in a predefined period, then data of this protocol can be considered as non-sensitive.
(2) if a protocol command never appears in the network traffic in a predefined period, then data of this protocol can be considered as non-sensitive.
(3) if an IP address never appears in the network traffic in a predefined period, then data sent from this IP address can be considered as non-sensitive.
(4) if a TCP/UDP port never appears in the network traffic in a predefined period, then data sent from this port can be considered as non-sensitive.
(5) if an HTTP (HyperText Transfer Protocol) agent type never appears in the network traffic in a predefined period, then data sent by an agent of this type can be considered as non-sensitive.
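The five rules above can be illustrated by the following non-limiting sketch, which records the attribute values seen during the predefined period and then treats a record as non-sensitive when any of its attributes carries a value that never appeared in that period. The field names and the functions `build_baseline` and `is_non_sensitive` are hypothetical and are not part of this disclosure.

```python
from typing import Dict, Iterable, Sequence, Set

Record = Dict[str, str]

# Hypothetical field names, one per rule (1) to (5).
FIELDS = ("protocol", "command", "src_ip", "src_port", "http_agent")


def build_baseline(
    observed: Iterable[Record], fields: Sequence[str] = FIELDS
) -> Dict[str, Set[str]]:
    """Collect, per field, every value seen during the predefined period."""
    baseline: Dict[str, Set[str]] = {field: set() for field in fields}
    for record in observed:
        for field in fields:
            if field in record:
                baseline[field].add(record[field])
    return baseline


def is_non_sensitive(record: Record, baseline: Dict[str, Set[str]]) -> bool:
    """True if any attribute value of the record never appeared in the period."""
    return any(
        field in record and record[field] not in values
        for field, values in baseline.items()
    )


history = [
    {"protocol": "modbus", "command": "read_coils", "src_ip": "10.0.0.2", "src_port": "502"},
    {"protocol": "modbus", "command": "read_coils", "src_ip": "10.0.0.3", "src_port": "502"},
]
baseline = build_baseline(history)
usual = {"protocol": "modbus", "command": "read_coils", "src_ip": "10.0.0.2", "src_port": "502"}
rare = {"protocol": "modbus", "command": "write_register", "src_ip": "10.0.0.2", "src_port": "502"}
usual_is_non_sensitive = is_non_sensitive(usual, baseline)
rare_is_non_sensitive = is_non_sensitive(rare, baseline)
```

In this sketch the rarely used `write_register` command is released for external analysis, while routine traffic that matches the observed baseline is kept inside the monitored network.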
A method, apparatus and system for data analysis are provided in this disclosure. With the solutions provided, sensitive data can be filtered out, so that only non-sensitive data will be sent out of the monitored network for data analysis. At the data analysis center, the non-sensitive data will be analyzed and, if necessary, a data analysis task will be generated to indicate further data analysis on the sensitive data and sent to the monitored network. Once the monitored network receives the data analysis task, it will make further analysis based on the sensitive data to get a final result of data analysis. So in the present disclosure, sensitive data will not be sent out of the monitored network, which prevents data leakage effectively. What’s more, because the data analysis is distributed between the monitored network and the data analysis center, with the sensitive data being analyzed at the monitored network, deep data analysis can be made on the whole set of data without data leakage.
With the solutions provided in this disclosure, sensitive data can be protected from leakage in an architecture with an external data analysis center, especially a data analysis center deployed in the cloud. The following advantages can be achieved:
(1) with sensitive data collected from a monitored network being stored at the monitored network, not being sent outside, it becomes easier to protect sensitive data of a monitored network, especially for one with critical infrastructure.
(2) with sensitive data being stored in the local monitored network and not being sent out, it becomes easier to convince a customer to adopt an external monitoring center, such as a cloud based central monitoring center.
(3) a local monitored network generally has limited storage and computing capability; with non-sensitive data being sent to an external data analysis center, the requirements on the performance and computing efficiency of local devices can be reduced.
(4) an analysis task can be divided into different sub-tasks that are computed in a data analysis center and in a local monitored network, which can reduce the overall cost of monitoring devices.
A computer-readable medium is also provided in the present disclosure, storing executable instructions which, upon execution by a computer, enable the computer to execute any of the methods presented in this disclosure.
A computer program is also provided which, when executed by at least one processor, performs any of the methods presented in this disclosure.
While the present technique has been described in detail with reference to certain embodiments, it should be appreciated that the present technique is not limited to those precise embodiments. Rather, in view of the present disclosure, which describes exemplary modes for practicing the invention, many modifications and variations would present themselves to those skilled in the art without departing from the scope and spirit of this invention. The scope of the invention is, therefore, indicated by the following claims rather than by the foregoing description. All changes, modifications, and variations coming within the meaning and range of equivalency of the claims are to be considered within their scope.

Claims (18)

  1. A method (300) for data analysis at a monitored network (10) , comprising:
    - acquiring (S201) , a piece of data (30) in the monitored network (10) ;
    - extracting (S202) non-sensitive data (30a) from the piece of data (30) ;
    - sending (S204) the non-sensitive data (30a) out of the monitored network (10) for data analysis;
    - receiving (S208) , a data analysis task (40) , wherein the data analysis task is generated based on a first data analysis result (51) on the non-sensitive data (30a) and indicates to make further data analysis on sensitive data (30b) of the piece of data (30) ;
    - executing (S209) , based on the sensitive data (30b) , the data analysis task (40) to generate a second data analysis result (52) .
  2. The method of claim 1, wherein the step of extracting non-sensitive data (30a) from the piece of data (30) comprises:
    - extracting, based on a white list mechanism, the non-sensitive data (30a) from the piece of data (30) .
  3. The method of claim 1 or 2, wherein the step of executing the data analysis task (40) comprises:
    - making data analysis (S2091) on the sensitive data (30b) to generate a third data analysis result (53) ;
    - receiving (S2092) the first data analysis result (51) ;
    - generating (S2093) the second data analysis result (52) based on the first data analysis result (51) and the third data analysis result (53) .
  4. The method of claim 1 or 2, wherein the step of executing the data analysis task (40) , comprises:
    - receiving (S2091’) the first data analysis result (51) ;
    - generating (S2092’) the second data analysis result (52) based on the first data analysis result (51) and the sensitive data (30b) .
  5. The method of any of claims 1~4, wherein
    - before the step of sending (S204) the non-sensitive data (30a) out of the monitored network (10) for data analysis, the method further comprises: making a first mark (61) on the non-sensitive data (30a) , wherein the first mark (61) indicates that the non-sensitive data (30a) is part of the piece of data (30) ;
    - the step of sending (S204) the non-sensitive data (30a) out of the monitored network (10) for data analysis comprises: sending the non-sensitive data (30a) with the first mark (61) ;
    - before the step of receiving (S208) a data analysis task (40) , the method further comprises: making a second mark (62) on the sensitive data (30b) and storing the sensitive data (30b) with the second mark (62) , wherein the second mark (62) indicates that the sensitive data (30b) is part of the piece of data (30) ;
    - the step of receiving (S208) a data analysis task (40) comprises: receiving the data analysis task (40) with the first mark (61) ;
    - before the step of executing (S209) , in the monitored network (10) , the data analysis task (40) , the method further comprises: getting, according to the first mark and the second mark, the sensitive data (30b) .
  6. A method (500) for data analysis at a data analysis center (20) , comprising:
    - receiving (S204) , from a monitored network (10) , non-sensitive data (30a) of a piece of data (30) in the monitored network (10) ;
    - making analysis (S205) on the non-sensitive data (30a) to generate a first data analysis result (51) ;
    - determining (S206) , based on the first result (51) , whether to generate a data analysis task (40) indicating to make further data analysis on sensitive data (30b) of the piece of data (30) ;
    - if determining to generate the data analysis task (40) , generating (S207) and sending (S208) the data analysis task (40) to the monitored network (10) .
  7. The method of claim 6, wherein the step of determining (S206) , based on the first data analysis result (51) , whether to generate a data analysis task (40) comprises:
    - if the first data analysis result (51) indicates that the sensitive data (30b) of the piece of data (30) is also needed to make analysis on, then determining to generate the data analysis task (40) .
  8. The method of claim 6 or 7, wherein,
    - the step of receiving (S204) , from a monitored network (10) , non-sensitive data (30a) of a piece of data (30) in the monitored network (10) comprises: receiving the non-sensitive data (30a) with a first mark (61) , wherein the first mark (61) indicates that the non-sensitive data (30a) is part of the piece of data (30) ;
    - the step of sending (S208) the data analysis task (40) to the monitored network (10) comprises: sending the data analysis task (40) with the first mark (61) .
  9. An apparatus (600) for data analysis at a monitored network (10) , comprising:
    - at least one memory (601) , configured to store executable instructions;
    - at least one processor (602) , coupled to the at least one memory (601) , and upon execution of the executable instructions, configured to:
    - acquire a piece of data (30) in the monitored network (10) ;
    - extract non-sensitive data (30a) from the piece of data (30) ;
    - send the non-sensitive data (30a) out of the monitored network (10) for data analysis;
    - receive a data analysis task (40) , wherein the data analysis task is generated based on a first data analysis result (51) on the non-sensitive data (30a) and indicates to make further data analysis on sensitive data (30b) of the piece of data (30) ;
    - execute, based on the sensitive data (30b) , the data analysis task (40) to generate a second data analysis result (52) on the piece of data (30) .
  10. The apparatus (600) of claim 9, wherein the at least one processor (602) is further, upon execution of the executable instructions, configured to extract, based on a white list mechanism, the non-sensitive data (30a) from the piece of data (30) when extracting non-sensitive data (30a) from the piece of data (30) .
  11. The apparatus (600) of claim 9 or 10, wherein, when executing the data analysis task (40) , the at least one processor (602) is further, upon execution of the executable instructions, configured to:
    - make data analysis on the sensitive data (30b) to generate a third result (53) of data analysis;
    - receive the first data analysis result (51) ;
    - generate the second data analysis result (52) based on the first data analysis result (51) and the third data analysis result (53) .
  12. The apparatus (600) of claim 9 or 10, wherein when executing the data analysis task (40) , the at least one processor (602) is further, upon execution of the executable instructions, configured to:
    - receive the first data analysis result (51) ;
    - generate the second data analysis result (52) based on the first data analysis result (51) and the sensitive data (30b) .
  13. The apparatus (600) of any of claims 9~12, wherein the at least one processor (602) is further, upon execution of the executable instructions, configured to:
    - before sending the non-sensitive data (30a) out of the monitored network (10) for data analysis, make a first mark (61) on the non-sensitive data (30a) , wherein the first mark (61) indicates that the non-sensitive data (30a) is part of the piece of data (30) ;
    - when sending the non-sensitive data (30a) out of the monitored network (10) for data analysis, send the non-sensitive data (30a) with the first mark (61) ;
    - before receiving a data analysis task (40) , make a second mark (62) on the sensitive data (30b) and store the sensitive data (30b) with the second mark (62) , wherein the second mark (62) indicates that the sensitive data (30b) is part of the piece of data (30) ;
    - when receiving a data analysis task (40) , receive the data analysis task (40) with the first mark (61) ;
    - before executing, in the monitored network (10) , the data analysis task (40) , get according to the first mark and the second mark the sensitive data (30b) .
  14. An apparatus (700) for data analysis at a data analysis center (20) , comprising:
    - at least one memory (701) , configured to store executable instructions;
    - at least one processor (702) , coupled to the at least one memory (701) and upon execution of the executable instructions, configured to:
    - receive, from a monitored network (10) , non-sensitive data (30a) of a piece of data (30) in the monitored network (10) ;
    - make analysis on the non-sensitive data (30a) to generate a first data analysis result (51) ;
    - determine, based on the first data analysis result (51) , whether to generate a data analysis task (40) indicating to make further data analysis on sensitive data (30b) of the piece of data (30) ;
    - if determining to generate the data analysis task (40) , generate and send the data analysis task (40) to the monitored network (10) .
  15. The apparatus (700) of claim 14, wherein when determining, based on the first result (51) , whether to generate a data analysis task (40) , the at least one processor (702) is further, upon execution of the executable instructions, configured to:
    - if the first result (51) indicates that the sensitive data (30b) of the piece of data (30) is also needed to make analysis on, determine to generate the data analysis task (40) .
  16. The apparatus (700) of claim 14 or 15, wherein the at least one processor (702) is further, upon execution of the executable instructions, configured to:
    - when receiving from a monitored network (10) , non-sensitive data (30a) of a piece of data (30) in the monitored network (10) , receive the non-sensitive data (30a) with a first mark (61) , wherein the first mark (61) indicates that the non-sensitive data (30a) is part of the piece of data (30) ;
    - when sending the data analysis task (40) to the monitored network (10) , send the data analysis task (40) with the first mark (61) .
  17. A computer-readable medium, storing executable instructions, which upon execution by a computer, enable the computer to execute the method of any one of the claims 1~8.
  18. A system (100) for data analysis, comprising:
    - a monitored network (10) , configured to
    - acquire a piece of data (30) in the monitored network (10) ,
    - extract non-sensitive data (30a) from the piece of data (30) , and
    - send the non-sensitive data (30a) out of the monitored network (10) to a data analysis center (20) ;
    - the data analysis center (20) , configured to
    - receive, from a monitored network (10) , non-sensitive data (30a) of a piece of data (30) transmitted in the monitored network (10) ,
    - make analysis on the non-sensitive data (30a) to generate a first data analysis result (51) ,
    - determine, based on the first data analysis result (51) , whether to generate a data analysis task (40) indicating to make further data analysis on sensitive data (30b) of the piece of data (30) , and
    - if determining to generate the data analysis task (40) , generate and send the data analysis task (40) to the monitored network (10) ;
    - the monitored network (10) , further configured to
    - receive the data analysis task, and
    - execute, based on the sensitive data (30b) , the data analysis task (40) to generate a second data analysis result (52) on the piece of data (30) .
PCT/CN2018/117283 2018-11-23 2018-11-23 Method, apparatus and system for data analysis WO2020103154A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880099783.XA CN113168460A (en) 2018-11-23 2018-11-23 Method, device and system for data analysis
PCT/CN2018/117283 WO2020103154A1 (en) 2018-11-23 2018-11-23 Method, apparatus and system for data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/117283 WO2020103154A1 (en) 2018-11-23 2018-11-23 Method, apparatus and system for data analysis

Publications (1)

Publication Number Publication Date
WO2020103154A1 true WO2020103154A1 (en) 2020-05-28

Family

ID=70774335

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/117283 WO2020103154A1 (en) 2018-11-23 2018-11-23 Method, apparatus and system for data analysis

Country Status (2)

Country Link
CN (1) CN113168460A (en)
WO (1) WO2020103154A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009003527A (en) * 2007-06-19 2009-01-08 Toshiba Corp Information communication testing device and medical equipment
US20140289875A1 (en) * 2013-03-22 2014-09-25 Roche Diagnostics Operations, Inc. Method and system for ensuring sensitive data are not accessible
CN105279366A (en) * 2014-06-11 2016-01-27 西门子公司 Computer system and method for analyzing data
US20150381579A1 (en) * 2014-06-26 2015-12-31 Vivalect Software Ab Method and server for handling of personal information
US9946895B1 (en) * 2015-12-15 2018-04-17 Amazon Technologies, Inc. Data obfuscation
CN106022173A (en) * 2016-05-18 2016-10-12 北京京东尚科信息技术有限公司 Sensitive data display method and apparatus
CN107748848A (en) * 2017-10-16 2018-03-02 维沃移动通信有限公司 A kind of information processing method and mobile terminal

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022233236A1 (en) * 2021-05-04 2022-11-10 International Business Machines Corporation Secure data analytics
CN114448819A (en) * 2021-12-24 2022-05-06 固安县艾拉信息科技有限公司 Network real-time data-based password analysis and implementation method
CN114448819B (en) * 2021-12-24 2024-03-22 固安县艾拉信息科技有限公司 Cryptographic analysis and implementation method based on network real-time data

Also Published As

Publication number Publication date
CN113168460A (en) 2021-07-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18940587

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18940587

Country of ref document: EP

Kind code of ref document: A1