WO2023029066A1 - Feature extraction method and apparatus for streaming data, storage medium, and computer device - Google Patents

Feature extraction method and apparatus for streaming data, storage medium, and computer device

Info

Publication number
WO2023029066A1
WO2023029066A1 PCT/CN2021/117111 CN2021117111W WO2023029066A1 WO 2023029066 A1 WO2023029066 A1 WO 2023029066A1 CN 2021117111 W CN2021117111 W CN 2021117111W WO 2023029066 A1 WO2023029066 A1 WO 2023029066A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
feature
target
network security
feature data
Prior art date
Application number
PCT/CN2021/117111
Other languages
English (en)
French (fr)
Inventor
辜乘风
徐�明
魏国富
殷钱安
周晓勇
陶景龙
余贤喆
梁淑云
刘胜
王启凡
马影
Original Assignee
上海观安信息技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海观安信息技术股份有限公司
Publication of WO2023029066A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/026 Capturing of monitoring data using flow identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/12 Network monitoring probes

Definitions

  • The present application relates to the technical field of data processing, and in particular to a feature extraction method and apparatus for streaming data, a storage medium, and a computer device.
  • Streaming data is a sequence of data that arrives in order, in large volume, rapidly, and continuously; it is a dynamic data collection that keeps growing over time. Because of its low latency and high throughput, streaming data is widely used in services with strict real-time requirements, for example network security services such as identifying compromised hosts and extracting the number of DNS requests.
  • The present application provides a feature extraction method and apparatus for streaming data, a storage medium, and a computer device, which can extract features from streaming data in real time in network security services, making full use of the low-latency characteristics of streaming data while reducing resource usage.
  • a method for feature extraction of streaming data including:
  • the network security feature extraction requirements include at least one target dimension to be extracted and at least one target feature to be extracted; the target dimension includes at least one of an IP dimension, a time dimension, and a MAC (local area network) address dimension, and the target feature includes at least one of a DNS request quantity feature, an ICMP request quantity feature, an HTTP request quantity feature, a DNS domain name set feature, and a page access times feature;
  • the network security feature extraction requirement includes the number of deduplicated DNS domain names requested, which involves the IP dimension and the DNS domain name set feature.
  • before acquiring the network security feature extraction requirements corresponding to the streaming data, the method further includes:
  • the streaming data is filtered according to preset data filtering conditions, wherein the preset data filtering conditions include a preset data protocol.
  • the generating a feature data extractor according to the target dimension and the target feature specifically includes:
  • the feature data extractor includes an extractor head and an extractor body; the extractor head is used to indicate the target dimension and the target feature, and the extractor body includes the feature data extraction tool.
  • using the feature data extractor to extract network security feature data corresponding to the target dimension and the target feature in the streaming data specifically includes:
  • the method further includes:
  • a feature data set list is generated, wherein the feature data set list includes a first header, a second header, and a list result corresponding to each first header and each second header; the first header includes the target dimension, the second header includes the target feature, and the list result includes the network security feature data corresponding to each target dimension and each target feature.
  • the method further includes:
  • in response to a sample data acquisition instruction, searching the feature data set list for target feature data corresponding to the sample data acquisition instruction, wherein the sample data acquisition instruction includes any first header and/or any second header and is used to acquire model training samples;
  • the target feature data is converted into numerical data, so as to use the converted target feature data for model training.
  • before generating the feature data set list based on the extractor head of the feature data extractor and the network security feature data, the method further includes:
  • the network security feature data is converted into numerical data, so as to use the converted network security feature data to generate a list of feature data sets.
  • before receiving the streaming data, the method further includes:
  • the method further includes:
  • the model is trained.
  • a feature extraction device for streaming data including:
  • a requirement acquisition module, configured to receive streaming data and acquire network security feature extraction requirements corresponding to the streaming data, wherein the network security feature extraction requirements include at least one target dimension to be extracted and at least one target feature to be extracted, the target dimension includes at least one of an IP dimension, a time dimension, and a MAC (local area network) address dimension, and the target feature includes at least one of a DNS request quantity feature, an ICMP request quantity feature, an HTTP request quantity feature, a DNS domain name set feature, and a page access times feature;
  • An extractor generating module configured to generate a feature data extractor according to the target dimension and the target feature
  • the feature data extraction module uses the feature data extractor to extract network security feature data corresponding to the target dimension and the target feature in the streaming data.
  • the network security feature extraction requirement includes the number of deduplicated DNS domain names requested, which involves the IP dimension and the DNS domain name set feature.
  • the device also includes:
  • a streaming data screening module, configured to filter the streaming data according to preset data filtering conditions before the network security feature extraction requirements corresponding to the streaming data are acquired, wherein the preset data filtering conditions include a preset data protocol.
  • the extractor generation module specifically includes:
  • An extraction tool establishment unit configured to establish a feature data extraction tool that matches each target feature respectively according to each target feature
  • An extractor generation unit configured to generate the feature data extractor according to the target dimension, the target feature and the feature data extraction tool, wherein the feature data extractor includes an extractor head and an extractor body , the extractor head is used to indicate the target dimension and the target feature, and the extractor body includes the feature data extraction tool.
  • the feature data extraction module is specifically used for:
  • the device also includes:
  • a list generating module, configured to, after the network security feature data corresponding to the target dimension and the target feature has been extracted from the streaming data, generate a feature data set list based on the extractor head of the feature data extractor and the network security feature data, wherein the feature data set list includes a first header, a second header, and a list result corresponding to each first header and each second header; the first header includes the target dimension, the second header includes the target feature, and the list result includes the network security feature data corresponding to each target dimension and each target feature.
  • the device also includes:
  • a search module, configured to search the feature data set list for target feature data corresponding to a sample data acquisition instruction in response to that instruction, wherein the sample data acquisition instruction includes any first header and/or any second header and is used to acquire model training samples;
  • a judging module configured to judge whether the target characteristic data is numerical data, and when the target characteristic data is non-numeric data, call a corresponding data processing model based on the data type of the target characteristic data;
  • the conversion module is used to convert the target feature data into numerical data according to the data processing model, so as to use the converted target feature data for model training.
  • the device also includes:
  • the judging module is configured to judge whether the network security feature data is numerical data before the feature data set list is generated based on the extractor head of the feature data extractor and the network security feature data, and, when the network security feature data is non-numerical data, to call a corresponding data processing model based on the data type of the network security feature data;
  • the conversion module is configured to convert the network security characteristic data into numerical data according to the data processing model, so as to generate a characteristic data set list by using the converted network security characteristic data.
  • the device also includes:
  • the training task receiving module is used to receive the model training task before receiving the streaming data
  • a requirement determination module configured to analyze the characteristics of the training samples required to perform the model training task, and determine the network security feature extraction requirements according to the characteristics of the training samples
  • the device also includes:
  • a training sample set determination module, configured to, after the feature data set list is generated, read the network security feature data corresponding to each header in the feature data set list and establish a training sample set according to the read network security feature data;
  • a model training module configured to use the training sample set to train the model.
  • a storage medium on which a computer program is stored, and when the program is executed by a processor, the aforementioned method for feature extraction of streaming data is implemented.
  • a computer device, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor; when the processor executes the program, the above method for feature extraction of streaming data is implemented.
  • The present application provides a feature extraction method and apparatus for streaming data, a storage medium, and a computer device. Streaming data is received, and the network security feature extraction requirements that specify which network security feature data to extract from the streaming data are acquired. The requirements are analyzed to obtain at least one target dimension to be extracted and at least one target feature to be extracted; a feature data extractor is generated according to the target dimension and target feature; and the network security feature data of the received streaming data is then extracted with the generated extractor.
  • In this way, the present application can extract features from streaming data in real time, making full use of the low-latency characteristics of streaming data while reducing resource usage.
  • FIG. 1 shows a schematic flow chart of a feature extraction method for streaming data provided by an embodiment of the present application
  • FIG. 2 shows a schematic flowchart of another method for feature extraction of streaming data provided by the embodiment of the present application
  • FIG. 3 shows a schematic structural diagram of a feature data extractor provided in an embodiment of the present application
  • FIG. 4 shows a schematic structural diagram of a feature data set list provided by an embodiment of the present application
  • FIG. 5 shows a schematic structural diagram of an apparatus for feature extraction of streaming data provided by an embodiment of the present application.
  • a method for feature extraction of streaming data includes:
  • Step 101: receiving streaming data, and acquiring network security feature extraction requirements corresponding to the streaming data, wherein the network security feature extraction requirements include at least one target dimension to be extracted and at least one target feature to be extracted, the target dimension includes at least one of an IP dimension, a time dimension, and a MAC (local area network) address dimension, and the target feature includes at least one of a DNS request quantity feature, an ICMP request quantity feature, an HTTP request quantity feature, a DNS domain name set feature, and a page access times feature;
  • the streaming data can be used in network security services such as determining a compromised host and extracting the number of DNS requests.
  • The streaming data feature extraction method provided in the embodiment of the present application receives the streaming data and acquires the network security feature extraction requirements that specify which network security feature data to extract from it. The requirements are analyzed to obtain the target dimension to be extracted and the target feature to be extracted. There is at least one of each; because different services may have different network security feature data extraction requirements, there may be multiple target dimensions and multiple target features to be extracted.
  • The target dimension can be the IP dimension, the time dimension, the MAC (LAN) address dimension, and so on.
  • The target feature can be a DNS (Domain Name System) request quantity feature, an ICMP (Internet Control Message Protocol) request quantity feature, an HTTP (HyperText Transfer Protocol) request quantity feature, a DNS domain name set feature, a page access times feature, and so on.
  • Step 102 generating a feature data extractor according to the target dimension and the target feature
  • A feature data extractor is generated according to the target dimension and target feature. If there are multiple target features, one feature data extractor is generated for each target feature together with all target dimensions. For example, if a network security feature extraction requirement includes 2 target dimensions (target dimension A and target dimension B) and 3 target features (target feature a, target feature b, and target feature c), then 3 feature data extractors are generated (feature data extractor 1, feature data extractor 2, and feature data extractor 3): feature data extractor 1 can be generated based on target dimension A, target dimension B, and target feature a; feature data extractor 2 based on target dimension A, target dimension B, and target feature b; and feature data extractor 3 based on target dimension A, target dimension B, and target feature c.
  • the above description about the feature data extractor is only an example, which should not limit the protection scope of the present application.
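  • As an illustrative sketch only (the names `ExtractorSpec` and `plan_extractors` are hypothetical and do not appear in the application), the example above of 2 target dimensions and 3 target features yielding 3 feature data extractors could be expressed in Python roughly as follows:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExtractorSpec:
    """Hypothetical description of one feature data extractor."""
    dimensions: Tuple[str, ...]  # all target dimensions
    feature: str                 # exactly one target feature


def plan_extractors(dimensions: List[str], features: List[str]) -> List[ExtractorSpec]:
    """Return one spec per target feature, each carrying every target dimension."""
    return [ExtractorSpec(tuple(dimensions), feature) for feature in features]


# Target dimensions A and B, target features a, b and c, as in the example above.
specs = plan_extractors(["A", "B"], ["a", "b", "c"])
for number, spec in enumerate(specs, start=1):
    print(f"feature data extractor {number}: "
          f"dimensions={spec.dimensions}, feature={spec.feature}")
```

  • The number of extractors follows the number of target features rather than the number of dimension and feature combinations, which matches the example given in the text.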
  • Step 103 using the feature data extractor to extract network security feature data corresponding to the target dimension and the target feature in the streaming data.
  • the network security feature data of the received streaming data is extracted, and the network security feature data corresponds to the target dimension and the target feature.
  • For example, the target feature can be the number of DNS requests sent by hosts in the intranet, and the target dimension can be the IP address of the host. A corresponding feature data extractor is generated, and the specific DNS request quantity value is extracted from the streaming data with that extractor.
  • Streaming data is received, and the network security feature extraction requirements that specify which network security feature data to extract from the streaming data are acquired. The requirements are analyzed to obtain at least one target dimension to be extracted and at least one target feature to be extracted; a feature data extractor is generated according to the target dimension and target feature; and the network security feature data of the received streaming data is then extracted with the generated feature data extractor.
  • Step 201 receiving streaming data, and filtering the streaming data according to preset data filtering conditions, wherein the preset data filtering conditions include preset data protocols;
  • the received streaming data is screened according to preset data filtering conditions, and the streaming data that meets the preset data filtering conditions can enter subsequent processing.
  • The preset data filtering conditions may vary. Because streaming data is huge in volume and comes in many formats, the preset data filtering conditions usually include a data protocol, and part of the streaming data is filtered out by the preset data protocol. Taking a network traffic analysis scenario as an example, the obtained streaming data contains a large number of different data protocol types, including the DNS protocol, the TCP protocol, the UDP protocol, and so on.
  • The obtained streaming data can then be preliminarily screened: the streaming data conforming to the DNS data protocol type is selected, and subsequent operations are performed on it.
  • the embodiment of the present application screens the received streaming data by preset data filtering conditions, which is beneficial to reduce the processing amount of the streaming data, reduce the resource occupation of the computer, and improve the processing efficiency of the streaming data.
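  • A minimal sketch of such protocol-based screening is shown below. It assumes, purely for illustration, that each streaming record is a dictionary with a `protocol` field; the record layout and the name `filter_by_protocol` are hypothetical and are not specified by the application.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, object]  # assumed record layout, for illustration only


def filter_by_protocol(stream: Iterable[Record], protocol: str = "DNS") -> Iterator[Record]:
    """Lazily yield only the records whose protocol matches the preset data protocol."""
    for record in stream:
        if record.get("protocol") == protocol:
            yield record


sample_stream = [
    {"protocol": "DNS", "src_ip": "192.168.10.10", "domain": "example.com"},
    {"protocol": "TCP", "src_ip": "192.168.10.11"},
    {"protocol": "UDP", "src_ip": "192.168.10.12"},
    {"protocol": "DNS", "src_ip": "192.168.10.11", "domain": "example.org"},
]

# Only the DNS records continue to the later processing steps.
dns_only = list(filter_by_protocol(sample_stream))
print(len(dns_only), "of", len(sample_stream), "records kept")
```

  • Writing the filter as a generator means records are screened one at a time as they arrive, so the unfiltered stream never has to be held in memory.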
  • Step 202 acquire network security feature extraction requirements corresponding to the streaming data, wherein the network security feature extraction requirements include at least one target dimension to be extracted and at least one target feature to be extracted, and the target dimension includes IP dimension , at least one of the time dimension and the mac LAN address dimension, the target feature includes at least one of the DNS request quantity feature, the ICMP request quantity feature, the HTTP request quantity feature, the DNS domain name set feature, and the page access times feature;
  • Step 203 generating a feature data extractor according to the target dimension and the target feature
  • step 203 specifically includes:
  • According to each target feature, a feature data extraction tool matching that target feature is established; the feature data extractor is then generated according to the target dimension, the target feature, and the feature data extraction tool, wherein the feature data extractor includes an extractor head and an extractor body, the extractor head is used to indicate the target dimension and the target feature, and the extractor body includes the feature data extraction tool.
  • That is, a feature data extraction tool matching the target feature is established first, and the feature data extractor is then generated from the target dimension, the target feature, and the feature data extraction tool.
  • the feature data extractor mainly includes two components, one is the extractor head and the other is the extractor body.
  • the extractor head may indicate the target dimension and the target feature, and the extractor body may include a feature data extraction tool corresponding to the target dimension and the target feature.
  • For example, the target feature can be the maximum value of the streaming data, and the corresponding feature data extraction tool can be a function, namely the MAX() function.
  • The maximum value of the streaming data selected by the target dimension is computed dynamically: the corresponding network security feature data is updated according to the MAX() function as data arrives, and the final maximum value is used as the network security feature data.
  • the structure of the feature data extractor is shown in Figure 3.
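  • The head/body structure and the MAX() example can be sketched as follows. This is an assumption-laden illustration rather than the application's implementation: the class names, the field names, and the incremental `running_max` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class ExtractorHead:
    """Indicates which target dimensions and which target feature the extractor serves."""
    dimensions: Tuple[str, ...]
    feature: str


@dataclass
class FeatureDataExtractor:
    head: ExtractorHead
    # Extractor body: the extraction tool, modelled here as an incremental
    # update function (current value, new value) -> updated value.
    body: Callable[[Optional[float], float], float]


def running_max(current: Optional[float], value: float) -> float:
    """Incremental counterpart of MAX(): keep the largest value seen so far."""
    return value if current is None else max(current, value)


extractor = FeatureDataExtractor(ExtractorHead(("ip",), "max_value"), running_max)

state: Optional[float] = None
for value in [3.0, 7.5, 2.0]:  # values arriving one by one from the stream
    state = extractor.body(state, value)
print(extractor.head.feature, "=", state)
```

  • Modelling the extraction tool as an update function lets the feature value be refreshed on every arriving record without re-reading earlier data.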
  • Step 204 using the feature data extractor to extract the network security feature data corresponding to the target dimension and the target feature in the streaming data; in the embodiment of the present application, optionally, step 204 specifically includes :
  • The streaming data is input into the feature data extractor. After receiving the streaming data, the feature data extractor finds the group to which the current data belongs according to the target dimension, and creates a new group if that group does not exist. It then uses the feature data extraction tool to extract, from the grouped streaming data, the network security feature data that matches the target feature, and updates it into the group where the current streaming data is located.
  • For example, when the target dimension is the host IP address and the current streaming data belongs to the 192.168.10.10 group, the network security feature data of the 192.168.10.10 group is updated after the network security feature data is extracted.
  • The current streaming data is thus grouped by the target dimension inside the feature data extractor, and the network security feature data is extracted and updated per group. Each time data is received, only the value of the group that the data belongs to needs to be updated while the other groups remain unchanged, which improves the efficiency of extracting network security feature data from streaming data.
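  • The per-group update described above can be sketched like this, using the DNS request quantity as the feature. The class name `GroupedRequestCounter` and the `src_ip` and `domain` record fields are assumptions made for the example.

```python
from typing import Dict


class GroupedRequestCounter:
    """Counts requests per group, where the group key is the target dimension value."""

    def __init__(self, dimension: str) -> None:
        self.dimension = dimension
        self.groups: Dict[str, int] = {}

    def update(self, record: Dict[str, str]) -> None:
        key = record[self.dimension]   # find the group of the current record
        if key not in self.groups:     # create the group if it does not exist yet
            self.groups[key] = 0
        self.groups[key] += 1          # only this group's value is updated


counter = GroupedRequestCounter("src_ip")
for record in [
    {"src_ip": "192.168.10.10", "domain": "example.com"},
    {"src_ip": "192.168.10.11", "domain": "example.org"},
    {"src_ip": "192.168.10.10", "domain": "example.net"},
]:
    counter.update(record)
print(counter.groups)
```

  • Each call to `update` touches a single dictionary entry, so the cost per record does not depend on how many groups already exist.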
  • Step 205 Generate a feature data set list based on the extractor header of the feature data extractor and the network security feature data, wherein the feature data set list includes a first header, a second header, and each A list result corresponding to a first table header and each second table header, the first table header includes the target dimension, the second table header includes the target feature, and the list result includes each target Dimensions and the network security feature data corresponding to each target feature;
  • The filtered streaming data is received, and the corresponding network security feature extraction requirements are obtained. The requirements may include at least one target dimension to be extracted and at least one target feature to be extracted; a corresponding feature data extractor is then generated according to the target dimension and target feature, and the corresponding network security feature data is extracted through the feature data extractor.
  • a feature data set list is generated.
  • the feature data set list may include three components, namely a first header part, a second header part and a list result part, wherein each list result corresponds to a first header and a second header respectively.
  • the first header may be the target dimension
  • the second header may be the target feature
  • the list result may be network security feature data corresponding to each target dimension and each target feature.
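  • One possible in-memory layout for such a feature data set list is sketched below. It assumes a single target dimension (host IP) whose values label the rows, with the target features as columns; the function name `build_feature_table` and the feature names are hypothetical.

```python
from typing import Dict

# Per-feature extractor output: feature name -> {dimension value -> feature data}
Extracted = Dict[str, Dict[str, object]]
# Feature data set list: dimension value -> {feature name -> feature data}
FeatureTable = Dict[str, Dict[str, object]]


def build_feature_table(extracted: Extracted) -> FeatureTable:
    """Pivot per-feature results into rows keyed by the target dimension value."""
    table: FeatureTable = {}
    for feature, groups in extracted.items():
        for dimension_value, data in groups.items():
            table.setdefault(dimension_value, {})[feature] = data
    return table


extracted = {
    "dns_request_count": {"192.168.10.10": 2, "192.168.10.11": 1},
    "dns_domain_set": {
        "192.168.10.10": {"example.com", "example.net"},
        "192.168.10.11": {"example.org"},
    },
}
table = build_feature_table(extracted)
for ip, row in table.items():
    print(ip, row)
```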
  • Step 206 in response to the sample data acquisition instruction, search for the target feature data corresponding to the sample data acquisition instruction from the feature data set list, wherein the sample data acquisition instruction includes any first header and/or Any second header, the sample data acquisition instruction is used to acquire model training samples;
  • When the relevant staff want to use the network security feature data in the feature data set list, they can issue a sample data acquisition instruction, in response to which the target feature data corresponding to the instruction is searched for in the feature data set list.
  • any corresponding first header and/or any second header may be parsed from the sample data acquisition instruction, and corresponding target characteristic data may be searched according to the first header and the second header.
  • the sample data obtaining instruction is used to obtain target feature data from the list of feature data sets, and then use the target feature data as samples for model training.
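  • A lookup by first header and/or second header could look like the sketch below. It assumes the feature data set list is held as a nested dictionary keyed by dimension value and then by feature name; that layout and the function name `find_target_feature_data` are illustrative assumptions only.

```python
from typing import Dict, Optional

FeatureTable = Dict[str, Dict[str, object]]


def find_target_feature_data(
    table: FeatureTable,
    first_header: Optional[str] = None,   # a target dimension value, e.g. a host IP
    second_header: Optional[str] = None,  # a target feature name
) -> FeatureTable:
    """Return the entries matching the given header(s); None means 'no restriction'."""
    result: FeatureTable = {}
    for dimension_value, row in table.items():
        if first_header is not None and dimension_value != first_header:
            continue
        selected = {
            feature: data
            for feature, data in row.items()
            if second_header is None or feature == second_header
        }
        if selected:
            result[dimension_value] = selected
    return result


table = {
    "192.168.10.10": {"dns_request_count": 2, "http_request_count": 5},
    "192.168.10.11": {"dns_request_count": 1, "http_request_count": 0},
}
by_feature = find_target_feature_data(table, second_header="dns_request_count")
by_both = find_target_feature_data(table, "192.168.10.10", "http_request_count")
print(by_feature)
print(by_both)
```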
  • Step 207 judging whether the target characteristic data is numerical data, and calling a corresponding data processing model based on the data type of the target characteristic data when the target characteristic data is non-numeric data;
  • After the target feature data is found, it is judged whether the target feature data is numerical data. Because the target feature data is used to train a model, it must be numerical before it is input into the machine learning model, whereas the network security feature data extracted by the feature data extractor may be of many types, such as sets, dictionaries, and tuples. When the target feature data is non-numerical data, a data processing model corresponding to its data type is called.
  • For example, when the target feature data is the number of deduplicated DNS domain names requested by a host, the feature data extractor returns a set containing all the DNS domain names; the corresponding data processing model is called to count the size of that set, and the resulting target feature data is the number of deduplicated DNS domain names.
  • Step 208 convert the target feature data into numerical data, so as to use the converted target feature data for model training.
  • the non-numerical target feature data is converted into numerical data, and then the converted target feature data can be used for model training.
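  • The type-based conversion can be sketched as follows. Only the set-to-size rule comes from the example in the text; the dispatch table `CONVERTERS`, the function name `to_numeric`, and the choice to reject unregistered types are assumptions made for illustration.

```python
from numbers import Number
from typing import Callable, Dict

# Hypothetical registry of "data processing models", selected by data type.
# Only the set rule (count the deduplicated DNS domain names) is taken from the text.
CONVERTERS: Dict[type, Callable[[object], float]] = {
    set: lambda value: float(len(value)),
    frozenset: lambda value: float(len(value)),
}


def to_numeric(value: object) -> float:
    """Return numerical data unchanged; otherwise convert it according to its type."""
    if isinstance(value, Number):
        return float(value)
    converter = CONVERTERS.get(type(value))
    if converter is None:
        raise TypeError(f"no data processing model registered for {type(value).__name__}")
    return converter(value)


domains = {"example.com", "example.org", "example.com"}  # duplicates collapse in a set
print(to_numeric(domains))
print(to_numeric(17))
```

  • Further data types, such as dictionaries or tuples, would need their own entries in the registry; what numerical value they should map to is not stated in the text, so none is assumed here.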
  • the network security feature extraction requirement includes the number of deduplicated DNS domain names requested, which involves the IP dimension and the DNS domain name set feature.
  • the embodiment of the present application provides another method for feature extraction of streaming data, which method includes:
  • Step 301 receiving a model training task; analyzing the characteristics of the training samples required to perform the model training task, and determining the network security feature extraction requirements according to the characteristics of the training samples;
  • When staff want to train a model, the model training task is received, the training sample features required to perform the task are analyzed, and the corresponding network security feature extraction requirements are determined from those features.
  • The training sample features can include the features of the training samples input into the model and the features of the training samples output from the model; the network security feature extraction requirements are then determined separately for the model input training samples and the model output training samples.
  • For example, when staff want to train a network speed identification model, the training task for that model is received, the training sample features required by the task are determined according to the staff's input, and the network security feature extraction requirements are then determined.
  • By receiving the model training task, determining the required training sample features, and then determining the corresponding network security feature extraction requirements, the network security feature data for model training can subsequently be extracted automatically according to those requirements, which improves the efficiency with which users acquire network security feature data for training models.
  • Step 302 receiving streaming data, and obtaining network security feature extraction requirements corresponding to the streaming data, wherein the network security feature extraction requirements include at least one target dimension to be extracted and at least one target feature to be extracted;
  • Step 303 generating a feature data extractor according to the target dimension and the target feature
  • Step 304 using the feature data extractor to extract network security feature data corresponding to the target dimension and the target feature in the streaming data;
  • stream data is received, and corresponding network security feature extraction requirements are obtained.
  • the network security feature extraction requirements may include at least one target dimension to be extracted and at least one target feature to be extracted, and then according to the target dimension and target feature, generate a corresponding feature data extractor, and extract corresponding network security feature data through the feature data extractor.
  • Step 305: judge whether the network security feature data is numerical data and, when it is non-numerical, call the corresponding data processing model based on the data type of the network security feature data;
  • Step 306: convert the network security feature data into numerical data according to the data processing model, so that the feature data set list is generated from the converted data;
  • In this embodiment, after the feature data extractor has extracted the network security feature data, each item can be checked one by one. When an item is found to be non-numerical, the corresponding data processing model is called based on the item's data type, the non-numerical feature data is converted into numerical feature data according to that model, and the converted network security feature data is then used to generate the feature data set list.
  • Step 307: generate a feature data set list based on the extractor head of the feature data extractor and the network security feature data, wherein the list includes a first header, a second header, and a list result corresponding to each first header and each second header; the first header includes the target dimension, the second header includes the target feature, and the list result includes the network security feature data corresponding to each target dimension and each target feature;
  • In this embodiment, once all network security feature data has been converted into numerical feature data, the feature data set list is generated from the extractor head of the feature data extractor and the network security feature data extracted by the extractor.
  • The feature data set list may include three components: a first header part, a second header part, and a list result part, where each list result corresponds to one first header and one second header. The first header may be the target dimension, the second header may be the target feature, and the list result may be the numerical feature data corresponding to each target dimension and each target feature, as shown in FIG. 4.
  • Because all network security feature data in the list generated by this embodiment is numerical, staff can take it as-is and feed it directly into a machine learning model, which helps improve model training efficiency.
  • Step 308: read the network security feature data corresponding to each header in the feature data set list, and establish the training sample set corresponding to the model training task from the data read; use the training sample set to train the model.
  • In this embodiment, after the network security feature data for each header is read, a training sample set corresponding to the model training task is established from it. All network security feature data in the training sample set is numerical, and the set is then used to train the model.
  • Further, as a concrete implementation of the method of Figure 1, the embodiment of the present application provides a device for feature extraction of streaming data. As shown in Figure 5, the device includes:
  • a requirement acquisition module, configured to receive streaming data and acquire the network security feature extraction requirements corresponding to the streaming data, wherein the requirements include at least one target dimension to be extracted and at least one target feature to be extracted; the target dimension includes at least one of an IP dimension, a time dimension, and a MAC address dimension, and the target feature includes at least one of a DNS request count feature, an ICMP request count feature, an HTTP request count feature, a DNS domain name set feature, and a page visit count feature;
  • an extractor generation module, configured to generate a feature data extractor according to the target dimension and the target feature;
  • a feature data extraction module, configured to use the feature data extractor to extract, from the streaming data, the network security feature data corresponding to the target dimension and the target feature.
  • Optionally, the network security feature extraction requirements include a deduplicated count of requested DNS domain names, which comprises the IP dimension and the DNS domain name set feature.
  • the device further includes:
  • a streaming data screening module, configured to filter the streaming data according to preset data filtering conditions before the network security feature extraction requirements corresponding to the streaming data are acquired, wherein the preset data filtering conditions include a preset data protocol.
  • the extractor generation module specifically includes:
  • An extraction tool establishment unit configured to establish a feature data extraction tool that matches each target feature respectively according to each target feature
  • An extractor generation unit configured to generate the feature data extractor according to the target dimension, the target feature and the feature data extraction tool, wherein the feature data extractor includes an extractor head and an extractor body , the extractor head is used to indicate the target dimension and the target feature, and the extractor body includes the feature data extraction tool.
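  • the feature data extraction module is specifically configured to:
  • input the streaming data into the feature data extractor, so that the feature data extractor groups the streaming data by the target dimension and extracts, from the grouped streaming data, the network security feature data matching the target feature.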
  • the device further includes:
  • a list generating module, configured to, after the network security feature data corresponding to the target dimension and the target feature has been extracted from the streaming data, generate a feature data set list based on the extractor head of the feature data extractor and the network security feature data, wherein the feature data set list includes a first header, a second header, and list results corresponding to each first header and each second header; the first header includes the target dimension, the second header includes the target feature, and the list results include the network security feature data corresponding to each target dimension and each target feature.
  • the device further includes:
  • a search module, configured to search the feature data set list, according to a sample data acquisition instruction, for the target feature data corresponding to that instruction, wherein the sample data acquisition instruction includes any first header and/or any second header and is used to acquire model training samples;
  • a judging module configured to judge whether the target characteristic data is numerical data, and when the target characteristic data is non-numeric data, call a corresponding data processing model based on the data type of the target characteristic data;
  • the conversion module is used to convert the target feature data into numerical data according to the data processing model, so as to use the converted target feature data for model training.
  • the device further includes:
  • a judging module, configured to judge, before the feature data set list is generated based on the extractor head of the feature data extractor and the network security feature data, whether the network security feature data is numerical data and, when it is non-numerical, call the corresponding data processing model based on the data type of the network security feature data;
  • the conversion module is configured to convert the network security characteristic data into numerical data according to the data processing model, so as to generate a characteristic data set list by using the converted network security characteristic data.
  • the device further includes:
  • a training task receiving module, configured to receive a model training task before the streaming data is received;
  • a requirement determination module, configured to analyze the training sample features required to perform the model training task and determine the network security feature extraction requirements according to those features;
  • correspondingly, the device further includes:
  • a training sample set determination module, configured to read, after the feature data set list is generated, the network security feature data corresponding to each header in the list, and establish the training sample set corresponding to the model training task from the data read;
  • a model training module configured to use the training sample set to train the model.
  • a storage medium on which a computer program is stored, and when the program is executed by a processor, the aforementioned method for feature extraction of streaming data is implemented.
  • Based on the methods shown in Figures 1 to 2, the embodiment of the present application also provides a storage medium on which a computer program is stored; when the computer program is executed by a processor, the feature extraction method for streaming data shown in Figures 1 to 2 is implemented.
  • Based on this understanding, the technical solution of the present application can be embodied as a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or portable hard disk) and includes several instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the methods described in the various implementation scenarios of the present application.
  • the embodiment of this application also provides a computer device, which can be specifically a personal computer, a server, a network equipment, etc.
  • the computer equipment includes a storage medium and a processor; the storage medium is used to store a computer program; the processor is used to execute the computer program to implement the feature extraction method of streaming data as shown in Figures 1 to 2 above.
  • the computer device may also include a user interface, a network interface, a camera, a radio frequency (Radio Frequency, RF) circuit, a sensor, an audio circuit, a WI-FI module, and the like.
  • the user interface may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the like, and optional user interfaces may also include a USB interface, a card reader interface, and the like.
  • the network interface may include a standard wired interface, a wireless interface (such as a Bluetooth interface, a WI-FI interface) and the like.
  • Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the computer device, which may include more or fewer components, combine some components, or arrange components differently.
  • the storage medium may also include an operating system and a network communication module.
  • An operating system is a program that manages and maintains the hardware and software resources of a computer device, and supports the operation of information processing programs and other software and/or programs.
  • the network communication module is used to realize the communication between various components inside the storage medium, as well as communicate with other hardware and software in the physical device.
  • the present application can be realized by means of software plus a necessary general-purpose hardware platform, or by hardware.
  • Streaming data is received; the network security feature extraction requirements, which call for network security feature data to be extracted from the streaming data, are obtained and parsed to obtain at least one target dimension and at least one target feature to be extracted; a feature data extractor is generated from the target dimension and target feature; and the network security feature data of the received streaming data is extracted with the generated extractor.
  • By constructing a feature data extractor and extracting the network security feature data of the streaming data through it, this application can perform immediate feature extraction on streaming data, making full use of the low latency of streaming data while reducing resource usage.
  • the accompanying drawing is only a schematic diagram of a preferred implementation scenario, and the modules or processes in the accompanying drawings are not necessarily necessary for implementing the present application.
  • the modules in the devices in the implementation scenario can be distributed among the devices in the implementation scenario according to the description of the implementation scenario, or can be located in one or more devices different from the implementation scenario according to corresponding changes.
  • the modules of the above implementation scenarios can be combined into one module, or can be further split into multiple sub-modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

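This application discloses a feature extraction method and device for streaming data, a storage medium, and a computer device. The method includes: receiving streaming data and obtaining the network security feature extraction requirements corresponding to the streaming data, wherein the requirements include at least one target dimension to be extracted and at least one target feature to be extracted; generating a feature data extractor according to the target dimension and the target feature; and using the feature data extractor to extract, from the streaming data, the network security feature data corresponding to the target dimension and the target feature. By constructing a feature data extractor and extracting the network security feature data of the streaming data through it, the application can perform immediate feature extraction on streaming data, making full use of the low latency of streaming data while reducing resource usage.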

Description

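Feature extraction method and device for streaming data, storage medium, and computer device
Technical Field
This application relates to the field of data processing technology, and in particular to a feature extraction method and device for streaming data, a storage medium, and a computer device.
Background
Streaming data is a sequence of data that arrives in order, in large volume, rapidly, and continuously; it is a dynamic data collection that keeps growing over time. Because streaming data offers low latency and high throughput, it is widely used in services with strict real-time requirements, for example network security services such as identifying compromised hosts and extracting DNS request counts.
In network security services, most streaming data must undergo feature extraction before it is used. At present, feature extraction on streaming data usually proceeds by presetting a time period according to the actual situation of the network security service and then processing the streaming data in bulk according to that period to extract the desired features. On the one hand, this approach cannot make full use of the low latency of streaming data; on the other hand, the streaming data must be stored in bulk before each periodic extraction, which occupies a large amount of host resources when throughput is high.
Therefore, how to perform immediate feature extraction on streaming data in network security services, making full use of its low latency while reducing resource usage, has become an urgent problem in this field.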
发明内容
有鉴于此,本申请提供了一种流式数据的特征提取方法及装置、存储介质、计算机设备,能够对网络安全业务中的流式数据进行即时性特征提取,在充分发挥流式数据的低延迟性特点的同时,减少资源的占用量。
根据本申请的一个方面,提供了一种流式数据的特征提取方法,包括:
接收流式数据,并获取所述流式数据对应的网络安全特征提取需求,其中,所述网络安全特征提取需求包括至少一个待提取的目标维度以及至少一个待提取的目标特征,所述目标维度包括IP维度、时间维度以及mac局域网地址维度中的至少一种,所述目标特征包括DNS请求数量特征、ICMP请求数量特征、HTTP请求数量特征、DNS域名集合特征以及页面访问次数特征中的至少一种;
依据所述目标维度以及所述目标特征,生成特征数据提取器;
利用所述特征数据提取器,提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据。
可选地,所述网络安全特征提取需求包括请求DNS域名去重数量,所述请求DNS域名去重数量包括所述IP维度以及所述DNS域名集合特征。
可选地,所述获取所述流式数据对应的网络安全特征提取需求之前,所述方法还包括:
依据预设数据筛选条件,对所述流式数据进行筛选,其中,所述预设数据筛选条件包括预设数据协议。
可选地,所述依据所述目标维度以及所述目标特征,生成特征数据提取器,具体包括:
分别依据每个目标特征,建立与所述每个目标特征各自匹配的特征数据提取工具;
依据所述目标维度、所述目标特征以及所述特征数据提取工具,生成所述特征数据提取器,其中,所述特征数据提取器包括提取器头部以及提取器主体,所述提取器头部用于指示所述目标维度以及所述目标特征,所述提取器主体包括所述特征数据提取工具。
可选地,所述利用所述特征数据提取器,提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据,具体包括:
将所述流式数据输入到所述特征数据提取器中,以使所述特征数据提取器按所述目标维度对所述流式数据进行分组,并从分组后的流式数据中提取出与所述目标特征匹配的网络安全特征数据。
可选地,所述提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据之后,所述方法还包括:
基于所述特征数据提取器的提取器头部以及所述网络安全特征数据,生成特征数据集合列表,其中,所述特征数据集合列表包括第一表头、第二表头以及与每个第一表头和每个第二表头对应的列表结果,所述第一表头包括所述目标维度,所述第二表头包括所述目标特征,所述列表结果包括与每个目标维度和每个目标特征对应的所述网络安全特征数据。
可选地,所述生成特征数据集合列表之后,所述方法还包括:
响应于样本数据获取指令,从所述特征数据集合列表中查找与所述样本数据获取指令相对应的目标特征数据,其中,所述样本数据获取指令包括任意第一表头和/或任意第二表头,所述样本数据获取指令用于获取模型训练样本;
判断所述目标特征数据是否为数值型数据,并当所述目标特征数据为非数值型数据时,基于所述目标特征数据的数据类型,调用对应的数据处理模型;
依据所述数据处理模型,将所述目标特征数据转化为数值型数据,以利用转化后的目标特征数据进行模型训练。
可选地,所述基于所述特征数据提取器的提取器头部以及所述网络安全特征数据,生成特征数据集合列表之前,所述方法还包括:
判断所述网络安全特征数据是否为数值型数据,并当所述网络安全特征数据为非数值型数据时,基于所述网络安全特征数据的数据类型,调用对应的数据处理模型;
依据所述数据处理模型,将所述网络安全特征数据转化为数值型数据,以利用转化后的网络安全特征数据生成特征数据集合列表。
可选地,所述接收流式数据之前,所述方法还包括:
接收模型训练任务;
分析执行所述模型训练任务所需的训练样本特征,并依据所述训练样本特征,确定所述网络安全特征提取需求;
相应地,所述生成特征数据集合列表之后,所述方法还包括:
读取所述特征数据集合列表中各表头对应的网络安全特征数据,并依据读取出的网络安全特征数据建立所述模型训练任务对应的训练样本集;
利用所述训练样本集,训练所述模型。
根据本申请的另一方面,提供了一种流式数据的特征提取装置,包括:
需求获取模块,接收流式数据,并获取所述流式数据对应的网络安全特征提取需求,其中,所述网络安全特征提取需求包括至少一个待提取的目标维度以及至少一个待提取的目标特征,所述目标维度包括IP维度、时间维度以及mac局域网地址维度中的至少一种,所述目标特征包括DNS请求数量特征、ICMP请求数量特征、HTTP请求数量特征、DNS域名集合特征以及页面访问次数特征中的至少一种;
提取器生成模块,用于依据所述目标维度以及所述目标特征,生成特征数据提取器;
特征数据提取模块,利用所述特征数据提取器,提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据。
可选地,所述网络安全特征提取需求包括请求DNS域名去重数量,所述请求DNS域名去重数量包括所述IP维度以及所述DNS域名集合特征。
可选地,所述装置还包括:
流式数据筛选模块,用于所述获取所述流式数据对应的网络安全特征提取需求之前,依据预设数据筛选条件,对所述流式数据进行筛选,其中,所述预设数据筛选条件包括预设数据协议。
可选地,所述提取器生成模块,具体包括:
提取工具建立单元,用于分别依据每个目标特征,建立与所述每个目标特征各自匹配的特征数据提取工具;
提取器生成单元,用于依据所述目标维度、所述目标特征以及所述特征数据提取工具,生成所述特征数据提取器,其中,所述特征数据提取器包括提取器头部以及提取器主体,所述提取器头部用于指示所述目标维度以及所述目标特征,所述提取器主体包括所述特征数据提取工具。
可选地,所述特征数据提取模块,具体用于:
将所述流式数据输入到所述特征数据提取器中,以使所述特征数据提取器按所述目标维度对所述流式数据进行分组,并从分组后的流式数据中提取出与所述目标特征匹配的网络安全特征数据。
可选地,所述装置还包括:
列表生成模块,用于所述提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据之后,基于所述特征数据提取器的提取器头部以及所述网络安全特征数据,生成特征数据集合列表,其中,所述特征数据集合列表包括第一表头、第二表头以及与每个第一表头和每个第二表头对应的列表结果,所述第一表头包括所述目标维度,所述第二表头包括所述目标特征,所述列表结果包括与每个目标维度和每个目标特征对应的所述网络安全特征数据。
可选地,所述装置还包括:
查找模块,用于根据样本数据获取指令,从所述特征数据集合列表中查找与所述样本数据获取指令相对应的目标特征数据,其中,所述样本数据获取指令包括任意第一表头和/或任意第二表头,所述样本数据获取指令用于获取模型训练样本;
判断模块,用于判断所述目标特征数据是否为数值型数据,并当所述目标特征数据为非数值型数据时,基于所述目标特征数据的数据类型,调用对应的数据处理模型;
转化模块,用于依据所述数据处理模型,将所述目标特征数据转化为数值型数据,以利用转化后的目标特征数据进行模型训练。
可选地,所述装置还包括:
判断模块,用于所述基于所述特征数据提取器的提取器头部以及所述网络安全特征数据,生成特征数据集合列表之前,判断所述网络安全特征数据是否为数值型数据,并当所述网络安全特征数据为非数值型数据时,基于所述网络安全特征数据的数据类型,调用对应的数据处理模型;
转化模块,用于依据所述数据处理模型,将所述网络安全特征数据转化为数值型数据,以利用转化后的网络安全特征数据生成特征数据集合列表。
可选地,所述装置还包括:
训练任务接收模块,用于所述接收流式数据之前,接收模型训练任务;
需求确定模块,用于分析执行所述模型训练任务所需的训练样本特征,并依据所述训练样本特征,确定所述网络安全特征提取需求;
相应地,所述装置还包括:
训练样本集确定模块,用于在生成所述特征数据集合列表之后,读取所述特征数据集合列表中各表头对应的网络安全特征数据,并依据读取出的网络安全特征数据建立所述模型训练任务对应的训练样本集;
模型训练模块,用于利用所述训练样本集,训练所述模型。
依据本申请又一个方面,提供了一种存储介质,其上存储有计算机程序,所述程序被处理器执行时实现上述流式数据的特征提取方法。
依据本申请再一个方面,提供了一种计算机设备,包括存储介质、处理器及存储在存储介质上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现上述流式数据的特征提取方法。
借由上述技术方案,本申请提供的一种流式数据的特征提取方法及装置、存储介质、计算机设备,接收流式数据,并获取需要对流式数据进行网络安全特征数据提取的网络安全特征提取需求,对网络安全特征提取需求进行解析,获取其中待提取的至少一个目标维度以及待提取的至少一个目标特征,根据目标维度和目标特征,生成特征数据提取器,进而根据生成的特征数据提取器,提取接收的流式数据的网络安全特征数据。本申请通过构建特征数据提取器,并通过特征数据提取器提取流式数据的网络安全特征数据,能够对流式数据进行即时性特征提取,在充分发挥流式数据的低延迟性特点的同时,减少资源的占用量。
上述说明仅是本申请技术方案的概述,为了能够更清楚了解本申请的技术手段,而可依照说明书的内容予以实施,并且为了让本申请的上述和其它目的、特征和优点能够更明显易懂,以下特举本申请的具体实施方式。
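Brief Description of the Drawings
The drawings described here provide a further understanding of the application and form part of it; the illustrative embodiments of the application and their descriptions explain the application and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of a feature extraction method for streaming data provided by an embodiment of the application;
FIG. 2 is a schematic flowchart of another feature extraction method for streaming data provided by an embodiment of the application;
FIG. 3 is a schematic structural diagram of a feature data extractor provided by an embodiment of the application;
FIG. 4 is a schematic structural diagram of a feature data set list provided by an embodiment of the application;
FIG. 5 is a schematic structural diagram of a feature extraction device for streaming data provided by an embodiment of the application.
Detailed Description
The application is described in detail below with reference to the drawings and in conjunction with embodiments. Note that, where no conflict arises, the embodiments of the application and the features in them may be combined with one another.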
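This embodiment provides a feature extraction method for streaming data. As shown in FIG. 1, the method includes:
Step 101: receive streaming data and obtain the network security feature extraction requirements corresponding to the streaming data, wherein the requirements include at least one target dimension to be extracted and at least one target feature to be extracted; the target dimension includes at least one of an IP dimension, a time dimension, and a MAC address dimension, and the target feature includes at least one of a DNS request count feature, an ICMP request count feature, an HTTP request count feature, a DNS domain name set feature, and a page visit count feature;
In the embodiments of the application, streaming data can be used in network security services such as identifying compromised hosts and extracting DNS request counts. The method provided here receives streaming data and obtains the network security feature extraction requirements, that is, the requirements that call for network security feature data to be extracted from the streaming data. The requirements are parsed to obtain the target dimensions and target features to be extracted. There is at least one of each, and because different services may have different extraction needs, there may also be several of each. For example, target dimensions can be the IP dimension, the time dimension, the MAC address dimension, and so on; target features can be a DNS (Domain Name System) request count feature, an ICMP (Internet Control Message Protocol) request count feature, an HTTP (HyperText Transfer Protocol) request count feature, a DNS domain name set feature, a page visit count feature, and so on. Taking the scenario of finding compromised hosts on an intranet as an example, since the objects of study are intranet hosts, the intranet host IP can serve as the target dimension for feature extraction, or the intranet host IP and time can serve together as target dimensions.
Step 102: generate a feature data extractor according to the target dimension and the target feature;
In this embodiment, after the target dimensions and target features are obtained, feature data extractors are generated from them. If there are several target features, an extractor is generated for each distinct target feature together with all target dimensions. For example, if a network security feature extraction requirement includes two target dimensions (target dimension A and target dimension B) and three target features (target features a, b, and c), three feature data extractors are generated (extractors 1, 2, and 3): extractor 1 may be generated from target dimensions A and B and target feature a, extractor 2 from target dimensions A and B and target feature b, and extractor 3 from target dimensions A and B and target feature c. This description of the feature data extractor is only an illustration and should not limit the scope of protection of the application.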
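The one-extractor-per-feature rule in the example above (dimensions A and B, features a, b, and c) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `ExtractorSpec` and `generate_extractor_specs` are hypothetical names introduced here, and a "spec" only records which dimensions and feature an extractor would cover.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExtractorSpec:
    """Hypothetical description of one feature data extractor."""
    dimensions: Tuple[str, ...]  # every target dimension in the requirement
    feature: str                 # exactly one target feature


def generate_extractor_specs(dimensions: List[str], features: List[str]) -> List[ExtractorSpec]:
    """Return one spec per target feature, each carrying all target dimensions."""
    return [ExtractorSpec(tuple(dimensions), feature) for feature in features]


# Two dimensions and three features give three extractors, as in the text.
specs = generate_extractor_specs(["A", "B"], ["a", "b", "c"])
for number, spec in enumerate(specs, start=1):
    print(f"extractor {number}: dimensions={spec.dimensions}, feature={spec.feature}")
```

Keeping the dimensions identical across extractors means every extractor groups the stream the same way, so their results can later be lined up in one table.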
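Step 103: use the feature data extractor to extract, from the streaming data, the network security feature data corresponding to the target dimension and the target feature.
In this embodiment, the network security feature data of the received streaming data is extracted with the generated feature data extractor; the data corresponds to the target dimension and the target feature. For example, when the number of DNS requests sent by hosts on the intranet is to be extracted, the target feature can be the number of DNS requests sent by an intranet host and the target dimension can be the host IP address. A corresponding feature data extractor is generated from the DNS request count and the host IP address and is used to extract the concrete DNS request count values from the streaming data.
By applying the technical solution of this embodiment, streaming data is received; the network security feature extraction requirements, which call for network security feature data to be extracted from the streaming data, are obtained and parsed to obtain at least one target dimension and at least one target feature to be extracted; a feature data extractor is generated from the target dimension and target feature; and the network security feature data of the received streaming data is extracted with the generated extractor. By constructing a feature data extractor and extracting the network security feature data of the streaming data through it, the application can perform immediate feature extraction on streaming data, making full use of the low latency of streaming data while reducing resource usage.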
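Further, as a refinement and extension of the above embodiment, and to describe its implementation in full, another feature extraction method for streaming data is provided. As shown in FIG. 2, the method includes:
Step 201: receive streaming data and filter it according to preset data filtering conditions, wherein the preset data filtering conditions include a preset data protocol;
In this embodiment, after streaming data is received, it is filtered according to the preset data filtering conditions; streaming data that satisfies the conditions proceeds to subsequent processing. The preset conditions may differ between network security services. Because streaming data is huge in volume and varied in format, the data protocol is usually included in the preset conditions so that part of the streaming data is filtered out by protocol. Taking network traffic analysis as an example, the acquired streaming data contains many different protocol types, including DNS, TCP, and UDP; if a given scenario only needs the network security feature data of DNS traffic, the acquired streaming data can first be filtered down to the DNS protocol type before further processing. Filtering received streaming data by preset conditions reduces the volume of streaming data to process, lowers the computer's resource usage, and improves processing efficiency.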
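The protocol-based pre-filter described above can be sketched as a small generator. The record layout is an assumption made for illustration: each streaming record is modelled as a dictionary with a `protocol` key, and `filter_by_protocol` is a hypothetical helper name, since the source does not specify a record format.

```python
from typing import Dict, Iterable, Iterator, Set

# Assumed record shape: a flat dict of string fields with a "protocol" key.
Record = Dict[str, str]


def filter_by_protocol(records: Iterable[Record], allowed: Set[str]) -> Iterator[Record]:
    """Yield only the records whose protocol is in the preset allowed set."""
    for record in records:
        if record.get("protocol") in allowed:
            yield record


# Made-up sample traffic mixing DNS, TCP and UDP records.
stream = [
    {"protocol": "DNS", "ip": "192.168.10.10", "domain": "example.com"},
    {"protocol": "TCP", "ip": "192.168.10.11"},
    {"protocol": "DNS", "ip": "192.168.10.12", "domain": "example.org"},
    {"protocol": "UDP", "ip": "192.168.10.10"},
]

# Keep only DNS traffic before any feature extraction happens.
dns_records = list(filter_by_protocol(stream, {"DNS"}))
print(len(dns_records))
```

Because the filter is a generator, records are dropped one at a time as they arrive, and nothing has to be buffered before extraction.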
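Step 202: obtain the network security feature extraction requirements corresponding to the streaming data, wherein the requirements include at least one target dimension to be extracted and at least one target feature to be extracted; the target dimension includes at least one of an IP dimension, a time dimension, and a MAC address dimension, and the target feature includes at least one of a DNS request count feature, an ICMP request count feature, an HTTP request count feature, a DNS domain name set feature, and a page visit count feature;
Step 203: generate a feature data extractor according to the target dimension and the target feature;
In the embodiments of the application, optionally, step 203 specifically includes:
establishing, for each target feature, a feature data extraction tool that matches that target feature; and generating the feature data extractor according to the target dimension, the target feature, and the feature data extraction tool, wherein the feature data extractor includes an extractor head and an extractor body, the extractor head indicates the target dimension and the target feature, and the extractor body includes the feature data extraction tool.
In this embodiment, after at least one target dimension and at least one target feature to be extracted are determined, a matching feature data extraction tool is established for each target feature, and the feature data extractor is then generated from the target dimension, the target feature, and the extraction tool. The extractor has two main components: an extractor head and an extractor body. The head can indicate the target dimension and the target feature; the body can include the feature data extraction tool corresponding to them. For example, if the target feature is the maximum value of the streaming data, the corresponding extraction tool can be a function, namely MAX(). Each time a streaming data item arrives, the maximum of the streaming data filtered by the target dimension is computed dynamically, the corresponding network security feature data is updated according to MAX(), and the final maximum is taken as the network security feature data. The structure of the feature data extractor is shown in FIG. 3.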
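A minimal sketch of the head/body split, using the running-maximum example from the text, might look like the following. The class and field names are hypothetical, and grouping by dimension is deliberately left out here so that the sketch shows only how a head (dimensions plus feature) is paired with a body (the extraction tool) that is updated one item at a time.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


def running_max(current: Optional[float], value: float) -> float:
    """Extraction tool: fold one new value into the maximum seen so far."""
    return value if current is None else max(current, value)


@dataclass
class FeatureExtractor:
    # Extractor head: indicates the target dimensions and the target feature.
    dimensions: Tuple[str, ...]
    feature: str
    # Extractor body: the extraction tool matched to the target feature.
    tool: Callable[[Optional[float], float], float]
    # Current feature value; None until the first item arrives.
    value: Optional[float] = None

    def update(self, item: float) -> float:
        """Apply the tool to one incoming item and return the updated feature."""
        self.value = self.tool(self.value, item)
        return self.value


extractor = FeatureExtractor(dimensions=("ip",), feature="max_value", tool=running_max)
for item in [3.0, 7.5, 2.0]:
    extractor.update(item)
print(extractor.value)
```

The tool only ever sees the previous result and the new item, so no history of the stream needs to be stored to keep the maximum up to date.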
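Step 204: use the feature data extractor to extract, from the streaming data, the network security feature data corresponding to the target dimension and the target feature; in the embodiments of the application, optionally, step 204 specifically includes:
inputting the streaming data into the feature data extractor, so that the feature data extractor groups the streaming data by the target dimension and extracts, from the grouped streaming data, the network security feature data matching the target feature.
In this embodiment, after the feature data extractor is generated, streaming data is input into it. On receiving streaming data, the extractor uses the target dimension to find the group in the extractor to which the current data belongs, creating the group if it does not exist; the extraction tool then extracts, from the grouped streaming data, the network security feature data matching the target feature and updates it into the group of the current streaming data. For example, when the target dimension is the host IP address and streaming data containing IP=192.168.10.10 enters the extractor at some moment, the extractor uses IP=192.168.10.10 to find the group whose target dimension is 192.168.10.10, extracts the network security feature data, and updates the network security feature data of the 192.168.10.10 group. By grouping the current streaming data by the target dimension and then extracting and updating the feature data of the grouped data, each received item only requires an update to the value of its own group while the other groups stay unchanged, which improves the efficiency of extracting network security feature data from streaming data.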
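The find-or-create-group behaviour can be sketched with a per-IP request counter. This is an illustrative assumption rather than the patent's code: `GroupedCountExtractor` is a hypothetical name, records are modelled as dictionaries, and a simple count stands in for whatever extraction tool the target feature calls for.

```python
from typing import Dict


class GroupedCountExtractor:
    """Counts records per value of one target dimension, one record at a time."""

    def __init__(self, dimension: str) -> None:
        self.dimension = dimension
        self.groups: Dict[str, int] = {}

    def update(self, record: Dict[str, str]) -> None:
        key = record[self.dimension]
        if key not in self.groups:
            # The group for this dimension value does not exist yet: create it.
            self.groups[key] = 0
        # Only the group of the current record is touched; others stay as they are.
        self.groups[key] += 1


extractor = GroupedCountExtractor("ip")
for record in [
    {"ip": "192.168.10.10", "domain": "example.com"},
    {"ip": "192.168.10.11", "domain": "example.org"},
    {"ip": "192.168.10.10", "domain": "example.net"},
]:
    extractor.update(record)
print(extractor.groups)
```

Each update is a single dictionary lookup and increment, which is what lets the extractor keep pace with the stream instead of waiting for a batch window.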
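Step 205: generate a feature data set list based on the extractor head of the feature data extractor and the network security feature data, wherein the list includes a first header, a second header, and a list result corresponding to each first header and each second header; the first header includes the target dimension, the second header includes the target feature, and the list result includes the network security feature data corresponding to each target dimension and each target feature;
In this embodiment, the filtered streaming data is received and the corresponding network security feature extraction requirements are obtained; the requirements may include at least one target dimension and at least one target feature to be extracted. A corresponding feature data extractor is then generated from the target dimension and target feature, and the corresponding network security feature data is extracted through it.
Further, a feature data set list is generated from the extractor head of the feature data extractor and the network security feature data extracted by the extractor. The list may include three components: a first header part, a second header part, and a list result part, where each list result corresponds to one first header and one second header. The first header may be the target dimension, the second header may be the target feature, and the list result may be the network security feature data corresponding to each target dimension and each target feature. Generating the feature data set list makes it easier for staff to view and use the network security feature data they want.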
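One way to picture the feature data set list is as a two-way table. The sketch below assumes, for illustration only, that the first headers label rows (dimension values) and the second headers label columns (feature names); FIG. 4 is not reproduced here, so the actual layout may differ. The function name and the sample values are made up.

```python
from typing import Dict, List, Optional, Tuple

# (dimension value, feature name) -> extracted value; sample numbers are made up.
results: Dict[Tuple[str, str], int] = {
    ("192.168.10.10", "dns_request_count"): 12,
    ("192.168.10.10", "http_request_count"): 30,
    ("192.168.10.11", "dns_request_count"): 4,
    ("192.168.10.11", "http_request_count"): 9,
}


def build_feature_table(
    data: Dict[Tuple[str, str], int],
) -> Tuple[List[str], List[str], List[List[Optional[int]]]]:
    """Return (first_headers, second_headers, rows) for the feature data set list."""
    first_headers = sorted({dimension for dimension, _ in data})
    second_headers = sorted({feature for _, feature in data})
    # One row per first header, one cell per second header; None marks a gap.
    rows = [
        [data.get((dimension, feature)) for feature in second_headers]
        for dimension in first_headers
    ]
    return first_headers, second_headers, rows


first_headers, second_headers, rows = build_feature_table(results)
print(["dimension"] + second_headers)
for dimension, row in zip(first_headers, rows):
    print([dimension] + row)
```

With this shape, a lookup by any first header returns a whole row and a lookup by any second header returns a whole column, which matches the "any first header and/or any second header" retrieval described later in the text.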
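Step 206: in response to a sample data acquisition instruction, look up in the feature data set list the target feature data corresponding to the instruction, wherein the sample data acquisition instruction includes any first header and/or any second header and is used to acquire model training samples;
In this embodiment, after the feature data set list is generated, when staff want to use the network security feature data in it, the target feature data corresponding to a sample data acquisition instruction is looked up in the list in response to that instruction. Specifically, any first header and/or any second header can be parsed from the instruction, and the corresponding target feature data is looked up according to the first header and second header. The sample data acquisition instruction is used to obtain target feature data from the list so that the data can serve as samples for model training.
Step 207: judge whether the target feature data is numerical data and, when it is non-numerical, call the corresponding data processing model based on the data type of the target feature data;
In this embodiment, after the target feature data is found, it is checked for being numerical. Because the target feature data is used to train a model, as input to a machine learning model it must be numerical before being input, whereas network security feature data extracted by the feature data extractor may be of several types, such as sets, dictionaries, and tuples. When the target feature data is non-numerical, the data processing model corresponding to its data type is called. For example, when the target feature data is the deduplicated count of DNS domain names requested by a host, the extractor returns a data set containing all the DNS domain names; the corresponding data processing model is called to count the length of this set, and the resulting target feature data is the deduplicated DNS domain name count.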
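The type check and conversion can be sketched as a small dispatch on the Python type of the feature value. The set-to-length rule follows the DNS domain name example in the text; treating dictionaries, tuples, and lists the same way is an assumption made here for illustration, and `to_numeric` is a hypothetical name.

```python
from typing import Union

Numeric = Union[int, float]


def to_numeric(value: object) -> Numeric:
    """Convert one feature value to a number, choosing the rule by data type."""
    if isinstance(value, bool):
        # bool is a subclass of int; make the conversion explicit.
        return int(value)
    if isinstance(value, (int, float)):
        # Already numerical: pass through unchanged.
        return value
    if isinstance(value, (set, frozenset, dict, tuple, list)):
        # Collection types: use the number of elements (assumed rule for
        # everything except sets, which the text covers explicitly).
        return len(value)
    raise TypeError(f"no conversion rule for type {type(value).__name__}")


# A set of requested DNS domain names deduplicates repeats on its own.
requested_domains = set()
for domain in ["example.com", "example.org", "example.com"]:
    requested_domains.add(domain)

print(to_numeric(requested_domains))
```

Raising on unknown types, rather than guessing, keeps non-numerical values from slipping silently into a training set.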
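Step 208: convert the target feature data into numerical data according to the data processing model, so that the converted target feature data is used for model training.
In this embodiment, non-numerical target feature data is converted into numerical data according to the data processing model, and the converted target feature data can then be used for model training.
In the embodiments of the application, optionally, the network security feature extraction requirements include a deduplicated count of requested DNS domain names, which comprises the IP dimension and the DNS domain name set feature.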
进一步的,本申请实施例提供了另一种流式数据的特征提取方法,该方法包括:
步骤301,接收模型训练任务;分析执行所述模型训练任务所需的训练样本特征,并依据所述训练样本特征,确定所述网络安全特征提取需求;
在该实施例中,工作人员想要训练模型时,接收模型训练任务,并分析想要执行该模型训练任务都需要哪些训练样本特征,进一步根据训练样本特征,确定对应的网络安全特征提取需求。训练样本特征可以包括输入到模型中的训练样本对应的训练样本特征和从模型输出的训练样本对应的训练样本特征,进而分别确定模型输入训练样本的网络安全特征提取需求,以及模型输出训练样本的网络安全特征提取需求。例如,当工作人员想要训练网速识别模型时,接收网速识别模型的训练任务,并根据工作人员的输入确定该训练任务分析需要的训练样本特征,进而确定网络安全特征提取需求。本申请实施例通过接收模型训练任务,确定需要的训练样本特征,进而确定对应的网络安全特征提取需求,以便于后续自动根据网络安全特征提取需求提取用于模型训练的网络安全特征数据,提升用于训练模型的网络安全特征数据的获取效率。
步骤302,接收流式数据,并获取所述流式数据对应的网络安全特征提取需求,其中,所述网络安全特征提取需求包括至少一个待提取的目标维度以及至少一个待提取的目标特征;
步骤303,依据所述目标维度以及所述目标特征,生成特征数据提取器;
步骤304,利用所述特征数据提取器,提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据;
在该实施例中,接收流式数据,获取对应的网络安全特征提取需求,网络安全特征提取需求中可以包括至少一个待提取的目标维度以及至少一个待提取的目标特征,之后根据目标维度和目标特征,生成对应的特征数据提取器,并通过特征数据提取器提取对应的网络安全特征数据。
步骤305,判断所述网络安全特征数据是否为数值型数据,并当所述网络安全特征数据为非数值型数据时,基于所述网络安全特征数据的数据类型,调用对应的数据处理模型;
步骤306,依据所述数据处理模型,将所述网络安全特征数据转化为数值型数据,以利用转化后的网络安全特征数据生成特征数据集合列表;
在该实施例中,根据特征数据提取器提取出对应的网络安全特征数据后,还可以对网络安全特征数据逐个进行判断,当发现有网络安全特征数据为非数值型数据时,以网络安全特征数据的数据类型为基础,调用对应的数据处理模型,进一步根据数据处理模型,将非数值型特征数据转化为对应的数值型特征数据,进而可以利用转化之后的网络安全特征数据生成特征数据集合列表。
步骤307,基于所述特征数据提取器的提取器头部以及所述网络安全特征数据,生成特征数据集合列表,其中,所述特征数据集合列表包括第一表头、第二表头以及与每个第一表头和每个第二表头对应的列表结果,所述第一表头包括所述目标维度,所述第二表头包括所述目标特征,所述列表结果包括与每个目标维度和每个目标特征对应的所述网络安全特征数据;
在该实施例中,将网络安全特征数据全部转化为数值型特征数据后,以特征数据提取器的提取器头部和经过特征数据提取器提取的网络安全特征数据为基础,生成特征数据集合列表。特征数据集合列表可以包括三个组成部分,分别是第一表头部分、第二表头部分以及列表结果部分,其中每一个列表结果分别对应一个第一表头和一个第二表头。第一表头可以是目标维度,第二表头可以是目标特征,列表结果可以是与每个目标维度和每个目标特征对应的数值型特征数据,如图4所示。本申请实施例生成的特征数据集合列表中的网络安全特征数据均为数值型特征数据,在工作人员取用时直接取用数值型特征数据,可以直接输入到机器学习模型中,有利于提升机器模型训练的效率。
步骤308,读取所述特征数据集合列表中各表头对应的网络安全特征数据,并依据读取出的网络安全特征数据建立所述模型训练任务对应的训练样本集;利用所述训练样本集,训练所述模型。
在该实施例中,对特征数据集合列表中各表头对应的网络安全特征数据进行读取后,根据读取的网络安全特征数据,建立与模型训练任务相对应的训练样本集,训练样本集中的网络安全特征数据均为数值型特征数据,进一步利用训练样本集,对模型进行训练。
进一步的,作为图1方法的具体实现,本申请实施例提供了一种流式数据的特征提取装置,如图5所示,该装置包括:
需求获取模块,接收流式数据,并获取所述流式数据对应的网络安全特征提取需求,其中,所述网络安全特征提取需求包括至少一个待提取的目标维度以及至少一个待提取的目标特征,所述目标维度包括IP维度、时间维度以及mac局域网地址维度中的至少一种,所述目标特征包括DNS请求数量特征、ICMP请求数量特征、HTTP请求数量特征、DNS域名集合特征以及页面访问次数特征中的至少一种;
提取器生成模块,用于依据所述目标维度以及所述目标特征,生成特征数据提取器;
特征数据提取模块,利用所述特征数据提取器,提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据。
可选地,所述网络安全特征提取需求包括请求DNS域名去重数量,所述请求DNS域名去重数量包括所述IP维度以及所述DNS域名集合特征。
在本申请实施例中,可选地,所述装置还包括:
流式数据筛选模块,用于所述获取所述流式数据对应的网络安全特征提取需求之前,依据预设数据筛选条件,对所述流式数据进行筛选,其中,所述预设数据筛选条件包括预设数据协议。
在本申请实施例中,可选地,所述提取器生成模块,具体包括:
提取工具建立单元,用于分别依据每个目标特征,建立与所述每个目标特征各自匹配的特征数据提取工具;
提取器生成单元,用于依据所述目标维度、所述目标特征以及所述特征数据提取工具,生成所述特征数据提取器,其中,所述特征数据提取器包括提取器头部以及提取器主体,所述提取器头部用于指示所述目标维度以及所述目标特征,所述提取器主体包括所述特征数据提取工具。
在本申请实施例中,可选地,所述特征数据提取模块,具体用于:
将所述流式数据输入到所述特征数据提取器中,以使所述特征数据提取器按所述目标维度对所述流式数据进行分组,并从分组后的流式数据中提取出与所述目标特征匹配的网络安全特征数据。
在本申请实施例中,可选地,所述装置还包括:
列表生成模块,用于所述提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据之后,基于所述特征数据提取器的提取器头部以及所述网络安全特征数据,生成特征数据集合列表,其中,所述特征数据集合列表包括第一表头、第二表头以及与每个第一表头和每个第二表头对应的列表结果,所述第一表头包括所述目标维度,所述第二表头包括所述目标特征,所述列表结果包括与每个目标维度和每个目标特征对应的所述网络安全特征数据。
在本申请实施例中,可选地,所述装置还包括:
查找模块,用于根据样本数据获取指令,从所述特征数据集合列表中查找与所述样本数据获取指令相对应的目标特征数据,其中,所述样本数据获取指令包括任意第一表头和/或任意第二表头,所述样本数据获取指令用于获取模型训练样本;
判断模块,用于判断所述目标特征数据是否为数值型数据,并当所述目标特征数据为非数值型数据时,基于所述目标特征数据的数据类型,调用对应的数据处理模型;
转化模块,用于依据所述数据处理模型,将所述目标特征数据转化为数值型数据,以利用转化后的目标特征数据进行模型训练。
在本申请实施例中,可选地,所述装置还包括:
判断模块,用于所述基于所述特征数据提取器的提取器头部以及所述网络安全特征数据,生成特征数据集合列表之前,判断所述网络安全特征数据是否为数值型数据,并当所述网络安全特征数据为非数值型数据时,基于所述网络安全特征数据的数据类型,调用对应的数据处理模型;
转化模块,用于依据所述数据处理模型,将所述网络安全特征数据转化为数值型数据,以利用转化后的网络安全特征数据生成特征数据集合列表。
在本申请实施例中,可选地,所述装置还包括:
训练任务接收模块,用于所述接收流式数据之前,接收模型训练任务;
需求确定模块,用于分析执行所述模型训练任务所需的训练样本特征,并依据所述训练样本特征,确定所述网络安全特征提取需求;
相应地,所述装置还包括:
训练样本集确定模块,用于在生成所述特征数据集合列表之后,读取所述特征数据集合列表中各表头对应的网络安全特征数据,并依据读取出的网络安全特征数据建立所述模型训练任务对应的训练样本集;
模型训练模块,用于利用所述训练样本集,训练所述模型。
依据本申请又一个方面,提供了一种存储介质,其上存储有计算机程序,所述程序被处理器执行时实现上述流式数据的特征提取方法。
需要说明的是,本申请实施例提供的一种流式数据的特征提取装置所涉及各功能单元的其他相应描述,可以参考图1至图2方法中的对应描述,在此不再赘述。
基于上述如图1至图2所示方法,相应的,本申请实施例还提供了一种存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述如图1至图2所示的流式数据的特征提取方法。
基于这样的理解,本申请的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施场景所述的方法。
基于上述如图1至图2所示的方法,以及图5所示的虚拟装置实施例,为了实现上述目的,本申请实施例还提供了一种计算机设备,具体可以为个人计算机、服务器、网络设备等,该计算机设备包括存储介质和处理器;存储介质,用于存储计算机程序;处理器,用于执行计算机程序以实现上述如图1至图2所示的流式数据的特征提取方法。
可选地,该计算机设备还可以包括用户接口、网络接口、摄像头、射频(Radio Frequency,RF)电路,传感器、音频电路、WI-FI模块等等。用户接口可以包括显示屏(Display)、输入单元比如键盘(Keyboard)等,可选用户接口还可以包括USB接口、读卡器接口等。网络接口可选的可以包括标准的有线接口、无线接口(如蓝牙接口、WI-FI接口)等。
本领域技术人员可以理解,本实施例提供的一种计算机设备结构并不构成对该计算机设备的限定,可以包括更多或更少的部件,或者组合某些部件,或者不同的部件布置。
存储介质中还可以包括操作系统、网络通信模块。操作系统是管理和保存计算机设备硬件和软件资源的程序,支持信息处理程序以及其它软件和/或程序的运行。网络通信模块 用于实现存储介质内部各组件之间的通信,以及与该实体设备中其它硬件和软件之间通信。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到本申请可以借助软件加必要的通用硬件平台的方式来实现,也可以通过硬件实现。接收流式数据,并获取需要对流式数据进行网络安全特征数据提取的网络安全特征提取需求,对网络安全特征提取需求进行解析,获取其中待提取的至少一个目标维度以及待提取的至少一个目标特征,根据目标维度和目标特征,生成特征数据提取器,进而根据生成的特征数据提取器,提取接收的流式数据的网络安全特征数据。本申请通过构建特征数据提取器,并通过特征数据提取器提取流式数据的网络安全特征数据,能够对流式数据进行即时性特征提取,在充分发挥流式数据的低延迟性特点的同时,减少资源的占用量。
本领域技术人员可以理解附图只是一个优选实施场景的示意图,附图中的模块或流程并不一定是实施本申请所必须的。本领域技术人员可以理解实施场景中的装置中的模块可以按照实施场景描述进行分布于实施场景的装置中,也可以进行相应变化位于不同于本实施场景的一个或多个装置中。上述实施场景的模块可以合并为一个模块,也可以进一步拆分成多个子模块。
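The serial numbers used in this application are for description only and do not represent the merits of the implementation scenarios. The above discloses only several specific implementation scenarios of the application; the application is not limited to them, and any variation that a person skilled in the art can think of shall fall within the scope of protection of the application.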

Claims (20)

  1. 一种流式数据的特征提取方法,其特征在于,包括:
    接收流式数据,并获取所述流式数据对应的网络安全特征提取需求,其中,所述网络安全特征提取需求包括至少一个待提取的目标维度以及至少一个待提取的目标特征,所述目标维度包括IP维度、时间维度以及mac局域网地址维度中的至少一种,所述目标特征包括DNS请求数量特征、ICMP请求数量特征、HTTP请求数量特征、DNS域名集合特征以及页面访问次数特征中的至少一种;
    依据所述目标维度以及所述目标特征,生成特征数据提取器;
    利用所述特征数据提取器,提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据。
  2. 根据权利要求1所述的方法,其特征在于,所述网络安全特征提取需求包括请求DNS域名去重数量,所述请求DNS域名去重数量包括所述IP维度以及所述DNS域名集合特征。
  3. 根据权利要求1所述的方法,其特征在于,所述获取所述流式数据对应的网络安全特征提取需求之前,所述方法还包括:
    依据预设数据筛选条件,对所述流式数据进行筛选,其中,所述预设数据筛选条件包括预设数据协议。
  4. 根据权利要求1所述的方法,其特征在于,所述依据所述目标维度以及所述目标特征,生成特征数据提取器,具体包括:
    分别依据每个目标特征,建立与所述每个目标特征各自匹配的特征数据提取工具;
    依据所述目标维度、所述目标特征以及所述特征数据提取工具,生成所述特征数据提取器,其中,所述特征数据提取器包括提取器头部以及提取器主体,所述提取器头部用于指示所述目标维度以及所述目标特征,所述提取器主体包括所述特征数据提取工具。
  5. 根据权利要求1所述的方法,其特征在于,所述利用所述特征数据提取器,提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据,具体包括:
    将所述流式数据输入到所述特征数据提取器中,以使所述特征数据提取器按所述目标维度对所述流式数据进行分组,并从分组后的流式数据中提取出与所述目标特征匹配的网络安全特征数据。
  6. 根据权利要求4所述的方法,其特征在于,所述提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据之后,所述方法还包括:
    基于所述特征数据提取器的提取器头部以及所述网络安全特征数据,生成特征数据集合列表,其中,所述特征数据集合列表包括第一表头、第二表头以及与每个第一表头和每个第二表头对应的列表结果,所述第一表头包括所述目标维度,所述第二表头包括所述目 标特征,所述列表结果包括与每个目标维度和每个目标特征对应的所述网络安全特征数据。
  7. 根据权利要求6所述的方法,其特征在于,所述生成特征数据集合列表之后,所述方法还包括:
    响应于样本数据获取指令,从所述特征数据集合列表中查找与所述样本数据获取指令相对应的目标特征数据,其中,所述样本数据获取指令包括任意第一表头和/或任意第二表头,所述样本数据获取指令用于获取模型训练样本;
    判断所述目标特征数据是否为数值型数据,并当所述目标特征数据为非数值型数据时,基于所述目标特征数据的数据类型,调用对应的数据处理模型;
    依据所述数据处理模型,将所述目标特征数据转化为数值型数据,以利用转化后的目标特征数据进行模型训练。
  8. 根据权利要求6所述的方法,其特征在于,所述基于所述特征数据提取器的提取器头部以及所述网络安全特征数据,生成特征数据集合列表之前,所述方法还包括:
    判断所述网络安全特征数据是否为数值型数据,并当所述网络安全特征数据为非数值型数据时,基于所述网络安全特征数据的数据类型,调用对应的数据处理模型;
    依据所述数据处理模型,将所述网络安全特征数据转化为数值型数据,以利用转化后的网络安全特征数据生成特征数据集合列表。
  9. 根据权利要求6所述的方法,其特征在于,所述接收流式数据之前,所述方法还包括:
    接收模型训练任务;
    分析执行所述模型训练任务所需的训练样本特征,并依据所述训练样本特征,确定所述网络安全特征提取需求;
    相应地,所述生成特征数据集合列表之后,所述方法还包括:
    读取所述特征数据集合列表中各表头对应的网络安全特征数据,并依据读取出的网络安全特征数据建立所述模型训练任务对应的训练样本集;
    利用所述训练样本集,训练所述模型。
  10. 一种流式数据的特征提取装置,其特征在于,包括:
    需求获取模块,接收流式数据,并获取所述流式数据对应的网络安全特征提取需求,其中,所述网络安全特征提取需求包括至少一个待提取的目标维度以及至少一个待提取的目标特征,所述目标维度包括IP维度、时间维度以及mac局域网地址维度中的至少一种,所述目标特征包括DNS请求数量特征、ICMP请求数量特征、HTTP请求数量特征、DNS域名集合特征以及页面访问次数特征中的至少一种;
    提取器生成模块,用于依据所述目标维度以及所述目标特征,生成特征数据提取器;
    特征数据提取模块,利用所述特征数据提取器,提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据。
  11. 根据权利要求10所述的装置,其特征在于,所述网络安全特征提取需求包括请求DNS域名去重数量,所述请求DNS域名去重数量包括所述IP维度以及所述DNS域名集合特征。
  12. 根据权利要求10所述的装置,其特征在于,所述装置还包括:
    流式数据筛选模块,用于所述获取所述流式数据对应的网络安全特征提取需求之前,依据预设数据筛选条件,对所述流式数据进行筛选,其中,所述预设数据筛选条件包括预设数据协议。
  13. 根据权利要求10所述的装置,其特征在于,所述提取器生成模块,具体包括:
    提取工具建立单元,用于分别依据每个目标特征,建立与所述每个目标特征各自匹配的特征数据提取工具;
    提取器生成单元,用于依据所述目标维度、所述目标特征以及所述特征数据提取工具,生成所述特征数据提取器,其中,所述特征数据提取器包括提取器头部以及提取器主体,所述提取器头部用于指示所述目标维度以及所述目标特征,所述提取器主体包括所述特征数据提取工具。
  14. 根据权利要求10所述的装置,其特征在于,所述特征数据提取模块,具体用于:
    将所述流式数据输入到所述特征数据提取器中,以使所述特征数据提取器按所述目标维度对所述流式数据进行分组,并从分组后的流式数据中提取出与所述目标特征匹配的网络安全特征数据。
  15. 根据权利要求13所述的装置,其特征在于,所述装置还包括:
    列表生成模块,用于所述提取所述流式数据中与所述目标维度以及所述目标特征对应的网络安全特征数据之后,基于所述特征数据提取器的提取器头部以及所述网络安全特征数据,生成特征数据集合列表,其中,所述特征数据集合列表包括第一表头、第二表头以及与每个第一表头和每个第二表头对应的列表结果,所述第一表头包括所述目标维度,所述第二表头包括所述目标特征,所述列表结果包括与每个目标维度和每个目标特征对应的所述网络安全特征数据。
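  16. The apparatus according to claim 15, wherein the apparatus further comprises:
    a search module, configured to search, according to a sample data acquisition instruction, the feature data set list for target feature data corresponding to the sample data acquisition instruction, wherein the sample data acquisition instruction comprises any first header and/or any second header, and the sample data acquisition instruction is used to acquire model training samples;
    a determination module, configured to determine whether the target feature data is numerical data, and when the target feature data is non-numerical data, invoke a corresponding data processing model based on the data type of the target feature data;
    a conversion module, configured to convert the target feature data into numerical data according to the data processing model, so that the converted target feature data is used for model training.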
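  17. The apparatus according to claim 15, wherein the apparatus further comprises:
    a determination module, configured to determine, before the feature data set list is generated based on the extractor head of the feature data extractor and the network security feature data, whether the network security feature data is numerical data, and when the network security feature data is non-numerical data, invoke a corresponding data processing model based on the data type of the network security feature data;
    a conversion module, configured to convert the network security feature data into numerical data according to the data processing model, so that the converted network security feature data is used to generate the feature data set list.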
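  18. The apparatus according to claim 15, wherein the apparatus further comprises:
    a training task receiving module, configured to receive a model training task before the streaming data is received;
    a requirement determination module, configured to analyze the training sample features required to execute the model training task, and determine the network security feature extraction requirement according to the training sample features;
    correspondingly, the apparatus further comprises:
    a training sample set determination module, configured to read, after the feature data set list is generated, the network security feature data corresponding to each header in the feature data set list, and establish a training sample set corresponding to the model training task according to the network security feature data that is read;
    a model training module, configured to train the model using the training sample set.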
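  19. A storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 9.
  20. A computer device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 9.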
PCT/CN2021/117111 2021-08-30 2021-09-08 Feature extraction method and apparatus for streaming data, storage medium, and computer device WO2023029066A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110999767.XA CN113452581B (zh) 2021-08-30 2021-08-30 Feature extraction method and apparatus for streaming data, storage medium, and computer device
CN202110999767.X 2021-08-30

Publications (1)

Publication Number Publication Date
WO2023029066A1 true WO2023029066A1 (zh) 2023-03-09

Family

ID=77818808

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/117111 WO2023029066A1 (zh) 2021-08-30 2021-09-08 Feature extraction method and apparatus for streaming data, storage medium, and computer device

Country Status (2)

Country Link
CN (1) CN113452581B (zh)
WO (1) WO2023029066A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116668380A (zh) * 2023-07-28 2023-08-29 北京中科网芯科技有限公司 Packet processing method and apparatus for an aggregation splitter device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170134404A1 (en) * 2015-11-06 2017-05-11 Cisco Technology, Inc. Hierarchical feature extraction for malware classification in network traffic
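CN111181986A (zh) * 2019-12-31 2020-05-19 奇安信科技集团股份有限公司 Data security detection method, model training method, apparatus, and computer device
CN111224946A (zh) * 2019-11-26 2020-06-02 杭州安恒信息技术股份有限公司 Supervised-learning-based method and apparatus for detecting TLS-encrypted malicious traffic
CN111478921A (zh) * 2020-04-27 2020-07-31 深信服科技股份有限公司 Covert channel communication detection method, apparatus, and device
CN112019449A (zh) * 2020-08-14 2020-12-01 四川电科网安科技有限公司 Traffic identification and packet capture method and apparatus
CN112398779A (zh) * 2019-08-12 2021-02-23 中国科学院国家空间科学中心 Network traffic data analysis method and system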

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
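CN102202064B (zh) * 2011-06-13 2013-09-25 刘胜利 Method for extracting Trojan communication behavior features based on network data flow analysis
CN105022960B (zh) * 2015-08-10 2017-11-21 济南大学 Multi-feature mobile terminal malware detection method and system based on network traffic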

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
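CN116668380A (zh) * 2023-07-28 2023-08-29 北京中科网芯科技有限公司 Packet processing method and apparatus for an aggregation splitter device
CN116668380B (zh) * 2023-07-28 2023-10-03 北京中科网芯科技有限公司 Packet processing method and apparatus for an aggregation splitter device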

Also Published As

Publication number Publication date
CN113452581A (zh) 2021-09-28
CN113452581B (zh) 2021-12-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21955597; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21955597; Country of ref document: EP; Kind code of ref document: A1)