CN114422271A - Data processing method, device, equipment and readable storage medium - Google Patents

Info

Publication number
CN114422271A
Authority
CN
China
Prior art keywords
vulnerability
sample
target
data
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210310556.5A
Other languages
Chinese (zh)
Other versions
CN114422271B (en)
Inventor
颜波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210310556.5A priority Critical patent/CN114422271B/en
Publication of CN114422271A publication Critical patent/CN114422271A/en
Application granted granted Critical
Publication of CN114422271B publication Critical patent/CN114422271B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, apparatus, device and readable storage medium, relating to deep learning technology in the field of artificial intelligence. The method comprises: acquiring a prediction association degree between each of k feature dimensions and the vulnerability attribute categories, and selecting N target feature dimensions from the k feature dimensions based on the prediction association degrees; acquiring the target sample vulnerability features of target sample vulnerability data under the N target feature dimensions, and inputting the N target sample vulnerability features into an initial vulnerability detection model for prediction to obtain a target sample prediction result for the vulnerability attribute categories; and adjusting the parameters of the initial vulnerability detection model based on the target sample prediction result and an acquired target sample label of the target sample vulnerability data to obtain a vulnerability detection model. The method allows the association between vulnerability data and vulnerability attribute categories to be determined more accurately and improves the efficiency of vulnerability detection on the vulnerability data.

Description

Data processing method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data processing method, apparatus, device, and readable storage medium.
Background
With the advent of the digital age, information data transmission technology is applied in more and more scenarios, and the requirements on the security of data transmission keep rising. Data security detection generally works by building, accumulating and matching detection rules derived from known traffic attacks. A common example is firewall technology, which relies on stacking rules to form a rule database.
Security detection technologies that rely on rule accumulation have a low learning cost. However, as rules accumulate their number keeps growing, similar rules cannot be identified, and redundancy arises among them. Because such technologies usually carry a large number of rules that influence one another, the manual maintenance cost of adding and correcting rules increases and detection efficiency decreases.
Disclosure of Invention
The embodiments of the present application provide a data processing method, apparatus, device and readable storage medium, which can determine the association between vulnerability data and vulnerability attribute categories more accurately and improve the efficiency of vulnerability detection on the vulnerability data.
An embodiment of the present application provides a data processing method, including:
acquiring a prediction association degree between each of k feature dimensions and the vulnerability attribute categories, and acquiring N target feature dimensions from the k feature dimensions based on the prediction association degrees; the prediction association degree represents the degree to which a change in the corresponding feature dimension influences the prediction result for the vulnerability attribute category;
acquiring target sample vulnerability characteristics corresponding to target sample vulnerability data under N target characteristic dimensions respectively, inputting the N target sample vulnerability characteristics into an initial vulnerability detection model for prediction, and obtaining a target sample prediction result aiming at vulnerability attribute categories;
and acquiring a target sample label of target sample vulnerability data, and performing parameter adjustment on the initial vulnerability detection model based on a target sample prediction result and the target sample label to obtain a vulnerability detection model for vulnerability attribute class detection.
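The training step described above can be sketched as follows. This is a minimal illustration only: it assumes a softmax classifier over the N selected feature dimensions trained with cross-entropy loss, which this passage does not specify. The names `softmax`, `predict`, `train_step`, the learning rate, and the toy data are all hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of category scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, features):
    # weights: m rows (one per vulnerability attribute category), each of length N.
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)

def train_step(weights, features, label, lr=0.1):
    # One parameter adjustment from a (prediction, label) pair:
    # gradient descent on the cross-entropy loss (an assumed loss function).
    probs = predict(weights, features)
    for c, row in enumerate(weights):
        error = probs[c] - (1.0 if c == label else 0.0)
        for j in range(len(row)):
            row[j] -= lr * error * features[j]
    return -math.log(probs[label])  # loss measured before the update

# Toy target sample features (N = 2) with labels for m = 2 categories.
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.2], 0), ([0.1, 1.0], 1)]
weights = [[0.0, 0.0], [0.0, 0.0]]  # the "initial" model
losses = []
for epoch in range(50):
    losses.append(sum(train_step(weights, x, y) for x, y in samples))
```

After the loop, `weights` plays the role of the adjusted model, and `predict(weights, features)` returns one probability per category.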
Further, obtaining the prediction association degrees of the k feature dimensions and the vulnerability attribute categories respectively comprises:
obtaining d sample vulnerability data and a sample label corresponding to each sample vulnerability data; the d sample vulnerability data comprise target sample vulnerability data; d is a positive integer;
acquiring the category proportion of vulnerability attribute categories based on the sample labels respectively corresponding to the d sample vulnerability data, and determining the category information quantity of the vulnerability attribute categories based on the category proportion of the vulnerability attribute categories;
determining dimension information quantity of vulnerability attribute categories under the ith characteristic dimension according to the characteristic state under the ith characteristic dimension and the sample labels respectively corresponding to the d sample vulnerability data; i is a positive integer less than or equal to k; the characteristic state under the ith characteristic dimension is used for representing the distribution condition of the characteristics of the d sample vulnerability data under the ith characteristic dimension;
and determining the prediction association degree of the ith characteristic dimension and the vulnerability attribute category based on the category information quantity of the vulnerability attribute category and the dimension information quantity of the vulnerability attribute category under the ith characteristic dimension.
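One common way to read the steps above is as an information gain computation, with the category information quantity taken as the entropy of the label distribution and the dimension information quantity taken as the conditional entropy given the feature state. This passage does not give formulas, so the following is an assumed formalization, not the stated definition. The symbols $p_c$, $d_c$, $d_s$, $d_{c,s}$, $H$ and $G_i$ are introduced here only for illustration.

```latex
\begin{align}
H(C) &= -\sum_{c=1}^{m} p_c \log_2 p_c, & p_c &= \frac{d_c}{d} \\
H(C \mid X_i) &= \sum_{s} \frac{d_s}{d} \left( -\sum_{c=1}^{m} p_{c \mid s} \log_2 p_{c \mid s} \right), & p_{c \mid s} &= \frac{d_{c,s}}{d_s} \\
G_i &= H(C) - H(C \mid X_i)
\end{align}
```

Here $d_c$ is the number of the d samples labeled with category $c$, $d_s$ is the number of samples whose feature under dimension $i$ is in state $s$, and $d_{c,s}$ is the number of samples in state $s$ labeled with category $c$. Under this reading, a larger $G_i$ means that dimension $i$ carries more information about the vulnerability attribute category.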
Further, the characteristic state comprises a first characteristic state and a second characteristic state; the number of vulnerability attribute categories is m; m is a positive integer; determining the dimension information quantity of the vulnerability attribute category under the ith characteristic dimension according to the characteristic state under the ith characteristic dimension and the sample labels respectively corresponding to the d sample vulnerability data, wherein the dimension information quantity comprises the following steps:
in the first feature state of the ith feature dimension, determining, based on the sample labels respectively corresponding to the d sample vulnerability data, the first sample label quantities respectively corresponding to the m vulnerability attribute categories, and determining, based on the m first sample label quantities, the first feature probabilities of the m vulnerability attribute categories in the first feature state of the ith feature dimension;
in the second feature state of the ith feature dimension, determining, based on the sample labels respectively corresponding to the d sample vulnerability data, the second sample label quantities respectively corresponding to the m vulnerability attribute categories, and determining, based on the m second sample label quantities, the second feature probabilities of the m vulnerability attribute categories in the second feature state of the ith feature dimension;
and performing probability integration processing on the m first characteristic probabilities and the m second characteristic probabilities to obtain dimension information quantity of the vulnerability attribute category under the ith characteristic dimension.
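The two-state computation above can be illustrated with a short sketch. It assumes that the feature states are binary (0 for the first state, 1 for the second) and that "probability integration" means a state-weighted sum of per-state entropies; both are interpretations rather than details given in this passage. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter

def entropy(counts):
    # Shannon entropy (base 2) of a distribution given as raw label counts.
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)

def association_degree(states, labels):
    # states[j] is 0 (first feature state) or 1 (second feature state)
    # for sample j under the feature dimension being scored.
    d = len(labels)
    class_info = entropy(list(Counter(labels).values()))
    dim_info = 0.0
    for state in (0, 1):
        in_state = [lab for s, lab in zip(states, labels) if s == state]
        if not in_state:
            continue
        # Per-category label counts within this state give the state's
        # feature probabilities; weight the state's entropy by its share.
        counts = list(Counter(in_state).values())
        dim_info += (len(in_state) / d) * entropy(counts)
    return class_info - dim_info

labels = ["normal", "normal", "attack", "attack"]
perfect = association_degree([0, 0, 1, 1], labels)  # state fully determines label
useless = association_degree([0, 1, 0, 1], labels)  # state says nothing about label
```

In this toy data, a dimension whose state separates the two labels scores 1 bit, while a dimension whose state is independent of the label scores 0.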
Further, obtaining the prediction association degrees of the k feature dimensions and the vulnerability attribute categories respectively comprises:
acquiring k dimension features to be detected, respectively corresponding to the sample vulnerability data under the k feature dimensions;
performing dimension conversion on the k dimension features to be detected to obtain k vulnerability set features respectively corresponding to the k feature dimensions;
respectively inputting the k vulnerability set characteristics into an initial vulnerability detection model for prediction to obtain dimension category probabilities of the k vulnerability set characteristics respectively aiming at vulnerability attribute categories;
and determining the prediction association degrees of the k characteristic dimensions and the vulnerability attribute categories respectively according to the dimension category probability of the k vulnerability set characteristics respectively aiming at the vulnerability attribute categories and the difference data between the sample labels corresponding to the sample vulnerability data.
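A rough sketch of this model-based alternative follows. It makes several assumptions that the passage does not confirm: "dimension conversion" is taken to mean keeping one dimension and blanking the others so the input keeps the model's expected length, and the "difference data" is taken to be the gap between the predicted probability of the labeled category and 1. The `model` interface and `toy_model` are hypothetical stand-ins for the initial vulnerability detection model.

```python
def single_dimension_input(features, i, fill=0.0):
    # Assumed "dimension conversion": keep dimension i, blank all others.
    return [x if j == i else fill for j, x in enumerate(features)]

def model_based_association(model, samples, labels, k):
    # model(features) -> list of category probabilities (hypothetical interface).
    degrees = []
    for i in range(k):
        gap = 0.0
        for features, label in zip(samples, labels):
            probs = model(single_dimension_input(features, i))
            gap += 1.0 - probs[label]  # distance from the one-hot sample label
        degrees.append(1.0 - gap / len(samples))
    return degrees

def toy_model(features):
    # Stand-in model in which only dimension 0 carries signal.
    p_attack = 0.9 if features[0] > 0.5 else 0.1
    return [1.0 - p_attack, p_attack]

samples = [[1.0, 0.3], [0.0, 0.9], [1.0, 0.8], [0.0, 0.1]]
labels = [1, 0, 1, 0]
degrees = model_based_association(toy_model, samples, labels, k=2)
```

With this toy model, dimension 0 alone reproduces the labels well and receives the higher degree, while dimension 1 alone does no better than a constant guess.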
Further, acquiring N target feature dimensions from the k feature dimensions based on the prediction association degrees includes:
obtaining a prediction association threshold;
among the k feature dimensions, clustering the feature dimensions whose prediction association degree is smaller than the prediction association threshold to obtain a first common dimension;
determining the N target feature dimensions from the first common dimension and the feature dimensions whose prediction association degree is greater than or equal to the prediction association threshold.
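The selection step above can be sketched as follows. The sketch assumes that "clustering" the low-association dimensions means merging all of them into a single common dimension, and that the merged dimension is represented by the mean of its members; neither detail is specified in this passage. The function names, scores and threshold are hypothetical.

```python
def select_target_dimensions(degrees, threshold):
    # Dimensions at or above the threshold are kept individually.
    strong = [i for i, g in enumerate(degrees) if g >= threshold]
    # Dimensions below the threshold are merged into one common dimension.
    weak = [i for i, g in enumerate(degrees) if g < threshold]
    groups = [[i] for i in strong]
    if weak:
        groups.append(weak)
    return groups

def project(features, groups):
    # Map a k-dimensional feature vector onto the N target dimensions.
    # A merged group is summarized by the mean of its members (assumption).
    return [sum(features[i] for i in g) / len(g) for g in groups]

degrees = [0.80, 0.05, 0.40, 0.02]  # one association degree per dimension
groups = select_target_dimensions(degrees, threshold=0.10)
reduced = project([1.0, 0.2, 0.5, 0.4], groups)
```

Here k = 4 dimensions are reduced to N = 3: dimensions 0 and 2 are kept as they are, and dimensions 1 and 3 share one common dimension.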
Further, obtaining target sample vulnerability characteristics of the target sample vulnerability data respectively corresponding to the N target characteristic dimensions includes:
acquiring a sample data type of target sample vulnerability data, and acquiring vulnerability information corresponding to the sample data type;
acquiring vulnerability keywords from target sample vulnerability data based on vulnerability information;
and under the N target feature dimensions, performing feature extraction processing on the vulnerability keywords to obtain target sample vulnerability features corresponding to the target sample vulnerability data under the N target feature dimensions respectively.
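As an illustration of the extraction steps above, the sketch below looks up keywords by sample data type and counts their occurrences. The data type name, the keyword list, and the idea that each target feature dimension corresponds to one keyword count are all hypothetical choices made for the example; the passage does not define them.

```python
# Hypothetical vulnerability information: sample data type -> keywords to look for.
VULNERABILITY_INFO = {
    "http_request": ["select", "union", "<script", "../"],
}

def extract_keywords(data_type, payload):
    # Count how often each keyword for this data type occurs in the payload.
    keywords = VULNERABILITY_INFO.get(data_type, [])
    lowered = payload.lower()
    return {kw: lowered.count(kw) for kw in keywords}

def to_features(keyword_counts, target_dimensions):
    # One numeric feature per target dimension (assumed: one keyword each).
    return [float(keyword_counts.get(dim, 0)) for dim in target_dimensions]

payload = "GET /item?id=1 UNION SELECT password FROM users"
counts = extract_keywords("http_request", payload)
features = to_features(counts, target_dimensions=["select", "union"])
```

The resulting `features` list has one entry per target feature dimension and can be passed to a model as its input vector.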
An embodiment of the present application provides, in one aspect, another data processing method, including:
acquiring target to-be-detected characteristics corresponding to the target to-be-detected data under N target characteristic dimensions respectively; the N target feature dimensions are determined from the k feature dimensions based on the prediction relevance of the k feature dimensions and the vulnerability attribute categories respectively;
and inputting the target to-be-detected features respectively corresponding to the N target feature dimensions into a vulnerability detection model for vulnerability detection to obtain target detection results corresponding to the target to-be-detected data.
Further, the number of vulnerability attribute categories is m, where m is a positive integer; inputting the target to-be-detected features respectively corresponding to the N target feature dimensions into the vulnerability detection model for vulnerability detection to obtain the target detection result corresponding to the target to-be-detected data comprises:
obtaining the m vulnerability attribute categories and the prediction probability corresponding to each vulnerability attribute category, and determining the target detection result from the m vulnerability attribute categories based on the prediction probabilities.
The data processing method further comprises the following steps:
if the target detection result is an abnormal detection category among the m vulnerability attribute categories, isolating the target to-be-detected data corresponding to the abnormal detection category, and sending a data anomaly message to the vulnerability management device.
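The detection and response flow above might look like the following sketch. It assumes the target detection result is the category with the highest prediction probability, which is one natural choice but is not stated in this passage. The category names, the `quarantine` list and the `outbox` list (standing in for the message channel to the vulnerability management device) are hypothetical.

```python
CATEGORIES = ["normal", "sql_injection", "xss"]  # hypothetical m = 3 categories
ABNORMAL = {"sql_injection", "xss"}              # hypothetical abnormal categories

def detect(probabilities):
    # Pick the category with the highest prediction probability (assumed rule).
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CATEGORIES[best]

def handle(data_id, probabilities, quarantine, outbox):
    result = detect(probabilities)
    if result in ABNORMAL:
        quarantine.append(data_id)  # isolate the data to be detected
        outbox.append({"data_id": data_id, "category": result})  # anomaly message
    return result

quarantine, outbox = [], []
first = handle("req-1", [0.1, 0.7, 0.2], quarantine, outbox)
second = handle("req-2", [0.8, 0.1, 0.1], quarantine, outbox)
```

Only the first item is isolated and reported; the second is classified as normal and passes through unchanged.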
An embodiment of the present application provides a data processing apparatus, including:
the relevancy obtaining module is used for obtaining the prediction relevancy of the k characteristic dimensions and the vulnerability attribute categories respectively;
the dimension obtaining module is used for obtaining N target feature dimensions from the k feature dimensions based on the prediction relevance; the prediction relevance is used for representing the change of the corresponding feature dimension and the influence degree on the prediction result of the vulnerability attribute category;
the characteristic acquisition module is used for acquiring target sample vulnerability characteristics corresponding to the target sample vulnerability data under N target characteristic dimensions respectively;
the characteristic input module is used for inputting the vulnerability characteristics of the N target samples into the initial vulnerability detection model for prediction to obtain target sample prediction results aiming at vulnerability attribute categories;
and the label obtaining module is used for obtaining a target sample label of the target sample vulnerability data, and carrying out parameter adjustment on the initial vulnerability detection model based on the target sample prediction result and the target sample label to obtain a vulnerability detection model for vulnerability attribute type detection.
Wherein, the relevancy obtaining module comprises:
the data acquisition unit is used for acquiring d sample vulnerability data and a sample label corresponding to each sample vulnerability data; the d sample vulnerability data comprise target sample vulnerability data; d is a positive integer;
the proportion obtaining unit is used for obtaining the category proportion of the vulnerability attribute categories based on the sample labels respectively corresponding to the d sample vulnerability data, and determining the category information quantity of the vulnerability attribute categories based on the category proportion of the vulnerability attribute categories;
the information quantity determining unit is used for determining the dimension information quantity of the vulnerability attribute category under the ith characteristic dimension according to the characteristic state under the ith characteristic dimension and the sample labels respectively corresponding to the d sample vulnerability data; i is a positive integer less than or equal to k; the characteristic state under the ith characteristic dimension is used for representing the distribution condition of the characteristics of the d sample vulnerability data under the ith characteristic dimension;
and the first association degree determining unit is used for determining the prediction association degree of the ith characteristic dimension and the vulnerability attribute category based on the category information quantity of the vulnerability attribute category and the dimension information quantity of the vulnerability attribute category under the ith characteristic dimension.
Wherein the characteristic state comprises a first characteristic state and a second characteristic state; the number of vulnerability attribute categories is m; m is a positive integer;
the information amount determination unit includes:
the first probability determining subunit is configured to determine, in the first feature state of the ith feature dimension, the first sample label quantities respectively corresponding to the m vulnerability attribute categories based on the sample labels respectively corresponding to the d sample vulnerability data, and determine, based on the m first sample label quantities, the first feature probabilities of the m vulnerability attribute categories in the first feature state of the ith feature dimension;
the second probability determining subunit is configured to determine, in the second feature state of the ith feature dimension, the second sample label quantities respectively corresponding to the m vulnerability attribute categories based on the sample labels respectively corresponding to the d sample vulnerability data, and determine, based on the m second sample label quantities, the second feature probabilities of the m vulnerability attribute categories in the second feature state of the ith feature dimension;
and the probability integration subunit is used for performing probability integration processing on the m first characteristic probabilities and the m second characteristic probabilities to obtain dimension information quantity of the vulnerability attribute categories under the ith characteristic dimension.
Wherein, the association degree obtaining module further comprises:
the feature obtaining unit is used for obtaining k dimension features to be detected, respectively corresponding to the sample vulnerability data under the k feature dimensions;
the dimension conversion unit is used for performing dimension conversion on the k dimension features to be detected to obtain k vulnerability set features respectively corresponding to the k feature dimensions;
the feature detection unit is used for respectively inputting the k vulnerability set features into the initial vulnerability detection model for prediction to obtain the dimension category probability of the k vulnerability set features respectively aiming at the vulnerability attribute categories;
and the second relevance determining unit is used for determining the prediction relevance of the k characteristic dimensions and the vulnerability attribute categories respectively according to the dimension category probability of the k vulnerability set characteristics respectively aiming at the vulnerability attribute categories and the difference data between the sample labels corresponding to the sample vulnerability data.
Wherein, the dimension acquisition module includes:
a threshold value obtaining unit configured to obtain a prediction association threshold value;
the dimension clustering unit is used for clustering, among the k feature dimensions, the feature dimensions whose prediction association degree is smaller than the prediction association threshold to obtain a first common dimension;
and the dimension determining unit is used for determining the N target feature dimensions from the first common dimension and the feature dimensions whose prediction association degree is greater than or equal to the prediction association threshold.
Wherein, the characteristic acquisition module includes:
the information acquisition unit is used for acquiring the sample data type of the target sample vulnerability data and acquiring vulnerability information corresponding to the sample data type;
the keyword acquisition unit is used for acquiring vulnerability keywords from target sample vulnerability data based on vulnerability information;
and the feature extraction unit is used for performing feature extraction processing on the vulnerability keywords under the N target feature dimensions to obtain target sample vulnerability features corresponding to the target sample vulnerability data under the N target feature dimensions respectively.
An embodiment of the present application provides another data processing apparatus, including:
the data acquisition module is used for acquiring target to-be-detected characteristics of the target to-be-detected data respectively corresponding to the N target characteristic dimensions; the N target feature dimensions are determined from the k feature dimensions based on the prediction relevance of the k feature dimensions and the vulnerability attribute categories respectively;
and the vulnerability detection module is used for inputting the target to-be-detected features respectively corresponding to the N target feature dimensions into a vulnerability detection model for vulnerability detection to obtain a target detection result corresponding to the target to-be-detected data.
Wherein the number of vulnerability attribute categories is m; m is a positive integer;
the vulnerability detection module is specifically used for obtaining m vulnerability attribute categories and the prediction probability corresponding to each vulnerability attribute category, and determining a target detection result from the m vulnerability attribute categories based on the prediction probability.
The data processing apparatus further includes:
and the data isolation module is used for isolating the target to-be-detected data corresponding to the abnormal detection category and sending a data abnormal message to the vulnerability management equipment if the target detection result is the abnormal detection category in the m vulnerability attribute categories.
One aspect of the present application provides a computer device, comprising: a processor, a memory, a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is used for providing a data communication function, the memory is used for storing a computer program, and the processor is used for calling the computer program to enable the computer device to execute the method in the embodiment of the application.
An aspect of the present embodiment provides a computer-readable storage medium, in which a computer program is stored, where the computer program is adapted to be loaded by a processor and to execute the method in the present embodiment.
An aspect of an embodiment of the present application provides a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium; the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method in the embodiment of the present application.
In the embodiments of the present application, a prediction association degree between each of k feature dimensions and the vulnerability attribute categories is acquired, and N target feature dimensions are acquired from the k feature dimensions based on the prediction association degrees; the prediction association degree represents the degree to which a change in the corresponding feature dimension influences the prediction result for the vulnerability attribute category. Target sample vulnerability features respectively corresponding to the target sample vulnerability data under the N target feature dimensions are acquired, and the N target sample vulnerability features are input into an initial vulnerability detection model for prediction to obtain a target sample prediction result for the vulnerability attribute categories. A target sample label of the target sample vulnerability data is acquired, and the parameters of the initial vulnerability detection model are adjusted based on the target sample prediction result and the target sample label to obtain a vulnerability detection model for vulnerability attribute category detection. By introducing the prediction association degrees between the k feature dimensions and the vulnerability attribute categories, the embodiments perform feature selection on the k feature dimensions and use the initial vulnerability detection model to predict on the selected vulnerability features, obtaining a prediction result for the vulnerability attribute categories. Reducing the number of feature dimensions reduces the amount of computation, allows the association between vulnerability data and vulnerability attribute categories to be determined more accurately, and improves the efficiency of vulnerability detection on the vulnerability data.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a system architecture diagram according to an embodiment of the present application;
FIG. 2a is a schematic view of a scene for classifying object data according to an embodiment of the present application;
FIG. 2b is a schematic view of a scene for classifying object data according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a vulnerability detection model training process according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating classification of object data according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of another data processing method according to an embodiment of the present application;
FIG. 7a is a schematic structural diagram for feature selection according to an embodiment of the present application;
FIG. 7b is a schematic flowchart of an attack response according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application relates to deep learning technology in the field of artificial intelligence, which is used, for example, to predict the association between object entities and to train a relation prediction model.
Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent traffic and the like.
Machine Learning (Machine Learning) is a discipline that specializes in how computers simulate or implement human Learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance.
Deep Learning (DL) is a new research direction within Machine Learning (ML) that was introduced to bring machine learning closer to its original goal, Artificial Intelligence (AI). Deep learning learns the intrinsic regularities and representation levels of sample data, and the information obtained in the learning process greatly helps in interpreting data such as text, images and sound. Its ultimate aim is to give machines the ability to analyze and learn like humans and to recognize data such as text, images and sound. Deep learning is a complex machine learning algorithm whose results in speech and image recognition far exceed those of prior related techniques. It has produced many results in search technology, data mining, machine learning, machine translation, natural language processing, multimedia learning, speech, recommendation and personalization technologies, and other related fields. Deep learning enables machines to imitate human activities such as seeing, hearing and thinking, solves many complex pattern recognition problems, and has driven great progress in artificial intelligence related technologies.
Referring to fig. 1, fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present disclosure. As shown in fig. 1, the system may include a computer device 100 and a terminal cluster, and the terminal cluster may include a terminal device 200a, a terminal device 200b, a terminal device 200c, …, and a terminal device 200n. It is understood that the system may include one or more terminal devices, and the number of terminal devices is not limited in this application. The terminal device may be an electronic device, including but not limited to a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palmtop computer, a vehicle-mounted device, an Augmented Reality/Virtual Reality (AR/VR) device, a head-mounted display, a smart television, a wearable device, a smart speaker, a digital camera, a camera, and other Mobile Internet Devices (MID) with network access capability, or a terminal device used in a scenario such as a train, a ship, or an aircraft.
The computer device mentioned in the present application may be a server or a terminal device, or may be a system composed of a server and a terminal device.
Communication connections may exist between terminal devices in the terminal cluster; for example, a communication connection exists between the terminal device 200a and the terminal device 200b, and between the terminal device 200a and the terminal device 200c. Meanwhile, any terminal device in the terminal cluster may have a communication connection with the computer device 100; for example, a communication connection exists between the terminal device 200a and the computer device 100. The connection manner is not limited: the connection may be made directly or indirectly through wired communication, directly or indirectly through wireless communication, or in other manners, which is not limited in this application.
It should be understood that each terminal device in the terminal cluster shown in fig. 1 may be installed with an application client for transmitting object data, and when the application client runs in each terminal device, data interaction, i.e. the above-mentioned communication connection, may be performed between the application client and the computer device 100 shown in fig. 1, respectively. The application client can be an application client with an object data transmission function, such as a short video application, a live broadcast application, a social application, an instant messaging application, a game application, a music application, a shopping application, a novel application, a browser and the like. The application client may be an independent client, or may be an embedded sub-client integrated in a certain client (for example, a social client, an educational client, a multimedia client, and the like), which is not limited herein.
The computer device can acquire data to be detected from any terminal device or computer device, detect the data, and determine whether a vulnerability exists in it; alternatively, it can acquire sample vulnerability data for model training from any terminal device or computer device, and perform model training based on the acquired sample vulnerability data to obtain a vulnerability detection model.
For ease of understanding, refer also to fig. 2a, which is a schematic diagram of a scenario for classifying object data according to an embodiment of the present application. In fig. 2a, the computer device 300 may obtain sample vulnerability data and train an initial vulnerability detection model on it to obtain a vulnerability detection model. Specifically, the computer device 300 may build a test environment and obtain test simulation data in it. The test simulation data is data generated in the test environment, for example by simulating attacks, and may include both data in which a simulated attack occurs and data in which no simulated attack occurs; the former may be determined as positive samples and the latter as negative samples. The obtained test simulation data is determined as sample vulnerability data. Optionally, historical vulnerability data generated in an actual network environment may be acquired and determined as sample vulnerability data. Feature selection and extraction are performed on the sample vulnerability data to obtain sample vulnerability features, and the initial vulnerability detection model is trained on the sample vulnerability features to obtain a vulnerability detection model for classifying data to be detected.
Further, in fig. 2a, the computer device 300 may obtain data to be detected and perform vulnerability classification on it based on the vulnerability detection model: the computer device 300 detects the data to be detected with the vulnerability detection model and determines a detection result, and if the detection result includes vulnerability data, performs isolation processing on the data to be detected.
For ease of understanding, refer also to fig. 2b, which is a schematic diagram of a scenario for classifying object data according to an embodiment of the present application. In fig. 2b, the computer device 300 can collect the data to be detected from the test simulation environment and the actual network environment. The collected object data is input into the vulnerability detection model for classification and detection, and if a detection result includes vulnerability data, the data corresponding to that detection result is isolated.
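The detect-then-isolate flow described above can be sketched as follows. This is only an illustration: `detect_and_isolate` and `toy_detector` are hypothetical names, and the toy detector is a stand-in for the trained vulnerability detection model, whose actual form the text does not specify.

```python
def detect_and_isolate(items, is_vulnerable):
    """Split items into (passed, isolated) using a detection callable."""
    passed, isolated = [], []
    for item in items:
        # Items flagged by the detector are isolated; the rest pass through.
        (isolated if is_vulnerable(item) else passed).append(item)
    return passed, isolated


# Hypothetical stand-in for the trained vulnerability detection model.
def toy_detector(item):
    return "exploit" in item


passed, isolated = detect_and_isolate(["GET /index", "exploit payload"], toy_detector)
```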
It can be understood that the method provided in the embodiment of the present application may be executed by a computer device, where the server may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as a cloud database, a cloud service, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal devices include, but are not limited to, mobile phones, computers, intelligent voice interaction devices, intelligent household appliances, vehicle-mounted terminals, and the like. The terminal device and the computer device may be directly or indirectly connected in a wired or wireless manner, and the embodiment of the present application is not limited herein.
It is understood that the system architecture described above may be applied to a search system and a knowledge graph construction scenario, and specific service scenarios will not be listed here.
Further, please refer to fig. 3, where fig. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application. As shown in fig. 3, the data processing method may include at least the following steps S101 to S103.
Step S101: acquire the prediction association degrees between each of k feature dimensions and the vulnerability attribute category, and obtain N target feature dimensions from the k feature dimensions based on the prediction association degrees; the prediction association degree characterizes the degree to which a change in the corresponding feature dimension influences the prediction result for the vulnerability attribute category.
Specifically, the number of feature dimensions may refer to the number of feature attributes corresponding to the object data, and the feature of the object data under each feature attribute can be extracted based on that attribute. The feature attributes are determined by the data type of the object data; for example, in the present application, if the object data is vulnerability-related data, the feature attributes may include, but are not limited to, a vulnerability location attribute, a vulnerability type attribute, a vulnerability content attribute, and the like. That is, each piece of vulnerability data corresponds to k feature attributes, each feature attribute represents one feature dimension, and k is a positive integer. For example, assume that k is 3 and the k feature attributes are a vulnerability location attribute, a vulnerability type attribute, and a vulnerability content attribute. Performing feature extraction on vulnerability data 1 based on the k feature attributes yields vulnerability feature 1 (S1, S2, S3), where S1 is the feature extracted from vulnerability data 1 under the vulnerability location attribute and corresponds to feature dimension 1; S2 is the feature extracted from vulnerability data 1 under the vulnerability type attribute and corresponds to feature dimension 2; and S3 is the feature extracted from vulnerability data 1 under the vulnerability content attribute and corresponds to feature dimension 3. The order of S1, S2, and S3 is not limited to the order shown above, i.e., the relative positions of the k feature dimensions are not limited. The prediction association degrees between the k feature dimensions of the object data and the vulnerability attribute category characterize the degree to which each of the k feature dimensions influences the vulnerability attribute category of the object data.
N target feature dimensions can be obtained from the k feature dimensions based on the degree to which each of the k feature dimensions influences the vulnerability attribute category of the object data. A feature dimension may be denoted $T_h$, where k is a positive integer and h is a positive integer less than or equal to k. For example, k may be 4, and the k feature dimensions may be denoted $T_1$, $T_2$, $T_3$, $T_4$.
A vulnerability may be a defect in a hardware system, a software system, a protocol system, or the like, and the defect may arise in a specific implementation or in a system security policy. A vulnerability may help an attacker access or damage the system without authorization. Vulnerabilities can be any of a variety of factors that compromise the structure and data content of a computer network system. The vulnerability attribute category may include m vulnerability types, where m is a positive integer. For example, when m is 2, the vulnerability attribute category may be considered to include a malicious attribute category and a normal attribute category, where the malicious attribute category corresponds to malicious object data and the normal attribute category corresponds to normal object data. When m is 1, the vulnerability attribute category may be considered to include a vulnerability status, whose value indicates whether a vulnerability exists. As a further example, the m vulnerability attribute categories may include (m-1) vulnerability types and a normal attribute category. A high-risk vulnerability may be a serious vulnerability in a software system that a hacker can exploit by means of a virus or Trojan horse; after intruding into the software system, the attacker can steal important data (such as information related to object data, passwords, and the like). A high-risk vulnerability can even cause the software system to crash, making the entire software system unusable.
Therefore, high-risk vulnerabilities can be detected by this method. For example, sample vulnerability data related to high-risk vulnerabilities is obtained for model training, so that the obtained sample vulnerability data contains samples derived from high-risk vulnerabilities, and the vulnerability detection model is obtained by training on it.
In one manner of obtaining the prediction association degree, d pieces of sample vulnerability data and the sample label corresponding to each piece are obtained; the d pieces of sample vulnerability data include the target sample vulnerability data, and d is a positive integer. The category proportion of each vulnerability attribute category is obtained based on the sample labels respectively corresponding to the d pieces of sample vulnerability data, and the category information amount of the vulnerability attribute category is determined based on the category proportions. The dimension information amount of the vulnerability attribute category under the i-th feature dimension is determined according to the feature state under the i-th feature dimension and the sample labels respectively corresponding to the d pieces of sample vulnerability data; i is a positive integer less than or equal to k, and the feature state under the i-th feature dimension characterizes the distribution of the features of the d pieces of sample vulnerability data under the i-th feature dimension. The prediction association degree between the i-th feature dimension and the vulnerability attribute category is then determined based on the category information amount of the vulnerability attribute category and the dimension information amount of the vulnerability attribute category under the i-th feature dimension.
Specifically, from the sample labels respectively corresponding to the d pieces of sample vulnerability data, the sample label corresponding to each vulnerability attribute category can be obtained, and the number of pieces of sample vulnerability data carrying that label among the d pieces is determined. The category proportion of each vulnerability attribute category is the share of its sample count in the d pieces of sample vulnerability data. From the category proportions, an expression for the category information amount of the vulnerability attribute category can be obtained, and hence the category information amount itself. For example, assume that the number of vulnerability attribute categories is 2, namely vulnerability attribute category 1 and vulnerability attribute category 2. If d1 of the d pieces of sample vulnerability data have the sample label vulnerability attribute category 1, and d2 of them have the sample label vulnerability attribute category 2, then the category proportion of vulnerability attribute category 1 is d1/(d1+d2) and the category proportion of vulnerability attribute category 2 is d2/(d1+d2), where d1 and d2 are positive integers less than or equal to d.
The sample label may be a certain vulnerability attribute category. The sample vulnerability data may be samples selected from the object data whose sample label is a certain vulnerability attribute category. The sample vulnerability data may be data obtained from a database of the computer device 300, and the data in that database may be used as positive samples; it may be data generated when a test environment is set up to simulate attacks, and this test simulation data may be used as positive samples; and acquired historical vulnerability data may also be used as positive samples. Optionally, some conventional data may be selected as negative samples. The category proportion of a vulnerability attribute category may be regarded as the occurrence probability of that category. Further, the computer device may obtain the logarithm of the category proportion of a vulnerability attribute category (the proportion logarithm), and fuse the category proportion with its proportion logarithm to obtain a sub information amount. If the number of vulnerability attribute categories is 1, the sub information amount corresponding to that category is determined as the category information amount; if the number is not 1, the sub information amounts corresponding to the vulnerability attribute categories are fused to obtain the category information amount.
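The proportion calculation in the example above can be illustrated with a short sketch. The function name `category_proportions` and the label strings are hypothetical; only the ratio d1/(d1+d2) comes from the text.

```python
from collections import Counter


def category_proportions(sample_labels):
    """Return {category: share of samples carrying that label}."""
    counts = Counter(sample_labels)
    total = len(sample_labels)
    return {label: n / total for label, n in counts.items()}


# d = 5 samples: d1 = 3 labelled category 1, d2 = 2 labelled category 2.
labels = ["category_1", "category_1", "category_2", "category_1", "category_2"]
proportions = category_proportions(labels)
```

Here `proportions["category_1"]` is d1/(d1+d2) = 3/5 and `proportions["category_2"]` is d2/(d1+d2) = 2/5.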
For example, the category proportion of a vulnerability attribute category may be denoted $P(y)$, and the category proportion of the e-th vulnerability attribute category may be denoted $P(y_e)$, where e is a positive integer less than or equal to m. The category proportion of each of the m vulnerability attribute categories can be converted to obtain the corresponding category proportion associated information, and the m pieces of category proportion associated information are summed. The category information amount of the vulnerability attribute category can then be shown in formula (1):
$$H(Y) = -\sum_{e=1}^{m} P(y_e) \log P(y_e) \tag{1}$$
Optionally, the category weight of each vulnerability attribute category may be obtained, and the sub information amounts respectively corresponding to the vulnerability attribute categories are weighted and summed based on the category weights to obtain the category information amount. Optionally, the generation of the category information amount may refer to formula (2):
$$H(Y) = -\sum_{e=1}^{m} w_e \, P(y_e) \log P(y_e) \tag{2}$$
In formula (2), $P(y_e)$ denotes the category proportion of the e-th vulnerability attribute category, $w_e$ denotes the category weight of the e-th vulnerability attribute category, and $H(Y)$ denotes the category information amount.
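As a minimal sketch, the category information amount can be read as a Shannon-entropy style sum of "proportion times logarithm of proportion" terms, optionally weighted per category. The function name, the base-2 logarithm, and the convention that a zero proportion contributes nothing are assumptions made for illustration; the text fixes none of them.

```python
import math


def category_information(proportions, weights=None):
    """Entropy-style category information amount.

    proportions: list of category proportions P(y_e).
    weights: optional list of per-category weights (defaults to 1 for each).
    """
    if weights is None:
        weights = [1.0] * len(proportions)
    total = 0.0
    for p, w in zip(proportions, weights):
        if p > 0:  # assumed convention: a zero proportion contributes 0
            total -= w * p * math.log2(p)
    return total


balanced = category_information([0.5, 0.5])   # two equally likely categories
single = category_information([1.0])          # only one category present
```

With two equally likely categories the amount is 1 bit, and with a single category it is 0, which matches the intuition that a label carrying no uncertainty carries no information.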
It can be understood that the feature state under the i-th feature dimension is obtained from the features of the d pieces of sample vulnerability data under the i-th feature dimension, and characterizes the distribution of those features. Optionally, if sample vulnerability data j has a feature under the i-th feature dimension, its feature state under the i-th feature dimension may be regarded as the first feature state; if it has no feature under the i-th feature dimension, its feature state may be regarded as the second feature state. Sample vulnerability data j is any one of the d pieces of sample vulnerability data, and j is a positive integer less than or equal to d. From the feature state under the i-th feature dimension and the sample labels respectively corresponding to the d pieces of sample vulnerability data, a two-dimensional correspondence between feature states and sample labels can be obtained, from which the dimension information amount of the vulnerability attribute category under the i-th feature dimension can be determined.
Specifically, the feature states include a first feature state and a second feature state, and the number of vulnerability attribute categories is m, where m is a positive integer. In the first feature state of the i-th feature dimension, the numbers of first sample labels respectively corresponding to the m vulnerability attribute categories are determined based on the sample labels of the d pieces of sample vulnerability data; the sample vulnerability data counted in the numbers of first sample labels are those whose feature state under the i-th feature dimension is the first feature state. The first feature probabilities of the m vulnerability attribute categories in the first feature state of the i-th feature dimension are determined based on the m numbers of first sample labels. For example, if the number of first sample labels obtained is 6, then 6 pieces of sample vulnerability data are in the first feature state under the i-th feature dimension. Likewise, in the second feature state of the i-th feature dimension, the numbers of second sample labels respectively corresponding to the m vulnerability attribute categories are determined based on the sample labels of the d pieces of sample vulnerability data, and the second feature probabilities of the m vulnerability attribute categories in the second feature state of the i-th feature dimension are determined based on the m numbers of second sample labels. Probability integration processing is performed on the m first feature probabilities and the m second feature probabilities to obtain the dimension information amount of the vulnerability attribute category under the i-th feature dimension.
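The counting step described above amounts to building a table of label counts per feature state and normalising each row. The sketch below shows one way to do this; `feature_probabilities` and the state names `"first"` and `"second"` are hypothetical.

```python
from collections import Counter


def feature_probabilities(feature_states, sample_labels, categories):
    """For each feature state, return {category: P(category | state)}."""
    counts = {}  # feature state -> Counter of sample labels seen in that state
    for state, label in zip(feature_states, sample_labels):
        counts.setdefault(state, Counter())[label] += 1
    result = {}
    for state, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for categories that never occur in this state.
        result[state] = {c: counter[c] / total for c in categories}
    return result


states = ["first", "first", "second", "first"]
labels = ["A", "B", "A", "A"]
probs = feature_probabilities(states, labels, categories=["A", "B"])
```

In this toy input, three samples are in the first feature state (two labelled A, one labelled B) and one sample is in the second feature state (labelled A).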
For example, in one manner, the feature probabilities of the m vulnerability attribute categories in the f-th feature state of the i-th feature dimension may be converted to obtain m pieces of probability associated information respectively corresponding to the m feature probabilities, and the m pieces of probability associated information are summed to obtain the dimension information amount in the i-th feature dimension, where e and f are positive integers, as shown in formula (3):
$$H(Y \mid x_f) = -\sum_{e=1}^{m} P(y_e \mid x_f) \log P(y_e \mid x_f) \tag{3}$$
In formula (3), $x_f$ denotes the f-th feature state in the i-th feature dimension, $P(y_e \mid x_f)$ denotes the feature probability of the e-th vulnerability attribute category in the f-th feature state of the i-th feature dimension, $P(y_e \mid x_f) \log P(y_e \mid x_f)$ denotes the probability associated information corresponding to that feature probability, and $H(Y \mid x_f)$ denotes the dimension information amount in the i-th feature dimension.
Optionally, in another manner, the feature probabilities of the m vulnerability attribute categories in the f-th feature state of the i-th feature dimension may be converted to obtain the corresponding probability associated information, the data proportion under the e-th vulnerability attribute category is used as the associated weight, and the m pieces of probability associated information are summed according to the associated weights to obtain the dimension information amount in the i-th feature dimension, as shown in formula (4):
$$H(Y \mid x_f) = -\sum_{e=1}^{m} P(x_e) \, P(y_e \mid x_f) \log P(y_e \mid x_f) \tag{4}$$
In formula (4), $x_f$ denotes the f-th feature state in the i-th feature dimension, $P(y_e \mid x_f)$ denotes the feature probability of the e-th vulnerability attribute category in the f-th feature state of the i-th feature dimension, $P(x_e)$ denotes the data proportion under the e-th vulnerability attribute category, $P(y_e \mid x_f) \log P(y_e \mid x_f)$ denotes the probability associated information corresponding to that feature probability, and $H(Y \mid x_f)$ denotes the dimension information amount in the i-th feature dimension.
For another example, the number d of pieces of sample vulnerability data may be 12; the first feature state of the i-th feature dimension may be 0, indicating that the sample vulnerability data does not contain the i-th feature dimension, and the second feature state of the i-th feature dimension may be 1, indicating that the sample vulnerability data contains the i-th feature dimension. The number m of vulnerability attribute categories may be 2, where A represents the malicious attribute category and B represents the normal attribute category, and the 12 pieces of sample vulnerability data may contain 6 malicious samples $A_g$ and 6 normal samples $B_g$, where g is a positive integer less than or equal to d. Suppose that among the malicious samples $A_g$, 0 are in the first feature state and 6 are in the second feature state, and that among the normal samples $B_g$, 4 are in the first feature state and 2 are in the second feature state. The first feature probabilities are then the share of malicious samples among the samples in the first feature state and the share of normal samples among the samples in the first feature state, namely 0/4 and 4/4. The second feature probabilities are the share of malicious samples among the samples in the second feature state and the share of normal samples among the samples in the second feature state, namely 6/8 and 2/8. Probability integration processing is performed on the 2 first feature probabilities and the 2 second feature probabilities to obtain the dimension information amount of the vulnerability attribute category under the i-th feature dimension, as shown in formula (5):
[Formula (5): equation image not reproduced in the source text]
In formula (5), $S_i = 0$ denotes the first feature state in the i-th feature dimension; $H(Y = A \mid S_i = 0)$ denotes the dimension information amount for vulnerability attribute category A in the first feature state of the i-th feature dimension, and $H(Y = B \mid S_i = 0)$ denotes the dimension information amount for vulnerability attribute category B in the first feature state of the i-th feature dimension. $S_i = 1$ denotes the second feature state in the i-th feature dimension; $H(Y = A \mid S_i = 1)$ denotes the dimension information amount for vulnerability attribute category A in the second feature state of the i-th feature dimension, and $H(Y = B \mid S_i = 1)$ denotes the dimension information amount for vulnerability attribute category B in the second feature state of the i-th feature dimension.
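The numbers in this example can be worked through in code. The sketch below computes the information amount within each feature state and then combines the states in two ways: a plain sum, and a sum weighted by the share of samples in each state (the standard conditional entropy). Which combination formula (5) actually uses is not recoverable from the text, so both are shown as assumptions, as are the base-2 logarithm and the names used.

```python
import math

# Label counts per feature state, taken from the example:
# state 0 -> 0 malicious (A), 4 normal (B); state 1 -> 6 malicious, 2 normal.
counts = {0: {"A": 0, "B": 4}, 1: {"A": 6, "B": 2}}


def state_information(label_counts):
    """Sum of -p * log2(p) over the categories within one feature state."""
    total = sum(label_counts.values())
    info = 0.0
    for n in label_counts.values():
        if n > 0:  # assumed convention: a zero probability contributes 0
            p = n / total
            info -= p * math.log2(p)
    return info


d = sum(sum(c.values()) for c in counts.values())  # 12 samples in total
per_state = {s: state_information(c) for s, c in counts.items()}

# Two candidate ways of integrating the per-state amounts (both assumptions).
unweighted = sum(per_state.values())
weighted = sum(sum(c.values()) / d * per_state[s] for s, c in counts.items())
```

State 0 contains only normal samples, so it contributes nothing; state 1 contributes about 0.811 bits, which becomes about 0.541 bits once weighted by its 8/12 share of the samples.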
Further, the prediction association degree of the ith characteristic dimension and the vulnerability attribute category is determined based on the category information quantity of the vulnerability attribute category and the dimension information quantity of the vulnerability attribute category under the ith characteristic dimension.
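The text does not spell out how the two quantities are combined. One common choice, shown here only as an assumption, is their difference (the information gain). In the sketch below, $X_i$ is a hypothetical symbol for the i-th feature dimension, $P(x_f)$ is the share of samples in feature state $x_f$, and $H(Y \mid x_f)$ is the information amount of the category distribution within that state:

```latex
G(Y, X_i) = H(Y) - H(Y \mid X_i), \qquad
H(Y \mid X_i) = \sum_{f} P(x_f)\, H(Y \mid x_f)
```

Under this reading, a feature dimension whose states separate the categories well leaves little residual information and therefore receives a high prediction association degree.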
In another manner of obtaining the prediction association degree, k dimension features to be detected, respectively corresponding to the sample vulnerability data under the k feature dimensions, are obtained; dimension conversion is performed on the k dimension features to be detected to obtain k vulnerability set features respectively corresponding to the k feature dimensions; the k vulnerability set features are respectively input into the initial vulnerability detection model for prediction to obtain the dimension category probability of each of the k vulnerability set features for the vulnerability attribute category; and the prediction association degrees between the k feature dimensions and the vulnerability attribute category are determined according to the difference data between those dimension category probabilities and the sample label corresponding to the sample vulnerability data.
Specifically, dimension conversion is performed on the k dimension features to be detected so that the resulting k vulnerability set features meet the input format of the initial vulnerability detection model. The k vulnerability set features are input into the initial vulnerability detection model for detection to obtain the dimension category probability of each vulnerability set feature for the vulnerability attribute category. From the correspondence between the k vulnerability set features and the k dimension features to be detected, the probability of each dimension feature to be detected for the vulnerability attribute category can be obtained; from the correspondence between the k dimension features to be detected and the k feature dimensions, the feature probability of each feature dimension for the vulnerability attribute category can be obtained. The prediction association degrees between the k feature dimensions and the vulnerability attribute category are then determined according to the difference data between those feature probabilities and the sample label corresponding to the sample vulnerability data.
For example, k may be 4, and the 4 dimension features to be detected may be $S_1$, $S_2$, $S_3$, $S_4$. Dimension conversion may consist of padding each feature to be detected with a standard value according to the input format of the initial vulnerability detection model. The standard value may be 0, in which case the 4 vulnerability set features are $\{S_1, 0, 0, 0\}$, $\{0, S_2, 0, 0\}$, $\{0, 0, S_3, 0\}$, $\{0, 0, 0, S_4\}$.
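The padding step and the per-dimension scoring can be sketched as follows. The padding follows the example above; the scoring rule (one minus the absolute difference between the predicted probability and the label) and the `toy_model` weights are purely illustrative assumptions, since the text only says that "difference data" is used and does not describe the initial vulnerability detection model.

```python
def to_vulnerability_set_features(features, standard_value=0):
    """Turn k per-dimension features into k padded vectors of length k."""
    k = len(features)
    padded = []
    for index, value in enumerate(features):
        vector = [standard_value] * k
        vector[index] = value  # keep only the feature of this dimension
        padded.append(vector)
    return padded


def prediction_association(features, label, predict_proba, standard_value=0):
    """Score each dimension by how close the model output is to the label."""
    scores = []
    for vector in to_vulnerability_set_features(features, standard_value):
        probability = predict_proba(vector)
        scores.append(1.0 - abs(probability - label))  # assumed scoring rule
    return scores


# Hypothetical stand-in for the initial vulnerability detection model.
def toy_model(vector):
    weights = [0.9, 0.1, 0.5, 0.0]
    score = sum(w * x for w, x in zip(weights, vector))
    return min(1.0, max(0.0, score))


vectors = to_vulnerability_set_features([0.7, 0.2, 0.9, 0.4])
scores = prediction_association([1.0, 1.0, 1.0, 1.0], label=1, predict_proba=toy_model)
```

Feeding each dimension in isolation makes the model's output attributable to that dimension alone, which is what allows a per-dimension association degree to be read off.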
Further, a prediction relevance threshold is obtained. Among the k feature dimensions, the feature dimensions whose prediction association degree is smaller than the prediction relevance threshold are clustered to obtain a first common dimension, and the N target feature dimensions are determined from the first common dimension and the feature dimensions whose prediction association degree is greater than or equal to the prediction relevance threshold.
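A minimal sketch of this thresholding step is given below. The clustering method is not specified in the text, so the sketch makes the simplifying assumption that all low-association dimensions fall into a single cluster; the function name and the example values are hypothetical.

```python
def select_with_common_dimension(association, threshold):
    """association: {dimension name: prediction association degree}."""
    quality = [d for d, a in association.items() if a >= threshold]
    low = [d for d, a in association.items() if a < threshold]
    target = list(quality)
    if low:
        # Assumption: every low-association dimension joins one common dimension.
        target.append(("common", tuple(low)))
    return target


association = {"T1": 0.8, "T2": 0.1, "T3": 0.6, "T4": 0.2}
target_dimensions = select_with_common_dimension(association, threshold=0.5)
```

With a threshold of 0.5, `T1` and `T3` are kept as they are, while `T2` and `T4` are grouped into one common dimension, giving N = 3 target feature dimensions.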
Optionally, the k feature dimensions may be subjected to clustering processing, a common dimension is extracted to obtain a second common dimension, and the second common dimension is determined as N target feature dimensions.
Optionally, the feature dimensions whose prediction association degree is greater than or equal to the prediction relevance threshold may be obtained and determined as the N target feature dimensions. Alternatively, the feature dimensions whose prediction association degree is greater than or equal to the prediction relevance threshold are marked as quality dimensions, and the N target feature dimensions are obtained from the quality dimensions. Optionally, the k feature dimensions may be sorted by prediction association degree, and the top N feature dimensions of the sorted k feature dimensions are determined as the N target feature dimensions.
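The sorting variant can be sketched in a few lines. The function name and the example association degrees are hypothetical.

```python
def top_n_dimensions(association, n):
    """Return the n dimension names with the highest association degree."""
    ranked = sorted(association, key=association.get, reverse=True)
    return ranked[:n]


association = {"T1": 0.8, "T2": 0.1, "T3": 0.6, "T4": 0.2}
top_two = top_n_dimensions(association, n=2)
```

Unlike the threshold variant, this one always returns exactly N dimensions (when k is at least N), regardless of how the association degrees are distributed.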
Step S102, acquiring target sample vulnerability characteristics of target sample vulnerability data respectively corresponding to N target characteristic dimensions, inputting the N target sample vulnerability characteristics into an initial vulnerability detection model for prediction, and obtaining a target sample prediction result aiming at vulnerability attribute categories;
Specifically, the sample vulnerability data may include the target sample vulnerability data. The N target feature dimensions may be denoted Tu, where u is a positive integer and u is less than or equal to N. Feature extraction is performed on the target sample vulnerability data under the N target feature dimensions to obtain the target sample vulnerability features respectively corresponding to the N target feature dimensions, which may be denoted Su. The features Su are input into the initial vulnerability detection model for prediction to obtain a target sample prediction result for the vulnerability attribute category.
If the N target feature dimensions include the first common dimension or the second common dimension, the above process is performed with the first common dimension or the second common dimension recorded as a target feature dimension: feature dimensions are selected from those whose prediction relevance is greater than or equal to the prediction relevance threshold and, together with the first common dimension or the second common dimension, form the N target feature dimensions, and the sample vulnerability features of the target sample vulnerability data are extracted under these target feature dimensions. Specifically, in the training process, the N target feature dimensions may be obtained as follows. The feature dimensions whose prediction relevance is greater than or equal to the prediction relevance threshold are recorded as quality dimensions, and the quality dimension features corresponding to the quality dimensions are obtained. Among the feature dimensions of the sample, the feature dimensions whose prediction relevance is smaller than the prediction relevance threshold are obtained, the sample vulnerability features under those feature dimensions that share commonality are obtained, and feature fusion is performed on these sample vulnerability features to obtain vulnerability commonality features. The quality dimension features and the vulnerability commonality features are determined as the sample vulnerability features corresponding to the sample under the N target feature dimensions.
The method comprises the steps of obtaining a sample data type of target sample vulnerability data, and obtaining vulnerability information corresponding to the sample data type; acquiring vulnerability keywords from target sample vulnerability data based on vulnerability information; and under the N target feature dimensions, performing feature extraction processing on the vulnerability keywords to obtain target sample vulnerability features corresponding to the target sample vulnerability data under the N target feature dimensions respectively.
Different vulnerability attribute categories have different vulnerability information, and different vulnerability information may correspond to different vulnerability keywords, vulnerability positions, vulnerability identifiers and the like. To obtain the sample vulnerability data, the computer device 300 may set up an attack environment containing a vulnerability and obtain the sample vulnerability data when the vulnerability is attacked; for example, the attack environment may be any version of the Hypertext Preprocessor (PHP) development framework thinkphp in the range 5.0.0-5.0.12 or in the range 5.0.13-5.0.23. Alternatively, raw vulnerability data, such as original vulnerability code, a Proof of Concept (POC), vulnerability attack code (EXP) and the like, may be obtained directly from the public network and determined as the sample vulnerability data. The target sample vulnerability data may be a Uniform Resource Locator (URL), and the sample data type may be determined from the URL to be a URL access type. The vulnerability information is then obtained. Assuming that the URL vulnerability information includes the "&" symbol, the positions where a vulnerability may appear and the keyword parts, both indicated by the "&" symbol, are obtained from the target sample vulnerability data. For example, assuming that three parts are connected by the "&" symbol in the URL sample vulnerability data, the three parts are obtained, and their contents (i.e., three URL parameters) are determined as the vulnerability keywords corresponding to the URL sample vulnerability data; feature extraction is performed on the vulnerability keywords of the three parts, and the extracted features are spliced to obtain the target sample vulnerability features.
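The URL example above can be sketched in Python as follows. The sample URL, the function names and the three toy per-keyword features (length, digit count, count of non-alphanumeric characters) are assumptions chosen only to make the sketch concrete; the application does not specify which features are extracted.

```python
from urllib.parse import urlsplit


def extract_url_keywords(url):
    """Split the query string on '&' and return the raw parameter parts."""
    query = urlsplit(url).query
    return [part for part in query.split("&") if part]


def keyword_features(part):
    """Toy per-keyword features: length, digit count, non-alphanumeric count."""
    digits = sum(ch.isdigit() for ch in part)
    symbols = sum(not ch.isalnum() for ch in part)
    return [len(part), digits, symbols]


def splice_features(parts):
    """Concatenate the per-keyword feature lists into one flat vector."""
    vector = []
    for part in parts:
        vector.extend(keyword_features(part))
    return vector


# Hypothetical URL whose query string has three '&'-joined parts.
sample_url = "http://example.test/index.php?a=1&b=two&c=3"
keywords = extract_url_keywords(sample_url)
features = splice_features(keywords)
print(keywords, features)
```

Splicing here is plain concatenation, so the position of each value in the final vector identifies which keyword it came from.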
As another example, the attack environment may be any version of thinkphp in the range 5.0.0-5.0.12 or in the range 5.0.13-5.0.23, or the raw vulnerability data, such as original vulnerability code, a Proof of Concept (POC), vulnerability attack code (EXP) and the like, may be obtained directly from the public network. The target sample vulnerability data may be a POST request, and the sample data type may be determined from the filter parameter to be a POST access type. The vulnerability information is then obtained. If the POST vulnerability information includes an s variable and the value of the filter parameter is "system", the positions where a vulnerability may appear and the keyword parts, both indicated by this special data, are obtained from the target sample vulnerability data. For example, assuming that in the POST sample vulnerability data the POST vulnerability information includes an s variable and the value of the filter parameter is "system", the special data is obtained and its content is determined as the vulnerability keywords corresponding to the POST sample vulnerability data; feature extraction is performed on these vulnerability keywords, and the extracted features are spliced to obtain the target sample vulnerability features.
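A minimal Python sketch of the POST check described above is given below. It assumes that the request body is form-encoded and that the filter parameter may appear under the key "filter" or "filter[]"; both key names, the sample bodies and the function name are illustrative assumptions rather than details taken from the application.

```python
from urllib.parse import parse_qs


def extract_post_keywords(body):
    """Return the special data if the body has an 's' variable and a 'system' filter."""
    params = parse_qs(body, keep_blank_values=True)
    # Assumed key spellings for the filter parameter.
    filters = params.get("filter", []) + params.get("filter[]", [])
    if "s" in params and "system" in filters:
        return {"s": params["s"], "filter": filters}
    return None


# Hypothetical request bodies.
flagged = extract_post_keywords("s=1&filter[]=system")
ignored = extract_post_keywords("name=alice&filter=none")
print(flagged, ignored)
```

The returned dictionary plays the role of the vulnerability keywords; a body that lacks either condition yields no keywords.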
For example, under thinkphp versions 5.0.0-5.0.12, the vulnerability keywords that may be generated, and the correspondence between vulnerability attribute categories and keywords, may be as shown in Table 1:
TABLE 1
(table provided as an image in the original publication)
For the possible sample data types under thinkphp versions 5.0.13-5.0.23, the correspondence between vulnerability attribute categories and keywords may be as shown in Table 2:
TABLE 2
(table provided as an image in the original publication)
It should be understood that the correspondence between the vulnerability attribute categories and the keywords listed in tables 1 and 2 is the correspondence between some vulnerability attribute categories and the keywords, and the correspondence between other vulnerability attribute categories and the keywords is not limited.
It can be understood that the feature extraction process may include analyzing the collected target sample vulnerability data, collecting features of the analyzed target sample vulnerability data, and performing vectorization processing on the collected features of the analyzed target sample vulnerability data, so as to obtain N target feature dimensions from k feature dimensions of the target sample vulnerability data, and obtain target sample vulnerability features.
Step S103, obtaining a target sample label of target sample vulnerability data, and performing parameter adjustment on the initial vulnerability detection model based on the target sample prediction result and the target sample label to obtain a vulnerability detection model for vulnerability attribute type detection.
Further, please refer to fig. 4, which is a schematic structural diagram of a vulnerability detection model training process according to an embodiment of the present application. In fig. 4, the computer device may obtain sample vulnerability data of the malicious attribute category and sample vulnerability data of the normal attribute category, take training sample vulnerability data from each, and perform feature extraction to obtain the training sample vulnerability features of the malicious attribute category and the training sample vulnerability features of the normal attribute category. These training sample vulnerability features are input into the initial vulnerability detection model to obtain the sample prediction results respectively corresponding to the training sample vulnerability features of the malicious attribute category and of the normal attribute category. Model training and parameter adjustment are then performed according to the sample labels and the sample prediction results respectively corresponding to the training sample vulnerability features of the two categories, to obtain a vulnerability detection model for vulnerability attribute category detection.
Test sample vulnerability data is then taken from the sample vulnerability data of the malicious attribute category and of the normal attribute category and input into the vulnerability detection model, to obtain the prediction results of the test sample vulnerability data of the two categories.
It should be noted that, when training the initial vulnerability detection model, a part of the samples may be drawn repeatedly and with replacement from the sample vulnerability data (including a malicious attribute category sample set and a normal attribute category sample set) to generate new sample sets; in subsequent model iterations each sample set may be built into a decision tree, and the decision trees may be combined into a random forest. Because the target sample detection result is obtained from the voting statistics of the individual decision trees, the diversity of the trees is preserved and the target sample detection result can draw on a wider range of candidates.
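The sampling-and-voting idea described above can be sketched in Python as follows. This is a simplified illustration, not the model of the application: each "tree" is reduced to a single-split decision stump, and the data set, the number of trees and the random seed are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement to form one new sample set."""
    return [rng.choice(samples) for _ in samples]


def train_stump(samples):
    """Fit a one-split tree: pick the (dimension, threshold) with the fewest errors."""
    best = None
    for dim in range(len(samples[0][0])):
        for features, _ in samples:
            threshold = features[dim]
            errors = sum((x[dim] >= threshold) != bool(y) for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, dim, threshold)
    _, dim, threshold = best
    return lambda x: int(x[dim] >= threshold)


def train_forest(samples, n_trees, seed=0):
    """Build one stump per bootstrap sample set."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(samples, rng)) for _ in range(n_trees)]


def predict(forest, x):
    """Majority vote over the predictions of all trees."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical data: (features, label), label 1 = malicious, 0 = normal.
data = [([0.9, 0.1], 1), ([0.8, 0.3], 1), ([0.7, 0.2], 1),
        ([0.2, 0.4], 0), ([0.1, 0.9], 0), ([0.3, 0.5], 0)]
forest = train_forest(data, n_trees=15)
print(predict(forest, [0.95, 0.05]), predict(forest, [0.05, 0.05]))
```

Because every tree sees a different resampled set, individual trees may disagree, and the vote count reflects that diversity.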
In the embodiment of the application, the prediction association degrees of k characteristic dimensions and vulnerability attribute categories are obtained, and N target characteristic dimensions are obtained from the k characteristic dimensions based on the prediction association degrees; acquiring target sample vulnerability characteristics corresponding to target sample vulnerability data under N target characteristic dimensions respectively, inputting the N target sample vulnerability characteristics into an initial vulnerability detection model for prediction, and obtaining a target sample prediction result aiming at vulnerability attribute categories; and performing parameter adjustment on the initial vulnerability detection model based on the target sample prediction result and the obtained target sample label of the target sample vulnerability data to obtain a vulnerability detection model. By adopting the method and the device, the influence degree of the change of the characteristic dimension on the prediction result of the vulnerability attribute category can be obtained. By the method and the device, the characteristic dimensionality can be selected, the incidence relation between the vulnerability data and the vulnerability attribute categories can be determined more accurately, and vulnerability detection efficiency of the vulnerability data is improved.
Further, please refer to fig. 5, and fig. 5 is a schematic flowchart of object data classification according to an embodiment of the present application. In fig. 5, a computer device 300 may set up a test environment for data collection. After sample vulnerability data is collected, feature extraction processing can be performed. In the process of feature processing, preliminary data analysis can be performed on the sample vulnerability data, features corresponding to the analyzed sample vulnerability data can be collected, and further, feature vectorization can be performed on the features corresponding to the collected sample vulnerability data to obtain sample vulnerability features after feature extraction. And carrying out feature selection processing on the extracted sample vulnerability features to obtain target sample vulnerability features. And inputting the vulnerability characteristics of the target sample into an initial vulnerability detection model, and performing vulnerability classification processing.
Further, please refer to fig. 6, where fig. 6 is a schematic flowchart of a data processing method according to an embodiment of the present application. As shown in fig. 6, the data processing method may include at least the following steps S201 to S203.
Step S201, acquiring target to-be-detected characteristics of target to-be-detected data respectively corresponding to N target characteristic dimensions; the N target feature dimensions are determined from the k feature dimensions based on the prediction relevance of the k feature dimensions and the vulnerability attribute categories respectively;
Specifically, the target data to be detected may be unlabeled, previously unseen object data. The N target feature dimensions may be the feature dimensions, among the k feature dimensions, with higher prediction relevance to the vulnerability attribute category; obtaining the target to-be-detected features of the target data to be detected under the N target feature dimensions yields accurately selected target to-be-detected features.
For example, when the N target feature dimensions include a first common dimension or a second common dimension, the first common dimension or the second common dimension may be handled separately from the other feature dimensions when acquiring the features to be detected. For example, if k is 6 and N is 4, the features corresponding to the k feature dimensions may be S1, S2, S3, S4, S5 and S6. The feature dimension obtained by clustering the feature dimensions whose prediction relevance is smaller than the prediction relevance threshold among the k feature dimensions (i.e., the first common dimension in step S101 in fig. 3) is acquired. If the prediction relevance of the feature dimensions corresponding to S1, S2 and S3 is greater than the prediction relevance threshold and that of the feature dimensions corresponding to S4, S5 and S6 is smaller than the prediction relevance threshold, then the first common dimension obtained by clustering the feature dimensions corresponding to S4, S5 and S6 may be the feature dimension corresponding to S7. The first common dimension and the feature dimensions whose prediction relevance is greater than or equal to the prediction relevance threshold form the N target feature dimensions, namely the feature dimensions respectively corresponding to S1, S2, S3 and S7. Optionally, after the k feature dimensions are clustered, a second common dimension (i.e., the second common dimension in step S101 in fig. 3) may be obtained and determined as the N target feature dimensions; that is, clustering the feature dimensions respectively corresponding to S1, S2, S3, S4, S5 and S6 yields the feature dimensions respectively corresponding to S8, S9, S10 and S11, which are the N target feature dimensions.
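The first case of the k = 6, N = 4 example above can be sketched in Python as follows. The application does not state how the low-relevance dimensions are clustered or fused, so the arithmetic mean used here is purely an assumed stand-in, and the feature values and relevance scores are hypothetical.

```python
def build_target_features(features, relevance, threshold):
    """Keep high-relevance features and fuse the rest into one common feature."""
    kept = [f for f, r in zip(features, relevance) if r >= threshold]
    low = [f for f, r in zip(features, relevance) if r < threshold]
    if low:
        kept.append(sum(low) / len(low))  # assumed fusion rule: arithmetic mean
    return kept


features = [0.9, 0.8, 0.7, 0.2, 0.4, 0.6]      # hypothetical values of S1..S6
relevance = [0.5, 0.4, 0.3, 0.1, 0.05, 0.02]   # hypothetical relevance scores
target = build_target_features(features, relevance, threshold=0.25)
print(target)
```

The first three entries correspond to S1, S2 and S3, and the last entry plays the role of S7, giving N = 4 target features.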
Step S202, m vulnerability attribute categories and prediction probabilities corresponding to the vulnerability attribute categories are obtained, and target detection results are determined from the m vulnerability attribute categories based on the prediction probabilities; m is a positive integer.
Specifically, the target to-be-detected features respectively corresponding to the N target feature dimensions are input into a vulnerability detection model for vulnerability detection, the vulnerability detection model can output m vulnerability attribute categories and prediction probabilities corresponding to each vulnerability attribute category, and target detection results are determined from the m vulnerability attribute categories based on the prediction probabilities, so that target detection results corresponding to the target to-be-detected data are obtained.
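As a minimal sketch of step S202, the following Python code picks the target detection result from the model output. Choosing the category with the highest prediction probability is an assumption (the text only says the result is determined based on the prediction probabilities), and the category names and probabilities are hypothetical.

```python
def pick_detection_result(probabilities):
    """Return the category with the highest prediction probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical model output for m = 3 vulnerability attribute categories.
output = {"normal": 0.08, "low_risk": 0.17, "high_risk": 0.75}
result = pick_detection_result(output)
print(result)
```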
Step S203, if the target detection result is an abnormal detection category in the m vulnerability attribute categories, isolating the target to-be-detected data corresponding to the abnormal detection category, and sending a data abnormal message to the vulnerability management device.
The abnormal detection category may include a malicious attribute category, such as a high-risk vulnerability. If the target detection result is an abnormal detection category, a security risk exists in the software system. In this case, the computer device may send a data abnormal message to the vulnerability management device, isolate the target to-be-detected data corresponding to the abnormal detection category, issue an early warning for the entire software system, and remind the vulnerability management device to check the software version and whether an update to a newer version is available.
Specifically, the isolation processing may be that firewall processing is started on the target data to be detected corresponding to the abnormal detection category, and the firewall may intercept a source Internet Protocol Address (IP Address) corresponding to the target data to be detected corresponding to the abnormal detection category, or place the source Internet Protocol Address corresponding to the target data to be detected corresponding to the abnormal detection category into, for example, a software system blacklist. By starting firewall processing on the target data to be detected corresponding to the abnormal detection category, the effect of quickly detecting object data of malicious attribute types can be achieved.
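The isolation and notification of step S203 can be sketched in Python as follows. The Firewall class, the message format, the category names and the IP addresses (taken from documentation address ranges) are all hypothetical; a real deployment would call the interface of an actual firewall and of the vulnerability management device.

```python
from dataclasses import dataclass, field

ABNORMAL_CATEGORIES = {"high_risk"}  # hypothetical abnormal detection categories


@dataclass
class Firewall:
    """In-memory stand-in for a firewall blacklist."""
    blacklist: set = field(default_factory=set)

    def allows(self, source_ip):
        return source_ip not in self.blacklist


def handle_detection(result, source_ip, firewall, outbox):
    """Blacklist the source and queue a data abnormal message when needed."""
    if result not in ABNORMAL_CATEGORIES:
        return False
    firewall.blacklist.add(source_ip)
    outbox.append({"type": "data_abnormal", "source_ip": source_ip, "category": result})
    return True


firewall, outbox = Firewall(), []
handle_detection("high_risk", "203.0.113.7", firewall, outbox)
handle_detection("normal", "198.51.100.2", firewall, outbox)
print(firewall.blacklist, outbox)
```

Here the outbox list stands in for the channel to the vulnerability management device.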
In the embodiment of the application, the target to-be-detected characteristics of the target to-be-detected data respectively corresponding to the N target characteristic dimensions are obtained, and the target to-be-detected characteristics respectively corresponding to the N target characteristic dimensions are input into the vulnerability detection model for vulnerability detection, so that a target detection result corresponding to the target to-be-detected data is obtained. And if the target detection result is an abnormal detection type in the m vulnerability attribute types, isolating the target to-be-detected data corresponding to the abnormal detection type, and sending a data abnormal message to the vulnerability management equipment. By adopting the method and the device, the target to-be-detected data corresponding to the detected abnormal detection category can be isolated, the threat degree of the target to-be-detected data corresponding to the abnormal detection category to the software system can be weakened, and the overall safety of the software system is improved. By adopting the method and the device, the data exception message can be sent to the vulnerability management equipment, and the vulnerability management equipment can issue the received data exception message on the Internet, so that the benefit of the network security environment construction can be greatly improved.
According to the method and the device, the initial vulnerability detection model is adjusted based on the prediction relevance, so that the vulnerability attribute type of the vulnerability data of the target sample can be well predicted by the trained initial vulnerability detection model, further, the object data detection effect of the malicious attribute type can be improved, the labor input and the maintenance cost can be reduced by using the vulnerability detection model, and the saved resources can be used for conveniently expanding the protection range of the assets.
Referring to fig. 7a, fig. 7a is a schematic structural diagram for feature selection according to an embodiment of the present disclosure. As shown in fig. 7a, performing feature dimension extraction processing on the malicious sample set and the normal sample set in step S101 in fig. 3 to obtain a malicious sample dimension corresponding to the malicious sample set and a normal sample dimension corresponding to the normal sample set; based on the prediction relevance obtained in step S101 in fig. 3, feature selection is performed among k feature dimensions (i.e., malicious sample dimensions plus normal sample dimensions), N target feature dimensions are selected, and feature extraction is performed based on the N target feature dimensions, so as to obtain target sample features. Inputting the target sample characteristics into the initial vulnerability detection model in step S102 in fig. 3, and outputting a prediction result.
Referring to fig. 7b, fig. 7b is a schematic flowchart of a process for responding to an attack according to an embodiment of the present application. As shown in fig. 7b, when the computer device 300 detects an actual attack, data collection may be performed through the gateway device, the collected data is input into the vulnerability detection model, the vulnerability detection model classifies the data, the vulnerability attribute category of the actual attack data is output, and the classification result is sent to the firewall. The firewall may intercept or blacklist the data corresponding to the classification result. The attack of the vulnerability data to the software system can be prevented through the set of attack response flow, and the vulnerability attribute category of the actual attack can be quickly detected.
The data collection on the gateway device can adopt software with real-time packet capturing or mirror image processing, the vulnerability data in the actual attack can be obtained through real-time packet capturing, and the data copy which is the same as the vulnerability data in the actual attack can be obtained through mirror image processing.
Optionally, after the vulnerability detection model outputs the vulnerability attribute category of the actual attack data, the computer device 300 may search the Internet for the output vulnerability attribute category, obtain the characteristics and effective protection measures related to that vulnerability attribute category, and send them to the firewall. If a new characteristic of the vulnerability attribute category is detected that differs from the characteristics known on the Internet, the new characteristic may be published on the Internet, so that protection can be improved for more software systems and a sounder security protection network environment can be built.
Optionally, the firewall may intercept or blacklist the source address of the data corresponding to the classification result; by intercepting the source address of the vulnerability data and similar processing, the vulnerability data can be blocked at its source, which increases the security of the software system.
In the embodiment of the application, the prediction association degrees of k feature dimensions and vulnerability attribute categories are obtained, and N target feature dimensions are obtained from the k feature dimensions based on the prediction association degrees. By the method and the device, the incidence relation between the vulnerability data and the vulnerability attribute categories can be determined more accurately, and vulnerability detection efficiency of the vulnerability data is improved.
Further, please refer to fig. 8, which is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be a computer program (including program code) running on a computer device, for example, application software; the apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. As shown in fig. 8, the data processing apparatus 1 may include: an association degree obtaining module 11, a dimension obtaining module 12, a feature obtaining module 13, a feature input module 14 and a tag obtaining module 15.
The association degree obtaining module 11 is configured to obtain the prediction relevance between the k feature dimensions and the vulnerability attribute categories;
a dimension obtaining module 12, configured to obtain N target feature dimensions from the k feature dimensions based on the prediction relevance; the prediction relevance represents the degree to which a change in the corresponding feature dimension influences the prediction result of the vulnerability attribute category;
the feature obtaining module 13 is configured to obtain target sample vulnerability features corresponding to the target sample vulnerability data under the N target feature dimensions respectively;
the feature input module 14 is configured to input the vulnerability features of the N target samples into the initial vulnerability detection model for prediction, so as to obtain a target sample prediction result for the vulnerability attribute category;
and the tag obtaining module 15 is configured to obtain a target sample tag of the target sample vulnerability data, and perform parameter adjustment on the initial vulnerability detection model based on the target sample prediction result and the target sample tag to obtain a vulnerability detection model for vulnerability attribute class detection.
For specific functional implementation manners of the association degree obtaining module 11, the dimension obtaining module 12, the feature obtaining module 13, the feature input module 14, and the tag obtaining module 15, reference may be made to steps S101 to S103 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 8 again, the association degree obtaining module 11 includes:
the data acquisition unit 111 is configured to acquire d sample vulnerability data and a sample tag corresponding to each sample vulnerability data; the d sample vulnerability data comprise target sample vulnerability data; d is a positive integer;
a proportion obtaining unit 112, configured to obtain category proportions of vulnerability attribute categories based on sample tags respectively corresponding to the d sample vulnerability data, and determine category information amounts of the vulnerability attribute categories based on the category proportions of the vulnerability attribute categories;
an information amount determining unit 113, configured to determine, according to the feature state in the ith feature dimension and the sample tags respectively corresponding to the d sample vulnerability data, a dimension information amount of the vulnerability attribute category in the ith feature dimension; i is a positive integer less than or equal to k; the characteristic state under the ith characteristic dimension is used for representing the distribution condition of the characteristics of the d sample vulnerability data under the ith characteristic dimension;
a first relevance determining unit 114, configured to determine a predicted relevance between an ith feature dimension and a vulnerability attribute category based on a category information amount of the vulnerability attribute category and a dimension information amount of the vulnerability attribute category under the ith feature dimension.
For specific functional implementation manners of the data obtaining unit 111, the proportion obtaining unit 112, the information amount determining unit 113, and the first association degree determining unit 114, reference may be made to step S101 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 8, the feature states include a first feature state and a second feature state; the number of vulnerability attribute categories is m; m is a positive integer;
the information amount determination unit 113 includes:
a first probability determination subunit 1131, configured to determine, in a first feature state of an ith feature dimension, first sample label quantities respectively corresponding to m vulnerability attribute categories based on sample labels respectively corresponding to d sample vulnerability data, and determine, based on the m first sample label quantities, first feature probabilities respectively corresponding to the m vulnerability attribute types in the first feature state of the ith feature dimension;
a second probability determining subunit 1132, configured to determine, in a second feature state of an ith feature dimension, second sample label numbers respectively corresponding to m vulnerability attribute categories based on sample labels respectively corresponding to d sample vulnerability data, and determine, based on the m second sample label numbers, second feature probabilities respectively corresponding to the m vulnerability attribute types in the second feature state of the ith feature dimension;
and a probability integration subunit 1133, configured to perform probability integration processing on the m first feature probabilities and the m second feature probabilities, to obtain a dimension information amount of the vulnerability attribute category in the ith feature dimension.
For specific functional implementation manners of the first probability determining subunit 1131, the second probability determining subunit 1132 and the probability integrating subunit 1133, reference may be made to step S101 in the embodiment corresponding to fig. 3, and details are not described here.
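One way to read the units 112 to 114 and subunits 1131 to 1133 above is as an information-gain computation. The Python sketch below assumes that the category information amount is the Shannon entropy of the category proportions, that the dimension information amount is the entropy of the labels weighted over the feature states, and that the prediction relevance is their difference; this passage does not spell out the formulas, so these are assumptions, as are the labels and feature states used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the category proportions in `labels`."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def prediction_relevance(states, labels):
    """Category entropy minus the state-weighted entropy for one feature dimension."""
    total = len(labels)
    conditional = 0.0
    for state in set(states):
        subset = [y for s, y in zip(states, labels) if s == state]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical sample labels (d = 4) and feature states for two dimensions.
labels = ["malicious", "malicious", "normal", "normal"]
informative = prediction_relevance([1, 1, 0, 0], labels)
uninformative = prediction_relevance([1, 0, 1, 0], labels)
print(informative, uninformative)
```

A dimension whose two feature states separate the categories perfectly receives the highest score, while a dimension whose states are independent of the labels scores zero.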
Referring to fig. 8 again, the association degree obtaining module 11 further includes:
the feature obtaining unit 115 is configured to obtain k features to be detected, which correspond to the sample vulnerability data in the k feature dimensions respectively;
the dimension conversion unit 116 is configured to perform dimension conversion on the k features to be detected to obtain k vulnerability set features corresponding to the k feature dimensions respectively;
the feature detection unit 117 is configured to input the k vulnerability set features into the initial vulnerability detection model respectively for prediction, so as to obtain dimension category probabilities of the k vulnerability set features respectively for vulnerability attribute categories;
the second relevance determining unit 118 is configured to determine prediction relevance between k feature dimensions and vulnerability attribute categories according to the dimension category probabilities of the k vulnerability set features respectively for vulnerability attribute categories and difference data between sample tags corresponding to the sample vulnerability data.
For specific functional implementation manners of the feature obtaining unit 115, the dimension converting unit 116, the feature detecting unit 117, and the second association degree determining unit 118, reference may be made to step S101 in the embodiment corresponding to fig. 3, which is not described herein again.
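One possible reading of units 115 to 118 is sketched below. The text does not define "dimension conversion", so this sketch assumes it means building, for each feature dimension, an input that keeps only that dimension and zeroes the others. It also assumes binary labels and takes the association degree to be one minus the mean absolute difference between the predicted probability and the label. The names `prediction_associations` and `predict_proba` are hypothetical.

```python
def prediction_associations(samples, labels, predict_proba):
    """Score each feature dimension by how well it alone reproduces the labels.

    samples: list of k-dimensional feature vectors.
    labels: 0/1 sample label per sample (binary case assumed for brevity).
    predict_proba: callable mapping a feature vector to P(label == 1);
                   stands in for the initial vulnerability detection model.
    """
    k = len(samples[0])
    associations = []
    for i in range(k):
        errors = []
        for x, y in zip(samples, labels):
            # Assumed "dimension conversion": keep dimension i, zero the rest.
            masked = [v if j == i else 0.0 for j, v in enumerate(x)]
            # Difference data between the predicted probability and the label.
            errors.append(abs(predict_proba(masked) - y))
        associations.append(1.0 - sum(errors) / len(errors))
    return associations
```

With a toy model that simply returns the first feature, the first dimension scores 1.0 when the labels equal that feature, while an unrelated dimension scores lower.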
Referring to fig. 8 again, the dimension obtaining module 12 includes:
a threshold value obtaining unit 121 configured to obtain a prediction association threshold value;
the dimension clustering unit 122 is configured to perform clustering processing on those of the k feature dimensions whose prediction association degree is less than the prediction association threshold, to obtain a first common dimension;
a dimension determining unit 123, configured to determine the N target feature dimensions from among the first common dimension and the feature dimensions whose prediction association degree is greater than or equal to the prediction association threshold.
For specific functional implementation manners of the threshold obtaining unit 121, the dimension clustering unit 122, and the dimension determining unit 123, reference may be made to step S101 in the embodiment corresponding to fig. 3, which is not described herein again.
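The threshold-and-cluster selection performed by units 121 to 123 might look like the sketch below. It assumes the simplest possible "clustering": every dimension below the threshold is merged into one common dimension. The dimension names, the `"common"` key, and the function name are hypothetical, and the text does not say how N is fixed, so here N is simply the number of resulting dimensions.

```python
def select_target_dimensions(associations, threshold):
    """Split dimensions by threshold and merge the weak ones into one common dimension.

    associations: dict mapping a dimension name to its prediction association degree.
    Returns (target_dimensions, common_members).
    """
    # Dimensions at or above the threshold are kept as they are.
    kept = [name for name, a in associations.items() if a >= threshold]
    # Dimensions below the threshold are grouped (assumed: one single cluster).
    common_members = [name for name, a in associations.items() if a < threshold]
    targets = list(kept)
    if common_members:
        targets.append("common")
    return targets, common_members
```

For example, with degrees `{"a": 0.9, "b": 0.1, "c": 0.05, "d": 0.5}` and a threshold of 0.3, the sketch keeps `a` and `d` and folds `b` and `c` into the common dimension, giving N = 3.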
Referring to fig. 8 again, the feature obtaining module 13 includes:
the information obtaining unit 131 is configured to obtain a sample data type of the target sample vulnerability data, and obtain vulnerability information corresponding to the sample data type;
a keyword obtaining unit 132, configured to obtain a vulnerability keyword from target sample vulnerability data based on vulnerability information;
the feature extraction unit 133 is configured to perform feature extraction processing on the vulnerability keywords in the N target feature dimensions, so as to obtain target sample vulnerability features corresponding to the target sample vulnerability data in the N target feature dimensions respectively.
For specific functional implementation manners of the information obtaining unit 131, the keyword obtaining unit 132, and the feature extracting unit 133, reference may be made to step S101 in the embodiment corresponding to fig. 3, which is not described herein again.
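The flow through units 131 to 133 can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the sample data types, the keyword lists in `VULNERABILITY_INFO`, the substring matching, and the binary encoding of one feature per target dimension are not taken from the text.

```python
# Hypothetical vulnerability information per sample data type.
VULNERABILITY_INFO = {
    "http_request": ["union select", "../", "<script"],
    "shell_command": ["rm -rf", "chmod 777"],
}


def extract_keywords(sample_type, sample_text):
    """Return the vulnerability keywords of this data type found in the sample."""
    keywords = VULNERABILITY_INFO.get(sample_type, [])
    lowered = sample_text.lower()
    return [kw for kw in keywords if kw in lowered]


def extract_features(keywords_found, target_dimensions):
    """One binary feature per target dimension: 1 if its keyword was found."""
    return [1 if dim in keywords_found else 0 for dim in target_dimensions]
```

A request such as `GET /a?id=1 UNION SELECT pw` of type `http_request` would yield the keyword `union select`, and therefore the feature vector `[1, 0]` over the target dimensions `["union select", "../"]`.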
In the embodiments of the present application, performing feature extraction and feature selection on simulated object data of malicious attribute categories allows object data of a malicious attribute category to be detected more accurately and more quickly. The present application can detect the core nodes (that is, high-risk vulnerabilities) in the attack paths of the malicious attribute categories, which narrows the range of vulnerability attribute categories that the vulnerability detection model has to cover and reduces maintenance costs. Because classification detection is performed by a vulnerability detection model, the present application can, compared with approaches that establish, accumulate, and match detection rules based on known traffic attacks, detect object data of unknown vulnerability attribute categories more intelligently, and the accuracy of vulnerability attribute category detection can improve as the parameters of the vulnerability detection model are adjusted. The present application can thus determine the association between vulnerability data and vulnerability attribute categories more accurately and improve the efficiency of vulnerability detection on the vulnerability data.
Further, please refer to fig. 9, which is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be a computer program (including program code) running on a computer device, for example, application software; the apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. As shown in fig. 9, the data processing apparatus 2 may include: a data acquisition module 21, a vulnerability detection module 22, and a data isolation module 23.
The data acquisition module 21 is configured to acquire target to-be-detected features corresponding to the target to-be-detected data under N target feature dimensions respectively; the N target feature dimensions are determined from the k feature dimensions based on the prediction relevance of the k feature dimensions and the vulnerability attribute categories respectively;
and the vulnerability detection module 22 is configured to input the target to-be-detected features respectively corresponding to the N target feature dimensions into a vulnerability detection model for vulnerability detection, so as to obtain a target detection result corresponding to the target to-be-detected data.
Referring to fig. 9 again, the number of vulnerability attribute categories is m; m is a positive integer;
the vulnerability detection module 22 is specifically configured to obtain m vulnerability attribute categories and prediction probabilities corresponding to each vulnerability attribute category, and determine a target detection result from the m vulnerability attribute categories based on the prediction probabilities.
The data processing apparatus 2 further includes:
and the data isolation module 23 is configured to, if the target detection result is an abnormal detection category in the m vulnerability attribute categories, perform isolation processing on the target to-be-detected data corresponding to the abnormal detection category, and send a data abnormal message to the vulnerability management device.
For specific functional implementation manners of the data acquisition module 21, the vulnerability detection module 22, and the data isolation module 23, reference may be made to steps S201 to S203 in the embodiment corresponding to fig. 6, which are not described herein again.
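The behaviour of the vulnerability detection module 22 and the data isolation module 23 can be sketched as follows. This assumes that the target detection result is the category with the highest prediction probability; the text only says it is determined "based on the prediction probabilities". The `isolate` and `notify` callbacks are hypothetical stand-ins for the isolation processing and for the message sent to the vulnerability management device.

```python
def detect_and_isolate(category_probabilities, abnormal_categories, isolate, notify):
    """Pick the most probable category and isolate the data if it is abnormal.

    category_probabilities: dict mapping each of the m vulnerability attribute
                            categories to its prediction probability.
    abnormal_categories: set of categories treated as abnormal (an assumption).
    isolate, notify: hypothetical callbacks for isolation and for messaging.
    """
    # Assumed decision rule: highest prediction probability wins.
    target_result = max(category_probabilities, key=category_probabilities.get)
    if target_result in abnormal_categories:
        isolate()
        notify(f"data anomaly: {target_result}")
    return target_result
```

In a call such as `detect_and_isolate({"benign": 0.1, "high_risk": 0.7, "low_risk": 0.2}, {"high_risk"}, isolate, notify)`, the result is `high_risk`, so both callbacks fire; for a benign result neither does.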
In the embodiments of the present application, performing feature extraction and feature selection on simulated object data of malicious attribute categories allows object data of a malicious attribute category to be detected more accurately and more quickly. The present application can detect the core nodes (that is, high-risk vulnerabilities) in the attack paths of the malicious attribute categories, which narrows the range of vulnerability attribute categories that the vulnerability detection model has to cover and reduces maintenance costs. Because classification detection is performed by a vulnerability detection model, the present application can, compared with approaches that establish, accumulate, and match detection rules based on known traffic attacks, detect object data of unknown vulnerability attribute categories more intelligently, and the accuracy of vulnerability attribute category detection can improve as the parameters of the vulnerability detection model are adjusted. The present application can thus determine the association between vulnerability data and vulnerability attribute categories more accurately and improve the efficiency of vulnerability detection on the vulnerability data.
Further, please refer to fig. 10, which is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 10, the computer device 1000 may include: at least one processor 1001 (such as a CPU), at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and a keyboard, and the network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the aforementioned processor 1001. As shown in fig. 10, the memory 1005, which is one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 10, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface through which a user provides input; and the processor 1001 may be used to invoke the device control application program stored in the memory 1005 to implement:
acquiring the prediction association degrees of k feature dimensions with the vulnerability attribute categories, and acquiring N target feature dimensions from the k feature dimensions based on the prediction association degrees, where a prediction association degree represents the degree to which a change in the corresponding feature dimension influences the prediction result for the vulnerability attribute category; acquiring target sample vulnerability features respectively corresponding to target sample vulnerability data in the N target feature dimensions, and inputting the N target sample vulnerability features into an initial vulnerability detection model for prediction to obtain a target sample prediction result for the vulnerability attribute categories; and acquiring a target sample label of the target sample vulnerability data, and performing parameter adjustment on the initial vulnerability detection model based on the target sample prediction result and the target sample label to obtain a vulnerability detection model for vulnerability attribute category detection.
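The final step, adjusting the initial model from the prediction result and the sample label, can be illustrated with a single gradient step. This passage does not name the model type, so the sketch assumes a binary logistic regression trained on log loss purely for illustration. The function names and the learning rate are hypothetical.

```python
import math


def predict(weights, bias, features):
    """Predicted probability that the sample belongs to the positive category."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def adjust_parameters(weights, bias, features, label, learning_rate=0.1):
    """One gradient-descent step on the log loss for a single labelled sample."""
    # Difference between the target sample prediction result and its label.
    error = predict(weights, bias, features) - label
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias
```

Starting from zero weights, one step on a positive sample with features `[1.0, 0.0]` moves the first weight and the bias upward, so the predicted probability for that sample rises above 0.5.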
In one embodiment, when obtaining the prediction association degrees of the k feature dimensions and the vulnerability attribute categories respectively, the processor 1001 further performs the following steps:
obtaining d sample vulnerability data and a sample label corresponding to each sample vulnerability data; the d sample vulnerability data comprise target sample vulnerability data; d is a positive integer; acquiring the category proportion of vulnerability attribute categories based on the sample labels respectively corresponding to the d sample vulnerability data, and determining the category information quantity of the vulnerability attribute categories based on the category proportion of the vulnerability attribute categories; determining dimension information quantity of vulnerability attribute categories under the ith characteristic dimension according to the characteristic state under the ith characteristic dimension and the sample labels respectively corresponding to the d sample vulnerability data; i is a positive integer less than or equal to k; the characteristic state under the ith characteristic dimension is used for representing the distribution condition of the characteristics of the d sample vulnerability data under the ith characteristic dimension; and determining the prediction association degree of the ith characteristic dimension and the vulnerability attribute category based on the category information quantity of the vulnerability attribute category and the dimension information quantity of the vulnerability attribute category under the ith characteristic dimension.
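The quantities named in this embodiment can be written out as follows. This is a sketch that assumes the "information amount" is Shannon entropy and that the prediction association degree is the difference between the two amounts (an information gain); the passage itself does not give the formulas. The symbols are introduced here for illustration: d_j is the number of samples labelled with category j, S_i is the set of feature states of the ith dimension, d_s is the number of samples in state s, and p_{j|s} is the share of category j among those samples.

```latex
\begin{aligned}
H(C) &= -\sum_{j=1}^{m} p_j \log_2 p_j, \qquad p_j = \frac{d_j}{d}
  && \text{(category information amount)} \\
H(C \mid X_i) &= \sum_{s \in S_i} \frac{d_s}{d}
  \left( -\sum_{j=1}^{m} p_{j \mid s} \log_2 p_{j \mid s} \right)
  && \text{(dimension information amount)} \\
G_i &= H(C) - H(C \mid X_i)
  && \text{(prediction association degree)}
\end{aligned}
```

As a worked case under these assumptions, take d = 4 samples split evenly over two categories, so H(C) = 1 bit. If the two feature states of dimension i separate the categories perfectly, H(C | X_i) = 0 and G_i = 1; if each state contains one sample of each category, H(C | X_i) = 1 and G_i = 0.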
In one embodiment, the feature state includes a first feature state and a second feature state; the number of vulnerability attribute categories is m; m is a positive integer;
when determining the dimension information amount of the vulnerability attribute categories in the ith feature dimension according to the feature state in the ith feature dimension and the sample labels respectively corresponding to the d sample vulnerability data, the processor 1001 specifically performs the following steps:
in the first feature state of the ith feature dimension, determining the numbers of first sample labels respectively corresponding to the m vulnerability attribute categories based on the sample labels respectively corresponding to the d sample vulnerability data, and determining, based on the m numbers of first sample labels, the first feature probabilities respectively corresponding to the m vulnerability attribute categories in the first feature state of the ith feature dimension; in the second feature state of the ith feature dimension, determining the numbers of second sample labels respectively corresponding to the m vulnerability attribute categories based on the sample labels respectively corresponding to the d sample vulnerability data, and determining, based on the m numbers of second sample labels, the second feature probabilities respectively corresponding to the m vulnerability attribute categories in the second feature state of the ith feature dimension; and performing probability integration processing on the m first feature probabilities and the m second feature probabilities to obtain the dimension information amount of the vulnerability attribute categories in the ith feature dimension.
In one embodiment, when obtaining the prediction association degrees of the k feature dimensions and the vulnerability attribute categories respectively, the processor 1001 further performs the following steps:
acquiring k to-be-detected dimension features respectively corresponding to the sample vulnerability data in the k feature dimensions; performing dimension conversion on the k to-be-detected dimension features to obtain k vulnerability set features respectively corresponding to the k feature dimensions; inputting the k vulnerability set features respectively into the initial vulnerability detection model for prediction to obtain the dimension category probabilities of the k vulnerability set features for the vulnerability attribute categories; and determining the prediction association degrees of the k feature dimensions with the vulnerability attribute categories according to difference data between the dimension category probabilities of the k vulnerability set features for the vulnerability attribute categories and the sample labels corresponding to the sample vulnerability data.
In one embodiment, when N target feature dimensions are to be obtained from k feature dimensions based on the predicted relevance, the processor 1001 further performs the following steps:
obtaining a prediction association threshold; among the k feature dimensions, clustering the feature dimensions whose prediction association degree is less than the prediction association threshold to obtain a first common dimension; and determining the N target feature dimensions from among the first common dimension and the feature dimensions whose prediction association degree is greater than or equal to the prediction association threshold.
In an embodiment, when target sample vulnerability characteristics corresponding to target sample vulnerability data under N target characteristic dimensions are to be obtained, the processor 1001 further performs the following steps:
acquiring a sample data type of target sample vulnerability data, and acquiring vulnerability information corresponding to the sample data type; acquiring vulnerability keywords from target sample vulnerability data based on vulnerability information; and under the N target feature dimensions, performing feature extraction processing on the vulnerability keywords to obtain target sample vulnerability features corresponding to the target sample vulnerability data under the N target feature dimensions respectively.
In one embodiment, the processor 1001 obtains target to-be-detected features corresponding to the target to-be-detected data under N target feature dimensions, respectively; the N target feature dimensions are determined from the k feature dimensions based on the prediction relevance of the k feature dimensions and the vulnerability attribute categories respectively; and inputting the target to-be-detected features respectively corresponding to the N target feature dimensions into a vulnerability detection model for vulnerability detection to obtain target detection results corresponding to the target to-be-detected data.
In one embodiment, the number of vulnerability attribute categories is m; m is a positive integer;
when inputting the target to-be-detected features respectively corresponding to the N target feature dimensions into the vulnerability detection model for vulnerability detection to obtain the target detection result corresponding to the target to-be-detected data, the processor 1001 specifically performs the following steps:
obtaining the m vulnerability attribute categories and the prediction probability corresponding to each vulnerability attribute category, and determining the target detection result from the m vulnerability attribute categories based on the prediction probabilities.
In one embodiment, the processor 1001 further performs the following steps:
and if the target detection result is an abnormal detection type in the m vulnerability attribute types, isolating the target to-be-detected data corresponding to the abnormal detection type, and sending a data abnormal message to the vulnerability management equipment.
It should be understood that the computer device 1000 described in this embodiment of the present application can carry out the data processing method described in the embodiments corresponding to fig. 2a, fig. 2b, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7a, and fig. 7b, and can also implement the data processing apparatus 1 described in the embodiment corresponding to fig. 8 and the data processing apparatus 2 described in the embodiment corresponding to fig. 9; details are not repeated here. Likewise, the beneficial effects of the same method are not described again.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. The computer program includes program instructions which, when executed by a processor, implement the data processing method provided by the steps in fig. 2a, fig. 2b, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7a, and fig. 7b. For details, reference may be made to the implementation manners provided for those steps, which are not repeated here. Likewise, the beneficial effects of the same method are not described again.
The computer-readable storage medium may be an internal storage unit of the data processing apparatus provided in any of the foregoing embodiments or of the computer device, such as a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device carries out the data processing method described in the embodiments corresponding to fig. 2a, fig. 2b, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7a, and fig. 7b, which is not repeated here. Likewise, the beneficial effects of the same method are not described again.
The term "comprises" and any variations thereof in the description, the claims, and the drawings of the embodiments of the present application are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules, but may optionally include other steps or modules that are not listed or that are inherent to such process, method, apparatus, product, or device.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To illustrate clearly the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The method and the related apparatus provided by the embodiments of the present application are described with reference to the method flowcharts and/or structural diagrams provided by the embodiments of the present application. Each flow and/or block of the flowcharts and/or structural diagrams, and combinations of flows and/or blocks in the flowcharts and/or structural diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams.
The above disclosure describes only preferred embodiments of the present application and is not intended to limit the scope of the claims of the present application; equivalent variations made in accordance with the claims of the present application therefore still fall within the scope of the present application.

Claims (12)

1. A data processing method, comprising:
acquiring prediction association degrees of k characteristic dimensions and vulnerability attribute categories respectively, and acquiring N target characteristic dimensions from the k characteristic dimensions based on the prediction association degrees; the prediction relevance is used for representing the change of the corresponding feature dimension and the influence degree of the change on the prediction result of the vulnerability attribute category;
acquiring target sample vulnerability characteristics of target sample vulnerability data respectively corresponding to the N target characteristic dimensions, inputting the N target sample vulnerability characteristics into an initial vulnerability detection model for prediction, and obtaining a target sample prediction result aiming at the vulnerability attribute category;
and acquiring a target sample label of the target sample vulnerability data, and performing parameter adjustment on the initial vulnerability detection model based on the target sample prediction result and the target sample label to obtain a vulnerability detection model for vulnerability attribute type detection.
2. The method according to claim 1, wherein the obtaining of the prediction association degrees of the k feature dimensions and the vulnerability attribute categories respectively comprises:
obtaining d sample vulnerability data and a sample label corresponding to each sample vulnerability data; the d sample vulnerability data comprises the target sample vulnerability data; d is a positive integer;
acquiring the category proportion of the vulnerability attribute categories based on the sample labels respectively corresponding to the d sample vulnerability data, and determining the category information quantity of the vulnerability attribute categories based on the category proportion of the vulnerability attribute categories;
determining dimension information quantity of vulnerability attribute categories under the ith characteristic dimension according to the characteristic state under the ith characteristic dimension and the sample labels respectively corresponding to the d sample vulnerability data; i is a positive integer less than or equal to k; the feature state under the ith feature dimension is used for characterizing the distribution condition of the features of the d sample vulnerability data under the ith feature dimension respectively;
and determining the prediction association degree of the ith characteristic dimension and the vulnerability attribute category based on the category information quantity of the vulnerability attribute category and the dimension information quantity of the vulnerability attribute category under the ith characteristic dimension.
3. The method of claim 2, wherein the feature states comprise a first feature state and a second feature state; the number of the vulnerability attribute categories is m; m is a positive integer; determining the dimension information quantity of the vulnerability attribute category under the ith characteristic dimension according to the characteristic state under the ith characteristic dimension and the sample labels respectively corresponding to the d sample vulnerability data, wherein the step comprises the following steps:
in the first feature state of the ith feature dimension, determining the numbers of first sample labels respectively corresponding to the m vulnerability attribute categories based on the sample labels respectively corresponding to the d sample vulnerability data, and determining, based on the m numbers of first sample labels, first feature probabilities respectively corresponding to the m vulnerability attribute categories in the first feature state of the ith feature dimension;
in the second feature state of the ith feature dimension, determining the numbers of second sample labels respectively corresponding to the m vulnerability attribute categories based on the sample labels respectively corresponding to the d sample vulnerability data, and determining, based on the m numbers of second sample labels, second feature probabilities respectively corresponding to the m vulnerability attribute categories in the second feature state of the ith feature dimension;
and performing probability integration processing on the m first characteristic probabilities and the m second characteristic probabilities to obtain dimension information quantity of the vulnerability attribute category under the ith characteristic dimension.
4. The method according to claim 1, wherein the obtaining of the prediction association degrees of the k feature dimensions and the vulnerability attribute categories respectively comprises:
acquiring k dimensions of features to be detected, which correspond to the sample vulnerability data under the k feature dimensions respectively;
performing dimension conversion on the k dimension features to be detected to obtain k vulnerability set features respectively corresponding to the k feature dimensions;
inputting the k vulnerability set characteristics into the initial vulnerability detection model respectively for prediction to obtain dimension category probabilities of the k vulnerability set characteristics aiming at the vulnerability attribute categories respectively;
and determining the prediction association degrees of the k characteristic dimensions and the vulnerability attribute categories respectively according to the dimension category probability of the k vulnerability set characteristics respectively aiming at the vulnerability attribute categories and the difference data between the sample labels corresponding to the sample vulnerability data.
5. The method according to claim 1, wherein the obtaining N target feature dimensions from the k feature dimensions based on the prediction association degrees comprises:
obtaining a prediction association threshold;
among the k feature dimensions, clustering the feature dimensions whose prediction association degree is less than the prediction association threshold to obtain a first common dimension;
determining the N target feature dimensions from among the first common dimension and the feature dimensions whose prediction association degree is greater than or equal to the prediction association threshold.
6. The method according to claim 1, wherein the obtaining target sample vulnerability characteristics of the target sample vulnerability data corresponding to the N target characteristic dimensions, respectively, comprises:
acquiring a sample data type of the target sample vulnerability data, and acquiring vulnerability information corresponding to the sample data type;
acquiring vulnerability keywords from the vulnerability data of the target sample based on the vulnerability information;
and under the N target feature dimensions, performing feature extraction processing on the vulnerability keywords to obtain target sample vulnerability features corresponding to the target sample vulnerability data under the N target feature dimensions respectively.
7. A data processing method, comprising:
acquiring target to-be-detected characteristics corresponding to the target to-be-detected data under N target characteristic dimensions respectively; the N target feature dimensions are determined from the k feature dimensions based on the prediction relevance of the k feature dimensions to the vulnerability attribute categories respectively;
and inputting the target to-be-detected features respectively corresponding to the N target feature dimensions into a vulnerability detection model for vulnerability detection to obtain target detection results corresponding to the target to-be-detected data.
8. The method of claim 7, wherein the number of vulnerability attribute categories is m; m is a positive integer; the inputting the target to-be-detected features respectively corresponding to the N target feature dimensions into a vulnerability detection model for vulnerability detection to obtain target detection results corresponding to the target to-be-detected data comprises:
acquiring m vulnerability attribute categories and a prediction probability corresponding to each vulnerability attribute category, and determining the target detection result from the m vulnerability attribute categories based on the prediction probability;
the method further comprises the following steps:
and if the target detection result is an abnormal detection type in the m vulnerability attribute types, isolating the target to-be-detected data corresponding to the abnormal detection type, and sending a data abnormal message to vulnerability management equipment.
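The detection flow of claims 7 and 8 can be sketched as follows. This is an illustration under stated assumptions, not the claimed model: the category names, the set of abnormal categories, the toy linear weights, and the `quarantine` and `outbox` lists (standing in for data isolation and for the message sent to the vulnerability management device) are all hypothetical.

```python
import math

# Hypothetical vulnerability attribute categories (m = 3).
CATEGORIES = ["normal", "sql_injection", "xss"]
ABNORMAL = {"sql_injection", "xss"}

# Toy linear model standing in for the trained vulnerability detection model:
# one weight row per category over N = 3 target feature dimensions.
WEIGHTS = [
    [-1.0, -1.0, 0.0],
    [1.0, 0.5, 0.0],
    [0.5, 0.5, 0.2],
]
BIASES = [2.0, -1.0, -1.0]


def predict_probabilities(features):
    """Return one prediction probability per category (softmax over linear scores)."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(WEIGHTS, BIASES)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def detect(data_id, features, quarantine, outbox):
    """Pick the most probable category; isolate the data and notify if it is abnormal."""
    probs = predict_probabilities(features)
    best = max(range(len(CATEGORIES)), key=probs.__getitem__)
    category = CATEGORIES[best]
    if category in ABNORMAL:
        quarantine.append(data_id)  # stand-in for isolating the data
        outbox.append({"type": "data_anomaly", "data_id": data_id,
                       "category": category, "probability": probs[best]})
    return category


quarantine, outbox = [], []
result_a = detect("req-1", [3.0, 3.0, 6.0], quarantine, outbox)
result_b = detect("req-2", [0.0, 0.0, 0.0], quarantine, outbox)
print(result_a, result_b, quarantine)
```

Choosing the category with the highest prediction probability is one straightforward reading of "determining the target detection result ... based on the prediction probabilities"; a threshold-based rule would fit the claim language equally well.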
9. A data processing apparatus, comprising:
a relevance obtaining module, configured to obtain the prediction relevance between each of k feature dimensions and the vulnerability attribute categories;
a dimension obtaining module, configured to obtain N target feature dimensions from the k feature dimensions based on the prediction relevance; the prediction relevance representing the degree to which a change in the corresponding feature dimension influences the prediction result for the vulnerability attribute categories;
a feature obtaining module, configured to obtain target sample vulnerability features respectively corresponding to target sample vulnerability data in the N target feature dimensions;
a feature input module, configured to input the N target sample vulnerability features into an initial vulnerability detection model for prediction, to obtain a target sample prediction result for the vulnerability attribute categories;
and a label obtaining module, configured to obtain a target sample label of the target sample vulnerability data, and to adjust the parameters of the initial vulnerability detection model based on the target sample prediction result and the target sample label, to obtain a vulnerability detection model for vulnerability attribute category detection.
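The training pipeline described by the modules of claim 9 can be sketched as follows. Several simplifications are assumptions made here and are not taken from the claims: the prediction relevance is approximated by the absolute Pearson correlation between a feature dimension and the label, the vulnerability attribute categories are reduced to a binary label, and the initial model is a logistic regression adjusted by stochastic gradient descent. All function names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_dimensions(samples, labels, n):
    """Rank the k dimensions by |correlation with the label| and keep the top n."""
    k = len(samples[0])
    relevance = [abs(pearson([s[j] for s in samples], labels)) for j in range(k)]
    ranked = sorted(range(k), key=lambda j: relevance[j], reverse=True)
    return sorted(ranked[:n])


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, dims, epochs=500, lr=0.5):
    """Adjust model parameters from the gap between prediction and sample label."""
    w, b = [0.0] * len(dims), 0.0
    for _ in range(epochs):
        for s, y in zip(samples, labels):
            x = [s[j] for j in dims]
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(sample, dims, w, b):
    x = [sample[j] for j in dims]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy data with k = 4 dimensions: dimension 1 is uncorrelated with the label
# and dimension 2 is constant, so neither should be selected.
samples = [
    [1.0, 0.0, 5.0, 0.9],
    [1.0, 1.0, 5.0, 0.8],
    [0.0, 0.0, 5.0, 0.1],
    [0.0, 1.0, 5.0, 0.3],
]
labels = [1, 1, 0, 0]

dims = select_dimensions(samples, labels, n=2)
w, b = train(samples, labels, dims)
print(dims, predict(samples[0], dims, w, b), predict(samples[2], dims, w, b))
```

Dropping low-relevance dimensions before training reduces the input from k to N features, which is the effect the dimension obtaining module is described as achieving.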
10. A data processing apparatus, comprising:
a data obtaining module, configured to obtain target to-be-detected features respectively corresponding to target to-be-detected data in N target feature dimensions; the N target feature dimensions being determined from k feature dimensions based on the prediction relevance between each of the k feature dimensions and the vulnerability attribute categories;
and a vulnerability detection module, configured to input the target to-be-detected features respectively corresponding to the N target feature dimensions into a vulnerability detection model for vulnerability detection, to obtain a target detection result corresponding to the target to-be-detected data.
11. A computer device, comprising: a processor, a memory, and a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is configured to provide data communication functions, the memory is configured to store program code, and the processor is configured to call the program code to perform the method of any one of claims 1 to 6 or perform the method of any one of claims 7 to 8.
12. A computer-readable storage medium storing a computer program, the computer program being adapted to be loaded by a processor to perform the method of any one of claims 1 to 6 or the method of any one of claims 7 to 8.
CN202210310556.5A 2022-03-28 2022-03-28 Data processing method, device, equipment and readable storage medium Active CN114422271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210310556.5A CN114422271B (en) 2022-03-28 2022-03-28 Data processing method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210310556.5A CN114422271B (en) 2022-03-28 2022-03-28 Data processing method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114422271A (en) 2022-04-29
CN114422271B (en) 2022-07-08

Family

ID=81264033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210310556.5A Active CN114422271B (en) 2022-03-28 2022-03-28 Data processing method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114422271B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255241A (en) * 2018-08-31 2019-01-22 国鼎网络空间安全技术有限公司 Android privilege-escalation leak detection method and system based on machine learning
CN111949994A (en) * 2020-08-19 2020-11-17 北京紫光展锐通信技术有限公司 Vulnerability analysis method and system, electronic device and storage medium
CN113032792A (en) * 2021-04-12 2021-06-25 中国移动通信集团陕西有限公司 System service vulnerability detection method, system, equipment and storage medium
US20210367961A1 (en) * 2020-05-21 2021-11-25 Tenable, Inc. Mapping a vulnerability to a stage of an attack chain taxonomy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
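Zhou Qinqing (周钦青): "Network attack monitoring method with weighted features selected by ACO and SVM" (ACO和SVM选择加权特征的网络攻击监测方法), Bulletin of Science and Technology (《科技通报》), vol. 31, no. 10, 31 October 2015 (2015-10-31), pages 250-253 *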

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115225336A (en) * 2022-06-24 2022-10-21 中国人民解放军国防科技大学 Vulnerability availability calculation method and device for network environment
CN115225336B (en) * 2022-06-24 2023-08-08 中国人民解放军国防科技大学 Network environment-oriented vulnerability availability computing method and device
CN115277198A (en) * 2022-07-27 2022-11-01 西安热工研究院有限公司 Vulnerability detection method and device for industrial control system network and storage medium

Also Published As

Publication number Publication date
CN114422271B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN114422271B (en) Data processing method, device, equipment and readable storage medium
CN109284606A (en) Data flow anomaly detection system based on empirical characteristics and convolutional neural network
WO2011032094A1 (en) Extracting information from unstructured data and mapping the information to a structured schema using the naive bayesian probability model
CN105516196A (en) HTTP message data-based parallelization network anomaly detection method and system
CN111538929A (en) Network link identification method and device, storage medium and electronic equipment
CN114422211B (en) HTTP malicious traffic detection method and device based on graph attention network
CN112468520A (en) Data detection method, device and equipment and readable storage medium
CN113704328B (en) User behavior big data mining method and system based on artificial intelligence
CN114330966A (en) Risk prediction method, device, equipment and readable storage medium
CN115080756A (en) Attack and defense behavior and space-time information extraction method oriented to threat information map
CN116827656A (en) Network information safety protection system and method thereof
CN115378619A (en) Sensitive data access method, electronic equipment and computer readable storage medium
Chen et al. Using adversarial examples to bypass deep learning based url detection system
CN117454380B (en) Malicious software detection method, training method, device, equipment and medium
CN117729003A (en) Threat information credibility analysis system and method based on machine learning
CN116633804A (en) Modeling method, protection method and related equipment of network flow detection model
CN113626817B (en) Malicious code family classification method
CN115935358A (en) Malicious software identification method and device, electronic equipment and storage medium
CN114826628A (en) Data processing method and device, computer equipment and storage medium
CN114328818A (en) Text corpus processing method and device, storage medium and electronic equipment
CN115410201A (en) Method, device and related equipment for processing verification code characters
CN114938285B (en) Data security identification method and storage medium
CN115913688B (en) Network data security monitoring method, device, equipment and storage medium
US11907658B2 (en) User-agent anomaly detection using sentence embedding
CN118626982A (en) Multi-mode anomaly detection method and system for large data network flow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant