CN116108428B - Software online upgrading method and system based on information security big data - Google Patents


Info

Publication number
CN116108428B
CN116108428B CN202310008447.2A CN202310008447A CN116108428B CN 116108428 B CN116108428 B CN 116108428B CN 202310008447 A CN202310008447 A CN 202310008447A CN 116108428 B CN116108428 B CN 116108428B
Authority
CN
China
Prior art keywords
software
training sample
vulnerability
upgrading
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310008447.2A
Other languages
Chinese (zh)
Other versions
CN116108428A (en
Inventor
秦潇健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Mc Science And Technology Co ltd
Original Assignee
Guangdong Mc Science And Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Mc Science And Technology Co ltd filed Critical Guangdong Mc Science And Technology Co ltd
Priority to CN202310008447.2A priority Critical patent/CN116108428B/en
Publication of CN116108428A publication Critical patent/CN116108428A/en
Application granted granted Critical
Publication of CN116108428B publication Critical patent/CN116108428B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a software online upgrading method and system based on information security big data. First, word segmentation is performed on a target character string, splitting the continuous character string into a plurality of independent words; stop word filtering is performed on the independent words, and variants or derivatives of the filtered words are reduced to stems or roots; the stems or roots are mapped into a vector space as training samples; the training samples are labeled and classified to obtain positive training samples and negative training samples; supervised learning training is performed to generate a software online upgrading vulnerability classification model; and the model is used to perform vulnerability detection on target software undergoing online upgrading to determine whether the target software has vulnerabilities. Targeting the characteristics of software upgrade vulnerabilities, the application provides a vulnerability detection scheme based on software upgrade behavior analysis, so that vulnerabilities can be detected while software is upgraded online.

Description

Software online upgrading method and system based on information security big data
Technical Field
The invention relates to the technical field of information security, in particular to a software online upgrading method and system based on information security big data.
Background
With the development of the Internet and software engineering technology, software has extended into every aspect of work and life and has become indispensable to the normal operation of society. While software brings convenience to people's work and life, it also brings significant information security risks. The rapid development of the software industry has made the software supply chain increasingly complex, which has led to a series of security problems.
Online upgrading of software means downloading and updating the software's program files and related files. Upgrading is important for version changes, bug fixes, and functional improvements, and is also an important link in the software supply chain. However, because a significant portion of software developers lack sufficient knowledge of security and cryptography, and of upgrade security mechanisms in particular, the online upgrade process contains a series of security vulnerabilities, leaving software at high risk of attack during upgrading.
A software upgrade vulnerability is a security defect in which, because the communication process or the upgrade package lacks authentication or encryption protection during online upgrading, the upgrade communication may be hijacked and malicious programs implanted. In recent APT attacks, attackers have exploited upgrade vulnerabilities to hijack the online upgrade process of software and take control of large numbers of servers and hosts, causing great damage to users. Hijacking attacks on online software upgrades have become one of the most serious threats to network security.
At present, because developers lack sufficient security and cryptography knowledge, online upgrade security vulnerabilities are widespread and affect a large amount of commonly used software. However, research on detecting software upgrade vulnerabilities is still at an early stage, and rapid detection across batches of software in particular faces great challenges. First, the causes of upgrade hijacking vulnerabilities are varied: some software lacks a cryptographic protection mechanism, some has an unreasonably designed protection mechanism, and some developers introduce programming flaws when implementing the protection mechanism. Second, as the design complexity and functional diversity of software increase, analyzing the security of the upgrade function within complex software places high demands on the analyst's reverse engineering and cryptography knowledge. Third, existing methods concentrate on manual reverse analysis through software debugging, upgrade network traffic analysis, and the like; they are inefficient and cannot automatically analyze the upgrade security of software in batches.
Disclosure of Invention
In view of the above drawbacks of the prior art, the present invention is directed to providing a method and a system for online upgrading software based on information security big data, which are used for solving the problems existing in the prior art.
In order to achieve the above and other related objects, the present invention provides a software online upgrade method based on information security big data, comprising the following steps:
acquiring a target character string, wherein the target character string comprises a character string in the online upgrading process of software;
word segmentation processing is carried out on the target character string, and the continuous character string is split into a plurality of independent word characters;
performing stop word filtering on the plurality of independent word characters, and extracting variants or derivatives of the word characters subjected to the stop word filtering as stems; or extracting the variant or derivative of the word characters after the filtering of the stop words is completed as the root words;
mapping the stem or the root into a vector space to serve as a training sample for the software online upgrading vulnerability classification model;
labeling the training samples, classifying the training samples based on labeling results, and obtaining positive training samples and negative training samples;
performing supervised learning training based on the positive training sample and the negative training sample to generate a software online upgrading vulnerability classification model;
and performing vulnerability detection on the target software subjected to online upgrading by using the software online upgrading vulnerability classification model, and determining whether the target software has vulnerabilities.
Optionally, performing supervised learning training based on the positive training sample and the negative training sample, and generating the software online upgrade vulnerability classification model includes:
acquiring a first positive training sample set P and an unlabeled training sample set U;
randomly taking out part of samples from the first positive training sample set P as spy samples S, taking all the remaining positive training samples in the first positive training sample set P as a second positive training sample set Ps, and marking each positive training sample in the second positive training sample set Ps as 1;
adding the spy sample S into the unlabeled training sample set U, and marking each training sample in the unlabeled training sample set added with the spy sample as -1 to obtain a negative training sample set Us;
training a classifier by using the second positive training sample set Ps and the negative training sample set Us, classifying each sample in the unlabeled training sample set by using the trained classifier, and obtaining the probability that each sample in the unlabeled training sample set is positive, and marking the probability as a first probability;
the first probability is weighted and summed to obtain the probability that all samples in the unlabeled training sample set are positive, and the probability is recorded as a second probability;
and comparing the second probability with a preset probability threshold, and when the second probability is larger than the preset probability threshold, combining the classifier at the current moment with its corresponding classification parameters to generate the software online upgrading vulnerability classification model.
Optionally, when training a classifier using the second positive training sample set Ps and the negative training sample set Us, the method further comprises:
setting the loss function as the hinge loss, the positive sample classification error loss objective function F_1 being:
labeling the unlabeled samples, and setting the negative sample classification error loss objective function F_2 as:
wherein C_+ represents the penalty factor for positive classification errors, and C_- represents the penalty factor for negative classification errors;
y_i represents the label of an unlabeled sample;
x_i represents the feature vector;
w and b represent natural constants.
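The formulas for F_1 and F_2 are not reproduced in this text, so the following is only a minimal sketch of one common cost-sensitive hinge loss, not the application's own formula. It assumes, as in a standard linear SVM, that w is a weight vector and b a bias term, and that errors on positive and negative samples are weighted by C_+ and C_- respectively. The function names are hypothetical.

```python
def hinge(margin):
    """Standard hinge loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def cost_sensitive_hinge_loss(w, b, samples, c_pos, c_neg):
    """Sum of hinge losses, weighted by class.

    samples: list of (x, y) with x a list of floats and y in {+1, -1}.
    c_pos / c_neg: penalty factors for positive / negative samples
    (assumed roles of C_+ and C_-).
    """
    total = 0.0
    for x, y in samples:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b  # w . x + b
        penalty = c_pos if y == 1 else c_neg
        total += penalty * hinge(y * score)
    return total


samples = [([2.0, 0.0], 1), ([0.5, 0.0], 1), ([0.5, 0.0], -1)]
print(cost_sensitive_hinge_loss([1.0, 0.0], 0.0, samples, c_pos=2.0, c_neg=1.0))
```

Choosing C_+ larger than C_- makes a misclassified positive sample cost more than a misclassified sample from the noisy "negative" set, which is the usual motivation for cost-sensitive training in PU Learning.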
Optionally, before performing vulnerability detection on the target software subjected to online upgrade by using the software online upgrade vulnerability classification model, the method further includes:
traversing the software directory of the target software and inputting an executable file path of the target software;
extracting semantic information from the software directory and the executable file path, and preprocessing the extracted semantic information;
performing upgrade semantic recognition on the preprocessed semantic information by using the software online upgrading vulnerability classification model, and judging whether traversal of the software directory is complete;
if so, extracting the semantic call tree and computing the location of the upgrade function;
if not, continuing to traverse the software directory of the target software.
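The traversal step above can be sketched as follows. This is a minimal illustration, assuming that executables are identified by file extension; the extension set and the function name `find_executables` are hypothetical and are not specified in the application.

```python
import os

# Assumed set of executable extensions; the application does not list them.
EXECUTABLE_EXTENSIONS = {".exe", ".dll"}


def find_executables(software_dir):
    """Walk the software directory and return the paths of executable files."""
    paths = []
    for root, _dirs, files in os.walk(software_dir):
        for name in files:
            extension = os.path.splitext(name)[1].lower()
            if extension in EXECUTABLE_EXTENSIONS:
                paths.append(os.path.join(root, name))
    return sorted(paths)
```

Each returned path would then be passed to the semantic extraction and preprocessing steps until the whole directory has been visited.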
Optionally, the process of performing vulnerability detection on the target software subjected to online upgrade by using the software online upgrade vulnerability classification model and determining whether the target software has a vulnerability includes:
defining the software online upgrade vulnerability classification model as V = {C, P}, where C is the security index of the software upgrade communication link and P is the security index of the upgrade package verification; for each index, 1 represents secure and 0 represents a risk of attack;
when V = {1,1}, it indicates that the target software is very secure and no vulnerability exists;
when V = {1,0} or V = {0,1}, it indicates that the target software has a security risk but no vulnerability;
when V = {0,0}, it indicates that the target software is very unsafe and has a vulnerability.
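The three cases above amount to a small decision table over the two indices. A minimal sketch, in which the function name and the returned strings are hypothetical:

```python
def judge_upgrade_security(c, p):
    """Map V = {C, P} to a verdict.

    c: security index of the upgrade communication link (1 secure, 0 at risk).
    p: security index of the upgrade package verification (1 secure, 0 at risk).
    """
    if c not in (0, 1) or p not in (0, 1):
        raise ValueError("C and P must each be 0 or 1")
    if c == 1 and p == 1:
        return "very secure: no vulnerability"
    if c == 0 and p == 0:
        return "very unsafe: vulnerability present"
    return "security risk: no vulnerability"


for v in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(v, judge_upgrade_security(*v))
```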
Optionally, when the target software has a vulnerability, the method further includes:
acquiring a software behavior chain S, namely the sequence f_1, f_2, ..., f_n of upgrade network communication and upgrade package verification functions invoked in the online upgrading process of the software, where f_i (i = 1, 2, ..., n) represents the i-th recorded action of the target software's upgrade behavior;
defining f_i as the triplet (Addr, Name, Parameters), where Addr represents the function address, Name represents the function name, and Parameters represents the key parameters of the function;
and determining the vulnerability position of the target software by checking the function addresses, function names, and key parameters in the software behavior chain S.
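The behavior chain can be represented as an ordered list of (Addr, Name, Parameters) triplets. The sketch below is illustrative only: the addresses, the `VerifyPackageHash` function name, and the lookup by a set of flagged names are hypothetical, since the application does not specify how the check is carried out.

```python
from typing import NamedTuple, Tuple


class BehaviorNode(NamedTuple):
    """One action f_i in the behavior chain S."""
    addr: int                    # function address
    name: str                    # function name
    parameters: Tuple[str, ...]  # key parameters of the function


def locate_nodes(chain, flagged_names):
    """Return the nodes whose function name is in the flagged set.

    The address of each returned node gives a candidate location to inspect.
    """
    return [node for node in chain if node.name in flagged_names]


# Hypothetical chain; addresses and parameters are made up for illustration.
chain = [
    BehaviorNode(0x401000, "WinHttpOpenRequest", ("GET", "/update/latest.xml")),
    BehaviorNode(0x401230, "VerifyPackageHash", ("md5",)),
]
for node in locate_nodes(chain, {"VerifyPackageHash"}):
    print(hex(node.addr), node.name, node.parameters)
```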
Optionally, when the target software has a vulnerability, the method further includes:
checking the upgrade package text by adopting a hash algorithm, and determining whether the upgrade package text is transmitted in plaintext;
if the upgrade package text is transmitted in plaintext, determining that the target software has a transmission vulnerability;
if the upgrade package text is not transmitted in plaintext, performing key verification on the upgrade package text; if the upgrade package text does not have a key, determining that the target software has key leakage; and if the upgrade package text has a key, determining that the target software has a transmission vulnerability.
The application also provides a software online upgrading system based on information security big data, comprising:
the character string module is used for acquiring a target character string, wherein the target character string comprises a character string in the online upgrading process of the software;
the word segmentation module is used for carrying out word segmentation processing on the target character string and splitting the continuous character string into a plurality of independent word characters;
the filtering module is used for filtering the stop words of the plurality of independent word characters and extracting variants or derivatives of the word characters subjected to the stop word filtering as stems; or extracting the variant or derivative of the word characters after the filtering of the stop words is completed as the root words;
the training sample module is used for mapping the stem or the root into a vector space to serve as a training sample for the software online upgrading vulnerability classification model;
the labeling module is used for labeling the training samples and classifying the training samples based on labeling results to obtain positive training samples and negative training samples;
the training module is used for performing supervised learning training based on the positive training sample and the negative training sample to generate a software online upgrading vulnerability classification model;
and the detection module is used for carrying out vulnerability detection on the target software subjected to online upgrading by utilizing the software online upgrading vulnerability classification model and determining whether the target software has vulnerabilities.
Optionally, performing supervised learning training based on the positive training sample and the negative training sample, and generating the software online upgrade vulnerability classification model includes:
Acquiring a first positive training sample set P and an unlabeled training sample set U;
randomly taking out part of samples from the first positive training sample set P as spy samples S, taking all the remaining positive training samples in the first positive training sample set P as a second positive training sample set Ps, and marking each positive training sample in the second positive training sample set Ps as 1;
adding the spy sample S into the unlabeled training sample set U, and marking each training sample in the unlabeled training sample set added with the spy sample as -1 to obtain a negative training sample set Us;
training a classifier by using the second positive training sample set Ps and the negative training sample set Us, classifying each sample in the unlabeled training sample set by using the trained classifier, and obtaining the probability that each sample in the unlabeled training sample set is positive, and marking the probability as a first probability;
the first probability is weighted and summed to obtain the probability that all samples in the unlabeled training sample set are positive, and the probability is recorded as a second probability;
and comparing the second probability with a preset probability threshold, and combining based on a classifier at the current moment and corresponding classification parameters when the second probability is larger than the preset probability threshold to generate a software online upgrading vulnerability classification model.
Optionally, when training a classifier using the second positive training sample set Ps and the negative training sample set Us, the method further comprises:
setting the loss function as the hinge loss, the positive sample classification error loss objective function F_1 being:
labeling the unlabeled samples, and setting the negative sample classification error loss objective function F_2 as:
wherein C_+ represents the penalty factor for positive classification errors, and C_- represents the penalty factor for negative classification errors;
y_i represents the label of an unlabeled sample;
x_i represents the feature vector;
w and b represent natural constants.
As described above, the application provides a software online upgrading method and system based on information security big data, which has the following beneficial effects:
First, a target character string is acquired, the target character string comprising a character string from the online upgrading process of software. Word segmentation is then performed on the target character string, splitting the continuous character string into a plurality of independent words. Stop word filtering is performed on the independent words, and variants or derivatives of the filtered words are reduced to stems or roots. The stems or roots are then mapped into a vector space to serve as training samples for the software online upgrading vulnerability classification model. Next, the training samples are labeled and classified based on the labeling results to obtain positive training samples and negative training samples. Finally, supervised learning training is performed based on the positive and negative training samples to generate the software online upgrading vulnerability classification model, which is used to perform vulnerability detection on target software undergoing online upgrading and to determine whether the target software has vulnerabilities. Therefore, targeting the characteristics of software upgrade vulnerabilities, the application provides a vulnerability detection scheme based on software upgrade behavior analysis, so that vulnerabilities can be detected while software is upgraded online. Network communication and cryptography-related API functions and their parameters are collected and organized, the behavior nodes are compiled into XML files, and detection rules based on upgrade behavior chain vulnerabilities are described.
On this basis, a program analysis method combining dynamic and static analysis is used to extract software upgrade behaviors. Static analysis is the primary method: after the software upgrade function has been located through reverse analysis, IDAPython scripts are used to extract the upgrade behavior nodes and judge their security. Dynamic analysis supplements static analysis: it mainly uses dynamic binary instrumentation to trace the software upgrade flow and extract the upgrade behavior nodes, applies data flow analysis where necessary to extract the upgrade behavior, and judges security according to the vulnerability judgment model. The detection method can detect upgrade behavior vulnerabilities automatically and greatly improves detection efficiency while maintaining reliability.
Drawings
FIG. 1 is a flow chart of an online upgrade method for software based on information security big data according to an embodiment;
FIG. 2 is a schematic diagram of a framework for online upgrade of software according to one embodiment;
FIG. 3 is a schematic hardware structure diagram of a software online upgrade system based on information security big data according to an embodiment.
Detailed Description
Referring to fig. 1 and 2, the present application provides a software online upgrade method based on information security big data, comprising:
acquiring a target character string, wherein the target character string comprises a character string in the online upgrading process of software;
and performing word segmentation processing on the target character string, splitting the continuous character string into a plurality of independent words. Specifically, word segmentation splits a continuous string of characters into several independent words or characters. Semantic information in binary programs, such as character strings or function names, is often not written as standard English: some items are not complete words, and some are not separated by spaces. For example, the function name "CheckUpdate" should be divided into the two words "check" and "update", and the string "update_found_new_version" should be divided into the four English words "update", "found", "new" and "version". Each word represents one text classification dimension for classification training. The application uses wordninja, an English word segmentation tool with a Python interface, to divide the combined character strings into short words and remove special symbols.
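The two examples above can be reproduced with a small regular-expression splitter. This is a simplified stand-in for a dictionary-based tool, not the tool the application uses: it only splits on explicit separators and camelCase boundaries, so it cannot split a run of lowercase letters with no delimiter. The function name `split_identifier` is hypothetical.

```python
import re


def split_identifier(text):
    """Split a function name or string into lowercase words."""
    # Replace every run of non-alphanumeric characters (underscores, dots, ...) with a space.
    text = re.sub(r"[^A-Za-z0-9]+", " ", text)
    # Insert a space at each lowercase-to-uppercase boundary: "CheckUpdate" -> "Check Update".
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return [word.lower() for word in text.split()]


print(split_identifier("CheckUpdate"))
print(split_identifier("update_found_new_version"))
```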
Performing stop word filtering on the plurality of independent words, and extracting variants or derivatives of the filtered words as stems or roots. Specifically, some common words occur very frequently in English text. These words do not help in analyzing semantic information and can degrade the classification result, so they must be removed during preprocessing. In the semantic information of a binary program, numbers, definite articles, and the like are stop words. The application deletes the stop words in the software semantic information by means of the natural language processing tool NLTK. Experiments show that removing stop words improves upgrade semantic recognition accuracy. Stem extraction reduces variants or derivatives of English words to their stem or root form. For example, "updating" and "updated" are both semantic factors related to upgrading, and stem extraction transforms them into the common stem "update". For semantic information related to software upgrading, stem extraction converts words with different inflections into a single root, which reduces the learning dimension, focuses on the semantic factors related to the online upgrade function, and improves text classification accuracy.
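A minimal sketch of the two preprocessing steps follows. It deliberately avoids any library: the stop word list is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer. Like many stemmers, it produces stems that need not be dictionary words; what matters is that inflected forms collapse to the same stem.

```python
# Hypothetical, deliberately tiny stop word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "to", "is"}


def remove_stop_words(words):
    """Drop stop words and purely numeric tokens."""
    return [w for w in words if w not in STOP_WORDS and not w.isdigit()]


def naive_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = ["the", "updating", "of", "version", "2", "updated", "update"]
print([naive_stem(w) for w in remove_stop_words(tokens)])
```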
Mapping the stem or the root into a vector space to serve as a training sample for the software online upgrading vulnerability classification model. Specifically, the preprocessed words in the sample library are mapped into a vector space, which makes it easy to compute semantic similarity. To automatically identify semantic information related to online upgrading in a software program, the application uses word embedding technology from natural language processing to convert the preprocessed semantic information into word vectors for classification training. The application uses the most commonly used word vector model, the word2vec model proposed by Mikolov et al., which is a continuous skip-gram model. The basic idea behind word2vec is that words appearing in similar contexts are mapped to similar vectors, with all dimensions equally important, which makes it suitable for inferring and recognizing semantics related to software upgrading.
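To illustrate what "words that appear in similar contexts" means for a skip-gram model, the sketch below generates the (center word, context word) pairs on which such a model is trained. It shows only the pair generation, not the training of the vectors; the function name and window size are hypothetical.

```python
def skipgram_pairs(words, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(words):
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, words[j]))
    return pairs


print(skipgram_pairs(["check", "update", "version"], window=1))
```

Words that share many context words across the corpus receive similar training signals and therefore end up with nearby vectors.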
Labeling the training samples, classifying the training samples based on labeling results, and obtaining positive training samples and negative training samples;
performing supervised learning training based on the positive training sample and the negative training sample to generate a software online upgrading vulnerability classification model. Specifically, PU Learning is a special case of semi-supervised learning. It addresses the problem of training with positive and unlabeled samples when negative samples are absent. PU Learning classifiers are generally built with either the direct method or the two-step method. Experimental comparison shows that the two-step method classifies better when the number of positive samples is limited; because the number of manually obtained positive samples is limited, the two-step method is selected. In the first stage, reliable negative samples are screened from the unlabeled samples. In the second stage, a conventional supervised model is trained on the positive samples and the screened negative samples and is then used to predict new samples. To improve accuracy, the application selects a Support Vector Machine (SVM) model to label whether the semantic information of the software is related to online upgrading. The SVM model learns a hyperplane that separates the positive and negative vector sample sets, thereby achieving text classification. After training, the generated software online upgrading vulnerability classification model can be used to classify unlabeled samples: given the feature vector of a piece of semantic information in the software, the classifier predicts whether its label is 1 or -1, indicating whether the information is positively or negatively related to upgrading.
Given a piece of software, all executable files under its directory are traversed and all semantic information is extracted with an IDAPython script. All of this information except the API functions is passed through natural language preprocessing to form a feature vector, denoted V(S). V(S) is input into the classifier, and the semantic information labeled 1 is identified automatically. API functions are also important semantic information in software upgrading. To determine more accurately whether an API function is related to the upgrade function, another important piece of semantic information must be obtained: the parameters of the API function. argtracker, an IDA plug-in developed by FireEye, can extract the parameters of API functions statically. For example, the third parameter of the HTTP connection API function WinHttpOpenRequest is the URL of the website; if it contains a string related to "Update", this indicates an online upgrade URL, and the function can be judged to be upgrade-related. The method of the application inputs the extracted parameters into the upgrade semantic classifier to identify the API functions related to upgrading.
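The WinHttpOpenRequest example can be sketched as a simple parameter check. Note the assumption: the application feeds the extracted parameters into its trained upgrade semantic classifier, whereas this sketch substitutes a plain keyword match so that it stays self-contained. The keyword list, the lookup table, and the function name are hypothetical.

```python
# Index of the parameter to inspect for each API function.
# 2 is the third parameter, as described in the text for WinHttpOpenRequest.
PARAMETER_OF_INTEREST = {"WinHttpOpenRequest": 2}

# Assumed keyword list; a stand-in for the trained classifier.
UPGRADE_KEYWORDS = ("update", "upgrade")


def is_upgrade_related(api_name, parameters):
    """Return True if the inspected parameter mentions an upgrade keyword."""
    index = PARAMETER_OF_INTEREST.get(api_name)
    if index is None or index >= len(parameters):
        return False
    value = str(parameters[index]).lower()
    return any(keyword in value for keyword in UPGRADE_KEYWORDS)


print(is_upgrade_related("WinHttpOpenRequest", ["hConnect", "GET", "/download/Update.xml"]))
```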
And performing vulnerability detection on the target software subjected to online upgrading by using the software online upgrading vulnerability classification model, and determining whether the target software has vulnerabilities.
Therefore, targeting the characteristics of software upgrade vulnerabilities, this embodiment provides a vulnerability detection scheme based on software upgrade behavior analysis, so that vulnerabilities can be detected while software is upgraded online. Network communication and cryptography-related API functions and their parameters are collected and organized, the behavior nodes are compiled into XML files, and detection rules based on upgrade behavior chain vulnerabilities are described. On this basis, a program analysis method combining dynamic and static analysis is used to extract software upgrade behaviors. Static analysis is the primary method: after the software upgrade function has been located through reverse analysis, IDAPython scripts are used to extract the upgrade behavior nodes and judge their security. Dynamic analysis supplements static analysis: it mainly uses dynamic binary instrumentation to trace the software upgrade flow and extract the upgrade behavior nodes, applies data flow analysis where necessary to extract the upgrade behavior, and judges security according to the vulnerability judgment model. The detection method can detect upgrade behavior vulnerabilities automatically and greatly improves detection efficiency while maintaining reliability.
According to the above description, in an exemplary embodiment, the process of generating the software online upgrade vulnerability classification model based on the positive training sample and the negative training sample includes:
acquiring a first positive training sample set P and an unlabeled training sample set U;
randomly taking out part of samples from the first positive training sample set P as spy samples S, taking all the remaining positive training samples in the first positive training sample set P as a second positive training sample set Ps, and marking each positive training sample in the second positive training sample set Ps as 1;
adding the spy samples S to the unlabeled training sample set U, and marking each training sample in the resulting set as -1 to obtain a negative training sample set Us;
training a classifier by using the second positive training sample set Ps and the negative training sample set Us, classifying each sample in the unlabeled training sample set by using the trained classifier, and obtaining the probability that each sample in the unlabeled training sample set is positive, and marking the probability as a first probability;
the first probability is weighted and summed to obtain the probability that all samples in the unlabeled training sample set are positive, and the probability is recorded as a second probability;
and comparing the second probability with a preset probability threshold, and, when the second probability is greater than the preset probability threshold, combining the classifier at the current moment with its corresponding classification parameters to generate the software online upgrade vulnerability classification model.
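The steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: a nearest-centroid scorer stands in for the real classifier, the weights of the weighted sum are taken to be uniform, and `spy_ratio`, `seed` and the sample data are hypothetical.

```python
import math
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(pos, neg):
    """Toy stand-in classifier: remembers the centroid of each class."""
    return centroid(pos), centroid(neg)

def prob_positive(model, x):
    """Sigmoid of (distance to negative centroid - distance to positive centroid)."""
    cp, cn = model
    score = math.dist(x, cn) - math.dist(x, cp)
    return 1.0 / (1.0 + math.exp(-score))

def spy_round(P, U, spy_ratio=0.25, seed=0):
    rng = random.Random(seed)
    shuffled = P[:]
    rng.shuffle(shuffled)
    k = max(1, int(len(P) * spy_ratio))
    S, Ps = shuffled[:k], shuffled[k:]   # spies, and remaining positives (label 1)
    Us = U + S                           # unlabeled samples plus spies (label -1)
    model = train(Ps, Us)
    first = [prob_positive(model, x) for x in U]  # first probability, per sample
    weights = [1.0 / len(U)] * len(U)             # assumption: uniform weights
    second = sum(w * p for w, p in zip(weights, first))  # second probability
    return model, first, second

P = [[2.0, 2.1], [1.9, 2.0], [2.2, 1.8], [2.0, 1.9]]
U = [[-2.0, -2.0], [-1.8, -2.2], [2.1, 2.0], [-2.1, -1.9]]
model, first, second = spy_round(P, U)
```

The caller would then compare `second` with the preset probability threshold and keep `model` only when the threshold is exceeded.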
Specifically, when training the classifier using the second positive training sample set Ps and the negative training sample set Us, it may further include:
In classifier training, many strategies can help improve PU Learning. To further improve classification accuracy and accurately identify online upgrade semantic information, a cost-sensitive classification strategy is used here. The loss function is set to hinge loss, and the positive-sample misclassification loss objective function F_1 is:
labeling the unlabeled samples, the negative-sample misclassification loss objective function F_2 is:
where C_+ represents the penalty factor for a positive classification error and C_- represents the penalty factor for a negative classification error;
y_i represents the label of an unlabeled sample;
x_i represents the feature vector;
w and b represent natural constants.
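Assuming the standard cost-sensitive (biased) SVM formulation with hinge loss, the two objectives could take the following form. The index sets P (positive samples) and U (unlabeled samples treated as negative) are notation introduced here for illustration, and the exact expressions used by the application may differ.

```latex
F_1 = C_{+} \sum_{i \in P} \max\bigl(0,\; 1 - y_i\,(w^{\top} x_i + b)\bigr),
\qquad
F_2 = C_{-} \sum_{i \in U} \max\bigl(0,\; 1 - y_i\,(w^{\top} x_i + b)\bigr)
```

Choosing C_+ larger than C_- makes misclassifying a known positive sample more expensive than misclassifying an unlabeled one, which is the usual motivation for a cost-sensitive strategy in PU Learning.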
Thus PU Learning training yields a set of reliable negative samples which, together with the original positive sample set, forms the semantic information sample set. NLP preprocessing then converts the semantic information sample set into a word vector training sample set, the positive and negative samples are used to construct a set of feature vectors, and these are input into an SVM classifier to train an upgrade semantic classifier dedicated to the automatic classification and identification of online software upgrade semantic information. In practice, iterative SVM training is used to classify the remaining unidentified samples in U: the negative samples separated out are added to the negative sample set, the classifier is retrained iteratively, and classification accuracy is observed by classifying the test set.
In an exemplary embodiment, before vulnerability detection is performed on the target software undergoing online upgrade by using the software online upgrade vulnerability classification model, the method may further include: traversing the software directory of the target software and inputting an executable file path of the target software; extracting semantic information from the software directory and the executable file path, and preprocessing the extracted semantic information; performing upgrade semantic recognition on the preprocessed semantic information by using the software online upgrade vulnerability classification model, and judging whether traversal of the software directory is complete; if so, extracting the semantic call tree and computing the located function; if not, continuing to traverse the software directory of the target software. The analysis of the first two stages determines the program in which the upgrade function resides and identifies a batch of upgrade-related semantic factors in that program; the cross-reference chain of each semantic factor is then obtained through IDA Python. The function call relationships can be represented by a function call tree. Through the analysis of the first two stages, the semantic information of the four stages of the online software upgrade can be identified and its virtual memory addresses in the program obtained. As an example, suppose keyword matching identifies the semantic information "check update", "app_version_build=" and "http://upmobilev.qq.com", with which the software checks version information. The virtual memory addresses of these three pieces of semantic information are all referenced by the function sub_42F324, so it can be inferred that sub_42F324 is a version-checking module.
The character string "StartDownloadTask" is related to downloading the upgrade package. Analysis shows that it is referenced by the function sub_4226D4, and that sub_4226D4 is called by the function sub_42F324. It is finally inferred that the function sub_42F324 checks version information and downloads the software update package.
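The locating step in this example can be sketched as a walk up the call tree. The following is a minimal illustration: the two dictionaries are hand-written stand-ins for cross-reference data that would in practice come from IDA Python, and scoring a function by the number of semantic factors it covers is an assumption about how the located function is computed.

```python
from collections import deque

# Semantic factor -> functions that reference it (hypothetical cross-reference data).
xrefs = {
    "check update": {"sub_42F324"},
    "app_version_build=": {"sub_42F324"},
    "http://upmobilev.qq.com": {"sub_42F324"},
    "StartDownloadTask": {"sub_4226D4"},
}
# Function -> functions that call it.
callers = {"sub_4226D4": {"sub_42F324"}, "sub_42F324": set()}

def ancestors(func):
    """Return func plus every function that reaches it through the call tree."""
    seen, queue = {func}, deque([func])
    while queue:
        for parent in callers.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def locate(xrefs):
    """Score each function by how many semantic factors it covers."""
    score = {}
    for funcs in xrefs.values():
        covering = set().union(*(ancestors(f) for f in funcs))
        for f in covering:
            score[f] = score.get(f, 0) + 1
    return score

score = locate(xrefs)
best = max(score, key=score.get)
```

With this data, sub_42F324 covers all four factors while sub_4226D4 covers only the download string, which matches the inference drawn in the text.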
In an exemplary embodiment, the process of performing vulnerability detection on the target software undergoing online upgrade by using the software online upgrade vulnerability classification model to determine whether the target software has a vulnerability includes: defining the software online upgrade vulnerability classification model as V = {C, P}, where C is the security index of the software upgrade communication channel and P is the security index of the upgrade package verification; 1 represents secure and 0 represents at risk of attack; when V = {1, 1}, the target software is very secure and has no vulnerability; when V = {1, 0} or V = {0, 1}, the target software has a security risk but no vulnerability; when V = {0, 0}, the target software is very insecure and has a vulnerability.
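The three cases of V = {C, P} can be written as a small decision function. This is only a sketch: the function name `assess` and the verdict strings are illustrative, not part of the application.

```python
def assess(c: int, p: int) -> str:
    """Map V = {C, P} to the verdicts in the text (1 = secure, 0 = at risk of attack)."""
    if c not in (0, 1) or p not in (0, 1):
        raise ValueError("C and P must each be 0 or 1")
    if c == 1 and p == 1:
        return "very secure, no vulnerability"
    if c == 0 and p == 0:
        return "very insecure, vulnerability present"
    return "security risk, no vulnerability"

verdicts = {(c, p): assess(c, p) for c in (0, 1) for p in (0, 1)}
```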
In an exemplary embodiment, according to the above description, when the target software has a vulnerability, the method may further include: acquiring a software behavior chain S, namely the sequence f_1, f_2, ..., f_n of upgrade network communication and upgrade package verification functions invoked during the online software upgrade, where f_i (i = 1, 2, ..., n) represents the i-th recorded action of the target software's upgrade behavior; defining f_i as the triple (Addr, Name, Parameters), where Addr represents the function address, Name represents the function name, and Parameters represents the key parameters of the function; and determining the vulnerability location in the target software by checking the function addresses, function names and key function parameters in the software behavior chain S.
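A behavior chain of (Addr, Name, Parameters) triples might be represented as below. The sketch makes several assumptions: the addresses and URLs are invented, the parameter tuples are simplified rather than real API signatures, and the single rule shown (an http:// URL among the key parameters) is a hypothetical example of a communication rule, not the application's rule set.

```python
from typing import NamedTuple

class BehaviorNode(NamedTuple):
    addr: int          # function address
    name: str          # function name
    parameters: tuple  # key parameters of the function

def find_plaintext_nodes(chain):
    """Return the nodes whose key parameters include an http:// URL."""
    return [
        node for node in chain
        if any(isinstance(p, str) and p.lower().startswith("http://")
               for p in node.parameters)
    ]

# Illustrative chain S = f_1, f_2, f_3.
S = [
    BehaviorNode(0x42F324, "InternetOpenUrlA", ("http://example.com/update.xml",)),
    BehaviorNode(0x4226D4, "InternetOpenUrlA", ("https://example.com/pkg.bin",)),
    BehaviorNode(0x401000, "CryptHashData", ("MD5",)),
]
hits = find_plaintext_nodes(S)
locations = [hex(node.addr) for node in hits]
```

Because each node carries its address, a matched rule points directly at the location of the suspect call.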
In an exemplary embodiment, when the target software has a vulnerability, the method may further include: verifying the upgrade package text with a hash algorithm, and determining whether the upgrade package text is sent in plaintext; if the upgrade package text is sent in plaintext, determining that the target software has a transmission vulnerability; if the upgrade package text is not sent in plaintext, performing key verification on the upgrade package text; if the upgrade package text has no key, determining that the target software has a key leakage; and if the upgrade package text has a key, determining that the target software has a transmission vulnerability. In this embodiment, to ensure the integrity of the upgrade package text, much software uses a HASH algorithm to verify it. If the check value is sent in plaintext, a HASH verification misuse vulnerability results. In addition, the security of encryption and decryption depends on the security of the key: only a secure key can guarantee the security of the information. However, some software uses a fixed key during the upgrade, or computes the key with a fixed algorithm, and such a key can be recovered by reverse analysis of the software. In essence, for both the server and the client, only keys transmitted over an SSL channel, that is, an HTTPS channel, are secure. If the decrypted plaintext needs to be compared and verified, the check value must also be transmitted over an HTTPS channel.
In summary, the present application provides a software online upgrade method based on information security big data. The method first obtains a target character string, which comprises character strings from the online software upgrade process; then performs word segmentation on the target character string, splitting the continuous character string into a plurality of independent word characters; performs stop-word filtering on the plurality of independent word characters and extracts the variants or derivatives of the filtered word characters as stems, or extracts them as roots; then maps the stems or roots into a vector space as training samples for the software online upgrade vulnerability classification model; next labels the training samples and classifies them based on the labeling results to obtain positive training samples and negative training samples; finally performs supervised learning training based on the positive and negative training samples to generate the software online upgrade vulnerability classification model; and uses that model to perform vulnerability detection on the target software undergoing online upgrade and determine whether the target software has vulnerabilities. Therefore, in view of the characteristics of software upgrade vulnerabilities, the application provides a vulnerability detection scheme based on software upgrade behavior analysis, which can detect vulnerabilities during an online software upgrade.
The network communication and cryptography-related API functions and their parameters are collected and organized, the behavior nodes are arranged into XML files, and detection rules based on upgrade behavior chain vulnerabilities are described. On this basis, a program analysis method combining dynamic and static analysis is used to extract the software upgrade behaviors. Static analysis is the primary analysis method: once the software upgrade function has been located by reverse engineering, IDA Python scripts extract the upgrade behavior nodes and their security is judged. Dynamic analysis complements static analysis: it mainly uses dynamic binary instrumentation to track the software upgrade flow and extract the upgrade behavior nodes, applies data flow analysis where necessary to extract the basis of the upgrade behavior, and judges security according to the vulnerability decision model. The detection method can detect upgrade behavior vulnerabilities automatically and greatly improves detection efficiency while maintaining reliability. The application first defines the software upgrade behavior chain, which describes the program behavior that carries out the software upgrade. Based on the general flow of a software upgrade, the focus is on upgrade communication behaviors and encryption and decryption behaviors, the API functions used to implement them, and tracking information such as the parameters related to these behaviors. Second, the upgrade behavior chain is extracted by static analysis, which is the primary software upgrade vulnerability detection method. Static analysis can extract the relevant behavior chains of the binary program without starting the software upgrade, which greatly improves detection speed.
Once the upgrade function has been located by reverse engineering, an IDA Python plug-in can statically extract a local control flow graph of the software upgrade behavior, together with the fixed parameters of the related API functions. The upgrade behavior can then be judged by extracting and matching these parameters, for example whether the communication protocol is HTTP or HTTPS and whether a fixed key is used. Third, dynamic analysis is adopted as a complement to static analysis. Because static analysis disassembles and analyzes the software without starting the upgrade behavior, it has inherent limitations: some dynamic upgrade behaviors cannot be extracted, or cannot be extracted completely, so upgrade security cannot be fully judged. The application therefore uses dynamic binary instrumentation to track the upgrade process and further extract the software upgrade behavior chain, and uses data flow analysis to analyze the relationships among upgrade check values or key data flows between parts of the upgrade behavior chain, thereby completing the chain. Finally, according to the upgrade vulnerability classification model established above, vulnerability detection rules for matching are established, including software upgrade communication vulnerability detection rules and upgrade package verification vulnerability detection rules.
As shown in fig. 3, the present application further provides a software online upgrade system based on information security big data, which includes:
the character string module is used for acquiring a target character string, wherein the target character string comprises a character string in the online upgrading process of the software;
the word segmentation module is used for performing word segmentation on the target character string and splitting the continuous character string into a plurality of independent word characters; specifically, word segmentation splits a continuous string of characters into several independent words or characters. Semantic information such as character strings or function names in binary programs is often not written in canonical English: some items are not complete words, and some are not separated by spaces. For example, the function name "CheckUpdate" should be divided into the two words "check" and "update", and the string "update_found_new_version" should be divided into the four English words "update", "found", "new" and "version". Each word represents one text classification dimension for classification training. The application uses wordninja, an English word segmentation tool for Python, to split combined character strings into small words and remove special symbols.
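The two examples above can be reproduced with a short standard-library sketch. Note the assumption: this regular-expression splitter only handles camel-case boundaries and non-letter separators, and it is a simplified stand-in for a dictionary-based tool, so it would not split an all-lowercase run such as "checkupdate".

```python
import re

def segment(identifier: str) -> list:
    """Split on non-letters and camel-case boundaries, lower-casing the parts."""
    words = []
    for chunk in re.split(r"[^A-Za-z]+", identifier):
        # An all-caps run (e.g. "HTTP"), or an optionally capitalised lowercase word.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", chunk)
    return [w.lower() for w in words]

function_words = segment("CheckUpdate")
string_words = segment("update_found_new_version")
```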
The filtering module is used for performing stop-word filtering on the plurality of independent word characters and extracting the variants or derivatives of the filtered word characters as stems, or extracting them as roots; specifically, in English text some common words occur very frequently. These words do not help in analyzing semantic information and can degrade the classification result, so they must be removed during preprocessing. In the semantic information of a binary program, numbers, definite articles and the like are stop words. The application deletes the stop words in the software semantic information with the natural language processing toolkit NLTK. Experiments show that removing stop words improves the accuracy of upgrade semantic recognition. Stem extraction reduces variants or derivatives of English words to their stem or root form: for example, "updating" and "updated" are both upgrade-related semantic factors, and stem extraction transforms them into the common stem "update". For semantic information related to software upgrades, stem extraction converts words with different inflected forms into a common root, which reduces the learning dimension, focuses on the semantic factors related to the online software upgrade function, and improves the accuracy of text classification learning.
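The filtering and stemming steps can be illustrated with a toy, standard-library sketch. The stop-word list and the suffix-stripping rule below are simplified assumptions standing in for a real NLP toolkit, and the toy stemmer yields the truncated form "updat" rather than "update"; what matters for classification is that all variants collapse to one shared form.

```python
STOP_WORDS = {"the", "a", "an", "of", "to", "is"}  # tiny illustrative list

def toy_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(words):
    """Drop stop words and pure numbers, then stem what remains."""
    kept = [w.lower() for w in words
            if w.lower() not in STOP_WORDS and not w.isdigit()]
    return [toy_stem(w) for w in kept]

stems = preprocess(["the", "updating", "2", "updated", "update", "version"])
```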
The training sample module is used for mapping the stems or roots into a vector space as training samples for the software online upgrade vulnerability classification model; specifically, mapping the preprocessed words in the sample library into a vector space facilitates the calculation of semantic similarity. To identify online upgrade semantic information in a software program automatically, the application uses word embedding from natural language processing to convert the preprocessed semantic information into word vectors for classification training. The application uses the most commonly used word vector model, the word2vec model proposed by Mikolov et al., which is a continuous skip-gram model. The basic idea behind word2vec is that words appearing in similar contexts are mapped to similar vectors and that all dimensions are equally important, which makes it suitable for recognizing semantics related to software upgrades.
The labeling module is used for labeling the training samples and classifying the training samples based on labeling results to obtain positive training samples and negative training samples;
the training module is used for performing supervised learning training based on the positive training samples and the negative training samples to generate the software online upgrade vulnerability classification model; specifically, PU Learning is a special case of semi-supervised learning. It addresses the problem of training with positive and unlabeled samples when negative samples are absent. PU Learning classifiers are generally built with either the direct method or the two-step method. Experimental comparison shows that the two-step method classifies better when the number of positive samples is limited, and since the number of manually obtained positive samples is limited, the two-step method is chosen. In the first stage, reliable negative samples are screened from the unlabeled samples; in the second stage, a conventional supervised model is trained with the positive samples and the screened negative samples and is then used to predict new samples. To improve accuracy, the application selects a Support Vector Machine (SVM) model to label whether the semantic information of the software is related to online upgrade. The SVM model learns a hyperplane that separates the positive and negative vector sample sets, thereby achieving text classification. After training, the generated software online upgrade vulnerability classification model can classify unlabeled samples: given a semantic information feature vector from a piece of software, the classifier predicts whether its label is 1 or -1, indicating whether the semantic information is classified as positive or negative.
Given a piece of software, all executable files under its directory are traversed and all semantic information is extracted with an IDA Python script. Everything except the API functions is passed through natural language preprocessing to form a feature vector, denoted V(S). V(S) is input into the classifier, which automatically identifies the semantic information labeled 1. API functions are also important semantic information in a software upgrade. To determine more accurately whether an API function is related to the upgrade function, a further piece of semantic information, the parameters of the API function, must be acquired. argtracker, an IDA plug-in developed by FireEye, can extract the parameters of API functions statically. For example, the third parameter of the HTTP connection API function WinHttpOpenRequest gives the URL of the requested resource; if it contains a string related to "Update", the URL is taken to be an online upgrade URL and the call is judged to be an upgrade-related factor. The method of the application inputs the extracted parameters into the upgrade semantic classifier to identify the API functions related to the upgrade.
And the detection module is used for carrying out vulnerability detection on the target software subjected to online upgrading by utilizing the software online upgrading vulnerability classification model and determining whether the target software has vulnerabilities.
Therefore, in view of the characteristics of software upgrade vulnerabilities, this embodiment provides a vulnerability detection scheme based on software upgrade behavior analysis, which can detect vulnerabilities during an online software upgrade. The network communication and cryptography-related API functions and their parameters are collected and organized, the behavior nodes are arranged into XML files, and detection rules based on upgrade behavior chain vulnerabilities are described. On this basis, a program analysis method combining dynamic and static analysis is used to extract the software upgrade behaviors. Static analysis is the primary analysis method: once the software upgrade function has been located by reverse engineering, IDA Python scripts extract the upgrade behavior nodes and their security is judged. Dynamic analysis complements static analysis: it mainly uses dynamic binary instrumentation to track the software upgrade flow and extract the upgrade behavior nodes, applies data flow analysis where necessary to extract the basis of the upgrade behavior, and judges security according to the vulnerability decision model. The detection method can detect upgrade behavior vulnerabilities automatically and greatly improves detection efficiency while maintaining reliability.
According to the above description, in an exemplary embodiment, the process of generating the software online upgrade vulnerability classification model based on the positive training sample and the negative training sample includes:
acquiring a first positive training sample set P and an unlabeled training sample set U;
randomly taking out part of samples from the first positive training sample set P as spy samples S, taking all the remaining positive training samples in the first positive training sample set P as a second positive training sample set Ps, and marking each positive training sample in the second positive training sample set Ps as 1;
adding the spy samples S to the unlabeled training sample set U, and marking each training sample in the resulting set as -1 to obtain a negative training sample set Us;
training a classifier by using the second positive training sample set Ps and the negative training sample set Us, classifying each sample in the unlabeled training sample set by using the trained classifier, and obtaining the probability that each sample in the unlabeled training sample set is positive, and marking the probability as a first probability;
the first probability is weighted and summed to obtain the probability that all samples in the unlabeled training sample set are positive, and the probability is recorded as a second probability;
and comparing the second probability with a preset probability threshold, and, when the second probability is greater than the preset probability threshold, combining the classifier at the current moment with its corresponding classification parameters to generate the software online upgrade vulnerability classification model.
Specifically, when training a classifier using the second positive training sample set Ps and the negative training sample set Us, the method further comprises:
setting the loss function to hinge loss, so that the positive-sample misclassification loss objective function F_1 is:
labeling the unlabeled samples, and setting the negative-sample misclassification loss objective function F_2 as:
where C_+ represents the penalty factor for a positive classification error and C_- represents the penalty factor for a negative classification error;
y_i represents the label of an unlabeled sample;
x_i represents the feature vector;
w and b represent natural constants.
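A cost-sensitive hinge loss of the kind described can be evaluated numerically as follows. This is a sketch under assumptions: it uses the standard hinge loss max(0, 1 - y(w·x + b)), treats w as a weight vector and b as a bias, and the sample data and penalty values are invented for illustration.

```python
def hinge(y, x, w, b):
    """Hinge loss max(0, 1 - y * (w . x + b)) for a single sample."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)

def weighted_loss(samples, w, b, c_pos, c_neg):
    """Sum hinge losses, weighting positives by c_pos and negatives by c_neg."""
    total = 0.0
    for x, y in samples:
        total += (c_pos if y == 1 else c_neg) * hinge(y, x, w, b)
    return total

# (feature vector, label) pairs; label 1 is positive, -1 is negative.
samples = [([2.0, 0.0], 1), ([0.5, 0.0], 1), ([-2.0, 0.0], -1), ([0.0, 0.0], -1)]
loss = weighted_loss(samples, w=[1.0, 0.0], b=0.0, c_pos=2.0, c_neg=1.0)
```

In this example the second positive sample lies inside the margin and contributes 2.0 × 0.5, while the last negative sample lies on the hyperplane and contributes 1.0 × 1.0, so positive errors are penalised twice as heavily per unit of margin violation.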
Thus PU Learning training yields a set of reliable negative samples which, together with the original positive sample set, forms the semantic information sample set. NLP preprocessing then converts the semantic information sample set into a word vector training sample set, the positive and negative samples are used to construct a set of feature vectors, and these are input into an SVM classifier to train an upgrade semantic classifier dedicated to the automatic classification and identification of online software upgrade semantic information. In practice, iterative SVM training is used to classify the remaining unidentified samples in U: the negative samples separated out are added to the negative sample set, the classifier is retrained iteratively, and classification accuracy is observed by classifying the test set.
In an exemplary embodiment, before vulnerability detection is performed on the target software undergoing online upgrade by using the software online upgrade vulnerability classification model, the method may further include: traversing the software directory of the target software and inputting an executable file path of the target software; extracting semantic information from the software directory and the executable file path, and preprocessing the extracted semantic information; performing upgrade semantic recognition on the preprocessed semantic information by using the software online upgrade vulnerability classification model, and judging whether traversal of the software directory is complete; if so, extracting the semantic call tree and computing the located function; if not, continuing to traverse the software directory of the target software. The analysis of the first two stages determines the program in which the upgrade function resides and identifies a batch of upgrade-related semantic factors in that program; the cross-reference chain of each semantic factor is then obtained through IDA Python. The function call relationships can be represented by a function call tree. Through the analysis of the first two stages, the semantic information of the four stages of the online software upgrade can be identified and its virtual memory addresses in the program obtained. As an example, suppose keyword matching identifies the semantic information "check update", "app_version_build=" and "http://upmobilev.qq.com", with which the software checks version information. The virtual memory addresses of these three pieces of semantic information are all referenced by the function sub_42F324, so it can be inferred that sub_42F324 is a version-checking module.
The character string "StartDownloadTask" is related to downloading the upgrade package. Analysis shows that it is referenced by the function sub_4226D4, and that sub_4226D4 is called by the function sub_42F324. It is finally inferred that the function sub_42F324 checks version information and downloads the software update package.
In an exemplary embodiment, the process of performing vulnerability detection on the target software undergoing online upgrade by using the software online upgrade vulnerability classification model to determine whether the target software has a vulnerability includes: defining the software online upgrade vulnerability classification model as V = {C, P}, where C is the security index of the software upgrade communication channel and P is the security index of the upgrade package verification; 1 represents secure and 0 represents at risk of attack; when V = {1, 1}, the target software is very secure and has no vulnerability; when V = {1, 0} or V = {0, 1}, the target software has a security risk but no vulnerability; when V = {0, 0}, the target software is very insecure and has a vulnerability.
According to the above description, in an exemplary embodiment, when the target software has a vulnerability, the following steps may further be included: acquiring a software behavior chain S, namely the sequence f_1, f_2, ..., f_n of upgrade network communication and upgrade package verification functions invoked during the online software upgrade, where f_i (i = 1, 2, ..., n) represents the i-th recorded action of the target software's upgrade behavior; defining f_i as the triple (Addr, Name, Parameters), where Addr represents the function address, Name represents the function name, and Parameters represents the key parameters of the function; and determining the vulnerability location in the target software by checking the function addresses, function names and key function parameters in the software behavior chain S.
In an exemplary embodiment, when the target software has a vulnerability, the method may further include: verifying the upgrade package text with a hash algorithm, and determining whether the upgrade package text is sent in plaintext; if the upgrade package text is sent in plaintext, determining that the target software has a transmission vulnerability; if the upgrade package text is not sent in plaintext, performing key verification on the upgrade package text; if the upgrade package text has no key, determining that the target software has a key leakage; and if the upgrade package text has a key, determining that the target software has a transmission vulnerability. In this embodiment, to ensure the integrity of the upgrade package text, much software uses a HASH algorithm to verify it. If the check value is sent in plaintext, a HASH verification misuse vulnerability results. In addition, the security of encryption and decryption depends on the security of the key: only a secure key can guarantee the security of the information. However, some software uses a fixed key during the upgrade, or computes the key with a fixed algorithm, and such a key can be recovered by reverse analysis of the software. In essence, for both the server and the client, only keys transmitted over an SSL channel, that is, an HTTPS channel, are secure. If the decrypted plaintext needs to be compared and verified, the check value must also be transmitted over an HTTPS channel.
In summary, the present application provides a software online upgrade system based on information security big data. The system first obtains a target character string, which comprises character strings from the online software upgrade process; then performs word segmentation on the target character string, splitting the continuous character string into a plurality of independent word characters; performs stop-word filtering on the plurality of independent word characters and extracts the variants or derivatives of the filtered word characters as stems, or extracts them as roots; then maps the stems or roots into a vector space as training samples for the software online upgrade vulnerability classification model; next labels the training samples and classifies them based on the labeling results to obtain positive training samples and negative training samples; finally performs supervised learning training based on the positive and negative training samples to generate the software online upgrade vulnerability classification model; and uses that model to perform vulnerability detection on the target software undergoing online upgrade and determine whether the target software has vulnerabilities. Therefore, in view of the characteristics of software upgrade vulnerabilities, the application provides a vulnerability detection scheme based on software upgrade behavior analysis, which can detect vulnerabilities during an online software upgrade.
By collecting and organizing the API functions related to network communication and cryptography, together with their parameters, the behavior nodes are organized into an XML file, and detection rules for vulnerabilities based on the upgrade behavior chain are described. On this basis, software upgrade behaviors are extracted with a program analysis method that combines dynamic and static analysis. Static analysis is the primary method: after the software upgrade function has been located by reverse engineering, IDA Python scripts extract the upgrade behavior nodes and their security is judged. Dynamic analysis compensates for the limitations of static analysis: it mainly uses dynamic binary instrumentation to trace the software upgrade flow and extract the behavior nodes of the upgrade, applies data-flow analysis where necessary to extract the upgrade behavior, and judges security according to the vulnerability judgment model. The detection method can detect upgrade behavior vulnerabilities automatically and greatly improves detection efficiency while maintaining reliability. The application first defines the software upgrade behavior chain, which describes the program behavior that completes a software upgrade. Based on the general flow of software upgrading, the focus is on upgrade communication behaviors and encryption and decryption behaviors, the API functions used to implement them, and the tracking of information such as the parameters involved. Second, the upgrade behavior chain is extracted by static analysis, which is the primary software upgrade vulnerability detection method. Static analysis can extract the relevant behavior chains of the binary program without starting the software upgrade, which greatly improves detection speed.
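As a non-limiting illustration, an upgrade behavior chain can be represented as an ordered list of behavior nodes, each recording a function address, a function name and its key parameters (the triple {Addr, Name, Parameters} of claim 4). In the sketch below the input trace is assumed to have been produced already by static or dynamic analysis; the two API name sets are small illustrative samples of Windows networking and cryptography functions, and the helper names are hypothetical.

```python
from dataclasses import dataclass

# Illustrative samples of API names of interest (assumed, not exhaustive).
NETWORK_APIS = {"InternetOpenUrlA", "HttpSendRequestA"}
CRYPTO_APIS = {"CryptDecrypt", "CryptHashData"}


@dataclass(frozen=True)
class BehaviorNode:
    """One recorded upgrade action: the triple {Addr, Name, Parameters}."""
    addr: int
    name: str
    parameters: tuple = ()


def extract_behavior_chain(call_trace):
    """Keep only upgrade-related calls from (addr, name, params) records, in call order."""
    wanted = NETWORK_APIS | CRYPTO_APIS
    return [BehaviorNode(addr, name, tuple(params))
            for addr, name, params in call_trace
            if name in wanted]


def locate(chain, name):
    """Return the addresses of every node in the chain that calls `name`."""
    return [node.addr for node in chain if node.name == name]
```

Keeping the address in each node is what later allows a rule match to be reported as a position in the binary rather than only as a yes/no result.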
On the basis of reverse positioning of the upgrade function, an IDA Python plug-in can locally extract the control flow graph of the software upgrade behavior under static conditions and extract the fixed parameters of the related API functions; the upgrade behavior can then be judged by extracting and matching these parameters, for example whether the communication protocol is HTTP or HTTPS and whether a fixed key is used. Third, dynamic analysis is employed as a complement to static analysis. Because static analysis examines the software without the upgrade behavior being started, it has its own limitations: some dynamic upgrade behaviors cannot be extracted, or cannot be extracted completely, so upgrade security cannot be fully judged. A dynamic binary instrumentation method is therefore used to trace the upgrade process and further extract the software upgrade behavior chain. Data-flow analysis is used to analyze how the upgrade check value or key data flows between parts of the upgrade behavior chain, thereby completing the chain. Finally, according to the upgrade vulnerability classification model established earlier, vulnerability detection rules for matching are established, including software upgrade communication vulnerability detection rules and upgrade package verification vulnerability detection rules.
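The parameter matching described above can be sketched as two small rules whose results form the pair V = {C, P} defined in the claims, where C is the communication security index and P is the upgrade package check security index. The rule for C follows the HTTP versus HTTPS example given in the text; the rule for P (check value sent over HTTPS and no fixed key) is an assumption made for illustration, as are all function names.

```python
from urllib.parse import urlparse


def communication_index(upgrade_urls):
    """C = 1 if at least one URL was seen and every upgrade URL uses HTTPS, else 0."""
    return int(bool(upgrade_urls)
               and all(urlparse(u).scheme == "https" for u in upgrade_urls))


def package_check_index(check_value_over_https, uses_fixed_key):
    """P = 1 only if the check value travels over HTTPS and no fixed key is used (assumed rule)."""
    return int(check_value_over_https and not uses_fixed_key)


def verdict(c, p):
    """Interpret V = {C, P}: 1 means secure, 0 means a risk of attack."""
    if (c, p) == (1, 1):
        return "very secure, no vulnerability"
    if (c, p) == (0, 0):
        return "very unsafe, vulnerability present"
    return "security risk, no vulnerability"
```

For example, a client that downloads over HTTPS but decrypts with a hard-coded key would yield V = {1, 0} under these assumed rules.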

Claims (6)

1. The online software upgrading method based on the information security big data is characterized by comprising the following steps:
acquiring a target character string, wherein the target character string comprises a character string in the online upgrading process of software;
word segmentation processing is carried out on the target character string, and the continuous character string is split into a plurality of independent word characters;
performing stop word filtering on the plurality of independent word characters, and extracting variants or derivatives of the word characters subjected to the stop word filtering as stems; or extracting the variant or derivative of the word characters after the filtering of the stop words is completed as the root words;
mapping the stem or the root into a vector space to serve as a training sample of a software online upgrading vulnerability classification model;
labeling the training samples, classifying the training samples based on labeling results, and obtaining positive training samples and negative training samples;
performing supervised learning training based on the positive training sample and the negative training sample to generate a software online upgrading vulnerability classification model;
performing vulnerability detection on target software subjected to online upgrading by using the software online upgrading vulnerability classification model, and determining whether the target software has vulnerabilities;
wherein the process of performing supervised learning training based on the positive training sample and the negative training sample to generate the software online upgrading vulnerability classification model comprises:
acquiring a first positive training sample set P and an unlabeled training sample set U;
randomly taking out part of samples from the first positive training sample set P as spy samples S, taking all the remaining positive training samples in the first positive training sample set P as a second positive training sample set Ps, and marking each positive training sample in the second positive training sample set Ps as 1;
adding the spy sample S into the unlabeled training sample set U, and marking each training sample in the unlabeled training sample set added with the spy sample as -1 to obtain a negative training sample set Us;
training a classifier by using the second positive training sample set Ps and the negative training sample set Us, classifying each sample in the unlabeled training sample set by using the trained classifier, and obtaining the probability that each sample in the unlabeled training sample set is positive, and marking the probability as a first probability;
performing a weighted summation of the first probabilities to obtain the probability that all samples in the unlabeled training sample set are positive, recorded as a second probability;
comparing the second probability with a preset probability threshold, and when the second probability is larger than the preset probability threshold, generating the software online upgrading vulnerability classification model from the classifier at the current moment combined with its corresponding classification parameters;
wherein the process of performing vulnerability detection on target software subjected to online upgrading by using the software online upgrading vulnerability classification model to determine whether the target software has vulnerabilities comprises:
defining the software online upgrading vulnerability classification model as V = {C, P}, wherein C is the security index of the software upgrade communication line and P is the check security index of the upgrade package, 1 representing security and 0 representing a risk of attack;
when V = {1, 1}, the target software is very secure and no vulnerability exists;
when V = {1, 0} or V = {0, 1}, the target software has a security risk but no vulnerability;
when V = {0, 0}, the target software is very unsafe and has a vulnerability.
2. The method for online upgrading software based on information security big data according to claim 1, wherein when training a classifier using the second positive training sample set Ps and the negative training sample set Us, the method further comprises:
setting the loss function as the hinge loss, the positive-sample classification-error loss objective function F_1 being:
labeling unlabeled samples, and setting the negative-sample classification-error loss objective function F_2 as:
wherein C_+ represents the penalty factor for a positive classification error, and C_- represents the penalty factor for a negative classification error;
y_i represents the label of an unlabeled sample;
x_i represents the feature vector;
w and b represent natural constants.
3. The method for online upgrading software based on information security big data according to claim 2, wherein before utilizing the software online upgrading vulnerability classification model to perform vulnerability detection on target software for online upgrading, the method further comprises:
traversing the software catalogue of the target software and inputting an executable file path of the target software;
extracting semantic information from the software catalogue and the executable file path, and preprocessing the extracted semantic information;
performing upgrade semantic recognition on the preprocessed semantic information by utilizing the software online upgrading vulnerability classification model, and judging whether traversal of the software catalogue is complete;
if so, extracting a semantic call tree and calculating a positioning function;
if not, continuing to traverse the software catalogue of the target software.
4. The method for online upgrading software based on information security big data according to claim 1, wherein when the target software has a vulnerability, the method further comprises:
acquiring a software behavior chain S, namely the sequence f_1, f_2, ..., f_n of upgrade network communication and upgrade package check functions invoked during the online upgrading of the software, wherein f_i represents the i-th recorded action of the target software's upgrade behavior, i = 1, 2, ..., n;
defining f_i as a triple {Addr, Name, Parameters}, wherein Addr represents the function address, Name represents the function name, and Parameters represents the key parameters of the function;
and determining the vulnerability position of the target software by using the software behavior chain S to check the function address, the function name and the key parameters of the function.
5. The method for online upgrading software based on information security big data according to claim 4, wherein when the target software has a vulnerability, the method further comprises:
checking the upgrade package text using a hash algorithm, and determining whether the upgrade package text is sent in plaintext; if the upgrade package text is sent in plaintext, determining that the target software has a transmission vulnerability;
if the upgrade package text is not sent in plaintext, performing key verification on the upgrade package text; if the upgrade package text has no key, determining that the target software has key leakage; and if the upgrade package text has a key, determining that the target software has a transmission vulnerability.
6. The software online upgrading system based on the information security big data is characterized by comprising the following components:
the character string module is used for acquiring a target character string, wherein the target character string comprises a character string in the online upgrading process of the software;
the word segmentation module is used for carrying out word segmentation processing on the target character string and splitting the continuous character string into a plurality of independent word characters;
the filtering module is used for filtering the stop words of the plurality of independent word characters and extracting variants or derivatives of the word characters subjected to the stop word filtering as stems; or extracting the variant or derivative of the word characters after the filtering of the stop words is completed as the root words;
the training sample module is used for mapping the stem or the root into a vector space and is used as a training sample of the software online upgrading vulnerability classification model;
the labeling module is used for labeling the training samples and classifying the training samples based on labeling results to obtain positive training samples and negative training samples;
the training module is used for performing supervised learning training based on the positive training sample and the negative training sample to generate a software online upgrading vulnerability classification model;
the detection module is used for carrying out vulnerability detection on target software subjected to online upgrading by utilizing the software online upgrading vulnerability classification model and determining whether the target software has vulnerabilities or not;
wherein the process of performing supervised learning training based on the positive training sample and the negative training sample to generate the software online upgrading vulnerability classification model comprises:
acquiring a first positive training sample set P and an unlabeled training sample set U;
randomly taking out part of samples from the first positive training sample set P as spy samples S, taking all the remaining positive training samples in the first positive training sample set P as a second positive training sample set Ps, and marking each positive training sample in the second positive training sample set Ps as 1;
adding the spy sample S into the unlabeled training sample set U, and marking each training sample in the unlabeled training sample set added with the spy sample as -1 to obtain a negative training sample set Us;
training a classifier by using the second positive training sample set Ps and the negative training sample set Us, classifying each sample in the unlabeled training sample set by using the trained classifier, and obtaining the probability that each sample in the unlabeled training sample set is positive, and marking the probability as a first probability;
performing a weighted summation of the first probabilities to obtain the probability that all samples in the unlabeled training sample set are positive, recorded as a second probability;
comparing the second probability with a preset probability threshold, and when the second probability is larger than the preset probability threshold, generating the software online upgrading vulnerability classification model from the classifier at the current moment combined with its corresponding classification parameters;
wherein the process of performing vulnerability detection on target software subjected to online upgrading by using the software online upgrading vulnerability classification model to determine whether the target software has vulnerabilities comprises:
defining the software online upgrading vulnerability classification model as V = {C, P}, wherein C is the security index of the software upgrade communication line and P is the check security index of the upgrade package, 1 representing security and 0 representing a risk of attack;
when V = {1, 1}, the target software is very secure and no vulnerability exists;
when V = {1, 0} or V = {0, 1}, the target software has a security risk but no vulnerability;
when V = {0, 0}, the target software is very unsafe and has a vulnerability.
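For illustration only, the spy-sample construction and the threshold test recited in claims 1 and 6 can be sketched as follows. The sketch covers only the data preparation and the acceptance test; the classifier itself is left out. The spy ratio, the random seed, the uniform default weights and all function names are assumptions that the claims do not specify.

```python
import random


def build_spy_sets(P, U, spy_ratio=0.15, seed=0):
    """Split P into spies S and remaining positives Ps, and build Us from U plus S.

    Ps holds (sample, 1) pairs and Us holds (sample, -1) pairs,
    matching the labels 1 and -1 used in the claims.
    """
    rng = random.Random(seed)
    n_spy = max(1, int(len(P) * spy_ratio))  # assumed ratio, at least one spy
    spy_idx = set(rng.sample(range(len(P)), n_spy))
    S = [x for i, x in enumerate(P) if i in spy_idx]
    Ps = [(x, 1) for i, x in enumerate(P) if i not in spy_idx]
    Us = [(x, -1) for x in list(U) + S]
    return S, Ps, Us


def second_probability(first_probs, weights=None):
    """Weighted sum of the per-sample first probabilities (uniform weights by default)."""
    if weights is None:
        weights = [1.0 / len(first_probs)] * len(first_probs)
    return sum(w * p for w, p in zip(weights, first_probs))


def accept_classifier(first_probs, threshold, weights=None):
    """True when the second probability exceeds the preset probability threshold."""
    return second_probability(first_probs, weights) > threshold
```

A classifier trained on Ps and Us would supply `first_probs` by scoring each sample of the unlabeled set; the spies, being known positives hidden among the unlabeled samples, indicate how positives are scored when they carry the label -1.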
CN202310008447.2A 2023-01-04 2023-01-04 Software online upgrading method and system based on information security big data Active CN116108428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310008447.2A CN116108428B (en) 2023-01-04 2023-01-04 Software online upgrading method and system based on information security big data

Publications (2)

Publication Number Publication Date
CN116108428A CN116108428A (en) 2023-05-12
CN116108428B CN116108428B (en) 2023-09-01

Family

ID=86264974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310008447.2A Active CN116108428B (en) 2023-01-04 2023-01-04 Software online upgrading method and system based on information security big data

Country Status (1)

Country Link
CN (1) CN116108428B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886020A (en) * 2019-01-24 2019-06-14 燕山大学 Software vulnerability automatic classification method based on deep neural network
CN110348227A (en) * 2019-07-15 2019-10-18 燕山大学 A kind of classification method and system of software vulnerability
CN113722479A (en) * 2021-08-10 2021-11-30 深圳开源互联网安全技术有限公司 Log detection method and device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210336987A1 (en) * 2020-04-26 2021-10-28 Bluedon Information Security Technologies Corp. Method for Detecting Structured Query Language (SQL) Injection Based on Big Data Algorithm
US11729198B2 (en) * 2020-05-21 2023-08-15 Tenable, Inc. Mapping a vulnerability to a stage of an attack chain taxonomy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
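Hybrid deep learning method for open-source software vulnerability detection; Li Yuancheng; Cui Yaqi; Lü Junfeng; Lai Fenggang; Zhang Pan; Computer Engineering and Applications (No. 11); full text *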

Also Published As

Publication number Publication date
CN116108428A (en) 2023-05-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230810

Address after: Units A and C, 22/F, No. 109 Tiyu West Road, Tianhe District, Guangzhou City, Guangdong Province, 510000

Applicant after: GUANGDONG MC. SCIENCE AND TECHNOLOGY CO.,LTD.

Address before: Room 209, 2nd Floor, Building 1, Yard 1, Liangshuihe Road, Changping District, Beijing 102200

Applicant before: Beijing Zongliang Network Technology Co.,Ltd.

GR01 Patent grant