CN114021148B - System for predicting industrial control network bugs based on Summary word segmentation characteristics - Google Patents

Info

Publication number
CN114021148B
Authority
CN
China
Prior art keywords
vulnerability
str
industrial control
control network
characteristic
Legal status
Active
Application number
CN202111358158.2A
Other languages
Chinese (zh)
Other versions
CN114021148A (en)
Inventor
李峰
孙瑞勇
郝丽娜
靳海燕
水沝
张洪铭
Current Assignee
Shandong Yuntian Safety Technology Co ltd
Original Assignee
Shandong Yuntian Safety Technology Co ltd
Application filed by Shandong Yuntian Safety Technology Co ltd
Priority to CN202111358158.2A
Publication of CN114021148A
Application granted
Publication of CN114021148B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a system for predicting industrial control network vulnerabilities based on Summary word-segmentation features. In step S400, the Summary text sequence of each sample vulnerability id is obtained; in step S401, word segmentation is performed on $Str_e$ to obtain the corresponding word-segmentation set $A_e$; in step S402, when $e = 1$, the feature weight $w_e$ of $Str_e$ is determined according to the number of words in $A_e$; in step S403, when $e > 1$, $Str_{e-1}$ and $Str_e$ are compared: if they are identical, $w_e = w_{e-1}$; otherwise, set-difference operations on $A_e$ and $A_{e-1}$ give the number of words $|A_e - A_{e-1}|$ in the difference set of $A_e$ relative to $A_{e-1}$ and the number of words $|A_{e-1} - A_e|$ in the difference set of $A_{e-1}$ relative to $A_e$, and $w_e = \left(|A_e - A_{e-1}| / |A_{e-1} - A_e|\right) \cdot w_{e-1}$; in step S404, the Summary feature parameter value corresponding to each $Str_e$ is determined; in step S405, an industrial control network vulnerability prediction model is trained and used to predict the industrial control network vulnerability outbreak probability.

Description

System for predicting industrial control network bugs based on Summary word segmentation characteristics
Technical Field
The invention relates to the technical field of computers, and in particular to a system for predicting industrial control network vulnerabilities based on Summary word-segmentation features.
Background
With the accelerating integration of new-generation information technologies such as cloud computing, big data, artificial intelligence, and the Internet of Things with manufacturing technologies, industrial control systems are moving from closed to open, from standalone to interconnected, and from automated to intelligent. While this gives industrial enterprises strong momentum for development, it also introduces many security risks. From the Stuxnet virus that attacked Iran's nuclear facilities in 2010 to the Havex virus in Europe in 2014, attacks on the networks of industrial control systems (hereinafter "industrial control networks") have become increasingly severe, and industrial control systems urgently need security protection.
System vulnerabilities are an important factor affecting the security of industrial control networks. Unlike vulnerabilities in IT systems, vulnerabilities in industrial control networks cannot be patched promptly, so many of them persist for a long time. If vulnerability outbreaks in the industrial control network cannot be predicted in time and corresponding defense measures taken, the security of the industrial control network cannot be ensured. How to predict vulnerability outbreaks in industrial control networks accurately and efficiently has therefore become an urgent technical problem.
Disclosure of Invention
The invention aims to provide a system for predicting industrial control network vulnerabilities based on Summary word-segmentation features that can predict the outbreak probability of industrial control network vulnerabilities quickly and accurately, so that corresponding defense measures can be taken based on that probability and the security of the industrial control network improved.
According to one aspect of the invention, a system for predicting industrial control network vulnerabilities based on Summary word-segmentation features is provided.
The system comprises a processor, a database, and a storage medium storing computer programs, the processor being communicatively connected to the database. The database stores the Internet vulnerability characteristic parameter list and the industrial control network vulnerability characteristic parameter list corresponding to all Internet vulnerability ids, and the Internet vulnerability characteristic parameters corresponding to each Internet vulnerability id include the Summary. The computer programs stored in the storage medium include a fourth computer program which, when executed by the processor, performs the following steps:
Step S400: obtain from the database the Summary text sequence $\{Str_1, Str_2, \ldots\}$ of each sample vulnerability id, where $Str_e$ is the Summary text for the $e$-th update period and $e$ ranges from 1 upward without an upper bound;
Step S401: perform word segmentation on $Str_e$ and remove stop words using a preset stop-word library to obtain the word-segmentation set $A_e$ corresponding to $Str_e$;
Step S402: when $e = 1$, determine the feature weight $w_e$ of $Str_e$ according to the number of words in $A_e$;
Step S403: when $e > 1$, compare the text of $Str_{e-1}$ with that of $Str_e$. If they are identical, set $w_e = w_{e-1}$. If they are not identical, perform set-difference operations on the word-segmentation sets $A_e$ and $A_{e-1}$ to obtain the number of words $|A_e - A_{e-1}|$ in the difference set of $A_e$ relative to $A_{e-1}$ and the number of words $|A_{e-1} - A_e|$ in the difference set of $A_{e-1}$ relative to $A_e$, and set $w_e = \left(|A_e - A_{e-1}| / |A_{e-1} - A_e|\right) \cdot w_{e-1}$;
Step S404: based on the feature weight $w_e$ of each $Str_e$ and on $Str_e$ itself, determine the Summary feature parameter value $PCS_e = w_e \cdot g(Str_e)$ corresponding to each $Str_e$, where $g(Str_e)$ is the original feature parameter value determined from $Str_e$;
Step S405: construct model input vectors from the Summary feature parameter values of the sample vulnerability ids, train the industrial control network vulnerability prediction model, and predict the industrial control network vulnerability outbreak probability with it.
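Steps S400 to S404 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a lowercase regular-expression tokenizer stands in for a real word segmenter (a Chinese Summary would need a dedicated one), the stop-word list is hypothetical, the patent gives no formula for the initial weight $w_1$ (the hypothetical `initial_weight` defaults to the word count $|A_1|$), $g(Str_e)$ is supplied by the caller, and the case $|A_{e-1} - A_e| = 0$, which the formula leaves undefined, is assumed to keep $w_{e-1}$. The names `segment`, `feature_weights`, and `summary_feature_values` are illustrative.

```python
import re
from typing import Callable, List, Optional, Set

# Hypothetical stop-word library; the patent only refers to "a preset stop-word library".
STOP_WORDS: Set[str] = {"a", "an", "the", "of", "in", "to", "and", "or", "allows"}


def segment(text: str, stop_words: Set[str] = STOP_WORDS) -> Set[str]:
    """Step S401: word set A_e of a Summary, with stop words removed.

    Assumption: lowercase regex tokenisation replaces a real word segmenter.
    """
    tokens = re.findall(r"[a-z0-9_]+(?:[.-][a-z0-9_]+)*", text.lower())
    return {t for t in tokens if t not in stop_words}


def feature_weights(
    summaries: List[str],
    initial_weight: Callable[[int], float] = float,  # hypothetical: w_1 = |A_1|
) -> List[float]:
    """Steps S402-S403: feature weight w_e of each Summary version Str_e."""
    weights: List[float] = []
    prev_text: Optional[str] = None
    prev_words: Set[str] = set()
    for text in summaries:
        words = segment(text)
        if prev_text is None:
            w = initial_weight(len(words))     # e = 1
        elif text == prev_text:
            w = weights[-1]                    # identical text: w_e = w_{e-1}
        else:
            added = len(words - prev_words)    # |A_e - A_{e-1}|
            removed = len(prev_words - words)  # |A_{e-1} - A_e|
            # Assumption: the patent leaves removed == 0 undefined; keep w_{e-1}.
            w = weights[-1] * added / removed if removed else weights[-1]
        weights.append(w)
        prev_text, prev_words = text, words
    return weights


def summary_feature_values(
    summaries: List[str], g: Callable[[str], float]
) -> List[float]:
    """Step S404: PCS_e = w_e * g(Str_e); g is a caller-supplied base feature."""
    return [w * g(s) for w, s in zip(feature_weights(summaries), summaries)]
```

For example, the Summary "Buffer overflow in the Modbus handler" seen twice and then revised to "Heap overflow in Modbus TCP handler allows remote code execution" yields weights [4.0, 4.0, 20.0]: the unchanged second version keeps $w_1$, and the third version adds five words while dropping one, so $w_3 = 4.0 \times 5 / 1$.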
Compared with the prior art, the invention has clear advantages and beneficial effects. With the above technical solution, the system for predicting industrial control network vulnerabilities based on Summary word-segmentation features achieves considerable technical progress and practicality, has wide industrial applicability, and offers at least the following advantages:
the system adjusts the Summary feature weight according to how the word segmentation of the Summary text changes between consecutive periods, which improves the accuracy and efficiency of obtaining Summary feature parameter values and, in turn, the accuracy and efficiency of training the industrial control network vulnerability prediction model and of predicting the industrial control network vulnerability outbreak probability. Reasonable defense measures can be set on this basis, improving the security and stability of the industrial control network. The system is particularly suitable for application scenarios with a high Summary update frequency, that is, where the Summary update frequency exceeds a preset update-frequency threshold.
The foregoing is only an overview of the technical solutions of the present invention. To make the technical means of the invention clearer so that it can be implemented according to this description, and to make the above and other objects, features, and advantages more readily understood, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic diagram of a system framework for predicting industrial control network vulnerabilities according to an embodiment of the present invention;
Fig. 2 is a flowchart of predicting industrial control network vulnerabilities based on Internet and industrial control network vulnerability parameters according to the first embodiment of the present invention;
Fig. 3 is a flowchart of predicting industrial control network vulnerabilities based on a correction parameter according to the second embodiment of the present invention;
Fig. 4 is a flowchart of predicting industrial control network vulnerabilities based on Summary length features according to the third embodiment of the present invention;
Fig. 5 is a flowchart of predicting industrial control network vulnerabilities based on Summary word-segmentation features according to the fourth embodiment of the present invention;
Fig. 6 is a flowchart of predicting industrial control network vulnerabilities based on a bitmap according to the fifth embodiment of the present invention;
Fig. 7 is a flowchart of predicting industrial control network vulnerabilities based on an N-gram according to the sixth embodiment of the present invention.
Detailed Description
To further explain the technical means adopted by the invention to achieve its intended objects, and their effects, embodiments are described in detail below with reference to the accompanying drawings.
The industrial control network is an enterprise intranet that communicates with the Internet through a gateway; devices such as single-chip microcomputers, DSPs, industrial personal computers, and sensors are connected within it. The same vulnerability may be exposed on the Internet or on the industrial control network. Each vulnerability (CVE) has a corresponding vulnerability id (for example, a unique identifier assigned to each vulnerability by an international standards organization) and characteristic parameters. The vulnerability characteristic parameters comprise Internet vulnerability characteristic parameters crawled from the Internet and industrial control network vulnerability characteristic parameters crawled from the industrial control network. As an example, the Internet vulnerability characteristic parameters may include the CVSS value assigned to each vulnerability id by the Common Vulnerability Scoring System (CVSS), the Summary (vulnerability description text) parameter, CVSS parameters, the CWE (Common Weakness Enumeration) parameter, the product group parameter, the reference website domain name parameter, and other custom Internet parameters; the Summary parameters may include the Summary text length, the Summary word-segmentation features, and the like. The industrial control network vulnerability characteristic parameters include gateway parameters, internal state parameters of the industrial control network, and the like. Each Internet vulnerability characteristic parameter and each industrial control network vulnerability characteristic parameter has its own update period, and the update periods of different parameters may differ greatly.
An embodiment of the invention provides a system for predicting industrial control network vulnerabilities which, as shown in Fig. 1, comprises a processor, a database, and a storage medium storing computer programs, the processor being communicatively connected to the database. Those skilled in the art will appreciate that the processor resides on a server; neither the server nor the database is limited to a particular hardware and/or software device, and either may be a server cluster, a storage cluster, and the like. Any computing device or combination of computing devices capable of data processing may serve as the server, and any storage device or combination of storage devices capable of data storage may serve as the database. The server and the database may be separate devices or may share one or more devices.
The database stores the Internet vulnerability characteristic parameter list $P = \{P_1, P_2, \ldots, P_M\}$ corresponding to all Internet vulnerability ids, the update period list of the Internet vulnerability characteristic parameters $TP = \{TP_1, TP_2, \ldots, TP_M\}$, the industrial control network vulnerability characteristic parameter list $Q = \{Q_1, Q_2, \ldots, Q_N\}$, and the update period list of the industrial control network vulnerability characteristic parameters $TQ = \{TQ_1, TQ_2, \ldots, TQ_N\}$. $P_m$ is the $m$-th Internet vulnerability characteristic parameter and $TP_m$ is the update period of $P_m$, where $m$ ranges from 1 to $M$ and $M$ is the number of Internet vulnerability characteristic parameters. $Q_n$ is the $n$-th industrial control network vulnerability characteristic parameter and $TQ_n$ is the update period of $Q_n$, where $n$ ranges from 1 to $N$ and $N$ is the number of industrial control network vulnerability characteristic parameters.
The Internet vulnerability characteristic parameter list includes at least one of the CVSS value, CVSS parameters, the CWE parameter, the product group parameter, the reference website domain name parameter, the Summary parameter, and the like, and is updated according to the update period of each Internet vulnerability characteristic parameter. The industrial control network vulnerability characteristic parameter list includes at least one of gateway parameters, internal state parameters of the industrial control network, and the like.
The industrial control network vulnerability characteristic parameters are updated according to their respective update periods. Note that the Internet vulnerability characteristic parameters are updated slowly, whereas the industrial control network vulnerability characteristic parameters can be monitored from inside the industrial control network and are therefore updated frequently, with a granularity of hours or even minutes. The update periods of the Internet vulnerability characteristic parameters are thus far longer than those of the industrial control network vulnerability characteristic parameters; specifically, $\max(TP_m) / \min(TQ_n) > D1$, where $D1$ is a preset first threshold set according to the application scenario. For example, $D1$ may take values in $[10, 100]$, preferably $D1 = 20$. Although the industrial control network and the Internet are isolated by the gateway, the outbreak trend of a vulnerability on the Internet is consistent with its outbreak trend on the industrial control network, and many more vulnerability characteristic parameters can be crawled from the Internet. Therefore, an industrial control network vulnerability prediction model can be trained with artificial intelligence techniques to predict the outbreak probability of a vulnerability on the industrial control network from both the industrial control network and the Internet vulnerability characteristic parameters.
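As a small illustration of the period condition above, the hypothetical helper below checks $\max(TP_m) / \min(TQ_n) > D1$. The function name and the example periods (in hours) are assumptions; the default $D1 = 20$ is the preferred value stated above.

```python
from typing import Sequence


def internet_periods_dominate(
    tp: Sequence[float], tq: Sequence[float], d1: float = 20.0
) -> bool:
    """Return True if max(TP_m) / min(TQ_n) > D1 (all periods in one common unit)."""
    if not tp or not tq:
        raise ValueError("both period lists must be non-empty")
    if min(tq) <= 0:
        raise ValueError("update periods must be positive")
    return max(tp) / min(tq) > d1
```

For instance, Internet parameters refreshed daily, weekly, and monthly (24, 168, and 720 hours) against industrial control network parameters sampled every hour and every 15 minutes (1 and 0.25 hours) give a ratio of 2880, well above $D1$.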
As an example, when executing the computer program, the processor implements the following steps:
Step S1: based on a preset sample vulnerability id set, obtain the vulnerability feature vector corresponding to each sample vulnerability id from the Internet vulnerability characteristic parameter list and the industrial control network vulnerability characteristic parameter list, and construct a training parameter set;
Step S2: train the industrial control network vulnerability prediction model on the training parameter set;
Step S3: predict, with the industrial control network vulnerability prediction model, the outbreak probability of the vulnerability id under test on the industrial control network.
Because the vulnerability outbreak probability is predicted from historical data, the sample vulnerability id set may be some or all of the vulnerability ids in the current vulnerability id set, as the application requires.
Because there are many kinds of Internet and industrial control network vulnerability characteristic parameters with different update periods, different vulnerability characteristic parameters can be selected and processed in different ways to train different industrial control network vulnerability prediction models. These are described in detail below through several embodiments; unless otherwise specified, the technical content of the embodiments may be cross-referenced.
Embodiment 1
The computer programs stored in the storage medium include a first computer program which, when executed by the processor, implements the following steps:
Step S101: obtain the training period of the training data set, $T_0 = \mathrm{LCM}(TP_1, TP_2, \ldots, TP_M)$, where LCM denotes the least common multiple.
Because the update periods of the various $P_m$ and $Q_n$ differ, selecting training parameters directly with a sliding window would leave many parameters unchanged over long stretches, wasting computing resources while contributing little to model training. This embodiment therefore uses the least common multiple of the $TP_m$ as the training period. Because the update periods of the Internet vulnerability characteristic parameters are far longer than those of the industrial control network vulnerability characteristic parameters, the time window is selected by considering only the update periods of the Internet vulnerability characteristic parameters. In this way every sample vulnerability id obtains all of its Internet and industrial control network vulnerability characteristic parameters, computing power is not wasted, and model training becomes more efficient.
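A minimal sketch of step S101, assuming all update periods are expressed as positive integers in a common unit (hours here); the function names and example periods are hypothetical. It also computes, for each $P_m$, the number of values $A = T_0 / TP_m$ collected within one training period, which step S103 relies on.

```python
import math
from typing import List, Sequence


def training_period(tp: Sequence[int]) -> int:
    """Step S101: T_0 = LCM(TP_1, ..., TP_M) for positive integer periods."""
    if not tp or any(p <= 0 for p in tp):
        raise ValueError("update periods must be positive integers")
    return math.lcm(*tp)  # math.lcm accepts several arguments from Python 3.9


def values_per_parameter(tp: Sequence[int]) -> List[int]:
    """A = T_0 / TP_m: how many values of each P_m fall inside one window T_0."""
    t0 = training_period(tp)
    return [t0 // p for p in tp]  # exact, since T_0 is a multiple of every TP_m
```

With periods of 24, 168, and 720 hours, $T_0 = 5040$ hours (210 days), and the three parameters contribute 210, 30, and 7 values per window respectively. The LCM grows quickly when periods share few factors, which is one reason to express them in a coarse common unit.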
Step S102: obtain from the database, for each sample vulnerability id, the $P_m$ parameter value list and the $Q_n$ parameter value list within the period $T_0$ before the current moment, together with the ground-truth industrial control network vulnerability outbreak probability;
the ground-truth industrial control network vulnerability outbreak probability is the actual outbreak probability of the industrial control network vulnerability at the sample prediction moment. The actual outbreak probability is itself a parameter updated in its own update period: it is the number of industrial control network host devices reporting an outbreak of the vulnerability divided by the number of all monitored industrial control network host devices during that update period.
Step S103: determine the $P_m$ training parameter value $PC_m$ corresponding to each sample vulnerability id from its $P_m$ parameter value list, determine the $Q_n$ training parameter value $QC_n$ from its $Q_n$ parameter value list, and generate the model input vector of each sample vulnerability id from $PC_m$ and $QC_n$;
Step S104: train the industrial control network vulnerability prediction model on the model input vectors of all sample vulnerability ids and the ground-truth industrial control network vulnerability outbreak probabilities:
$$f(x) = a_1 x_1 + a_2 x_2 + \cdots + a_{M+N} x_{M+N}$$
where $x_i$ is a $P_m$ or $Q_n$ training parameter value corresponding to the sample vulnerability id, $a_i$ is the weight coefficient of $x_i$, and $i$ ranges from 1 to $M+N$;
Step S105: predict the industrial control network vulnerability outbreak probability with the industrial control network vulnerability prediction model.
As an example, the step S104 includes:
Step S114: input the model input vectors of all sample vulnerability ids into a preset industrial control network vulnerability prediction model to obtain predicted sample industrial control network vulnerability outbreak probabilities;
Step S124: adjust the weight coefficients according to the predicted and ground-truth sample industrial control network vulnerability outbreak probabilities, thereby training the industrial control network vulnerability prediction model.
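Steps S114 and S124 describe a predict, compare, and adjust loop for the linear model $f(x)$ of step S104. The sketch below assumes batch gradient descent on mean squared error; the patent names neither the loss nor the optimizer, and `train_linear`, the learning rate, and the epoch count are illustrative choices rather than the patented procedure.

```python
from typing import List, Sequence


def predict(a: Sequence[float], x: Sequence[float]) -> float:
    """f(x) = a_1*x_1 + ... + a_K*x_K, with no intercept, as in step S104."""
    return sum(ai * xi for ai, xi in zip(a, x))


def train_linear(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lr: float = 0.05,
    epochs: int = 2000,
) -> List[float]:
    """Fit a_1..a_K by batch gradient descent on mean squared error (assumption)."""
    if not X or len(X) != len(y):
        raise ValueError("X and y must be non-empty and of equal length")
    n, k = len(X), len(X[0])
    a = [0.0] * k
    for _ in range(epochs):
        grad = [0.0] * k
        for xs, target in zip(X, y):
            err = predict(a, xs) - target  # S114: predicted minus ground truth
            for i in range(k):
                grad[i] += 2.0 * err * xs[i] / n
        # S124: move every weight coefficient against its gradient
        a = [ai - lr * gi for ai, gi in zip(a, grad)]
    return a
```

Here each row of `X` stands for one sample's model input vector ($PC_m$ and $QC_n$ values) and `y` for the ground-truth outbreak probabilities. A closed-form least-squares solve would work equally well for a model this small; the iterative form simply mirrors the adjust-until-close description in step S124.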
As an example, in step S105, a prediction period equal to the training period $T_0$ of the training data set is preferably used, together with the same input-vector construction strategy as in steps S102 to S104, to predict the industrial control network vulnerability outbreak probability; this gives high prediction accuracy.
As another example, the time period selected for training the industrial control network vulnerability prediction model is much longer than the update periods of the industrial control network vulnerability characteristic parameters. Because those periods are short and the parameters are updated frequently, a prediction can be made whenever at least one industrial control network vulnerability characteristic parameter is updated; this gives high prediction sensitivity and makes newly emerging vulnerabilities highly predictable. Specifically, step S105 includes:
Step S134: obtain the prediction period $T_1 = \min(TQ_n)$;
Step S144: at every interval $T_1$, collect the current $P_m$ and $Q_n$ parameter values corresponding to the vulnerability id under test and construct the input vector for that vulnerability id;
Step S154: input the input vector of the vulnerability id under test into the industrial control network vulnerability prediction model to obtain the industrial control network vulnerability outbreak probability of that vulnerability id at time $T_1$ after the current moment.
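Steps S134 to S154 can be sketched as follows. The patent only says that the "current" parameter values are collected; this sketch assumes a sample-and-hold rule in which each parameter contributes its most recent value at or before the prediction moment. `Series`, `prediction_times`, `latest_value`, and `input_vector` are hypothetical names.

```python
import bisect
from typing import List, Sequence, Tuple

Series = List[Tuple[float, float]]  # (collection time, value), sorted by time


def prediction_period(tq: Sequence[float]) -> float:
    """Step S134: T_1 = min(TQ_n)."""
    return min(tq)


def prediction_times(start: float, end: float, tq: Sequence[float]) -> List[float]:
    """Step S144: prediction moments spaced T_1 apart within [start, end]."""
    t1 = prediction_period(tq)
    steps = int((end - start) // t1)
    return [start + k * t1 for k in range(steps + 1)]


def latest_value(series: Series, t: float) -> float:
    """Most recent value collected at or before t (sample-and-hold assumption)."""
    times = [ts for ts, _ in series]
    idx = bisect.bisect_right(times, t) - 1
    if idx < 0:
        raise ValueError("no value collected at or before t")
    return series[idx][1]


def input_vector(all_series: Sequence[Series], t: float) -> List[float]:
    """Input vector at time t: one current value per P_m / Q_n parameter."""
    return [latest_value(s, t) for s in all_series]
```

The resulting vector would then be fed to the trained model (step S154) at each prediction moment. Computing each moment as `start + k * t1`, rather than repeatedly adding `t1`, avoids accumulating floating-point drift over long horizons.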
As a preferred example, in step S103, the $P_m$ parameter value list of each sample vulnerability id within $T_0$ before the current moment is $\{PC_{m1}, PC_{m2}, \ldots, PC_{mA}\}$, where $PC_{ma}$ is the $a$-th value of $P_m$ within $T_0$, $a$ ranges from 1 to $A$, $A$ is the total number of $P_m$ values collected within $T_0$ before the current moment, and $A = T_0 / TP_m$. Determining the $P_m$ training parameter value $PC_m$ from the $P_m$ parameter value list of each sample vulnerability id includes:
Step S113: determine the $P_m$ training parameter value $PC_m$ using the following equation:
[Equation image not reproduced in this text version]
Because the parameter update periods differ, most Internet and industrial control network vulnerability characteristic parameters yield multiple values within one training period, so a reasonable characteristic parameter value must be determined from those values to construct the model input. As a preferred example, in step S103, the $Q_n$ parameter value list of each sample vulnerability id within $T_0$ before the current moment is $\{QC_{n1}, QC_{n2}, \ldots, QC_{nB}\}$, where $QC_{nb}$ is the $b$-th value of $Q_n$ within $T_0$ before the current moment, $b$ ranges from 1 to $B$, $B$ is the total number of $Q_n$ values collected within $T_0$ before the current moment, and $B = T_0 / TQ_n$. Determining the $Q_n$ training parameter value $QC_n$ from the $Q_n$ parameter value list of each sample vulnerability id includes:
Step S123: determine the $Q_n$ training parameter value $QC_n$ using the following equation:
[Equation image not reproduced in this text version]
As time goes on, new vulnerabilities may be added, and the vulnerability characteristic parameters and their update periods may change. To further improve the accuracy of the industrial control network vulnerability prediction model, the processor also implements the following step when executing the first computer program:
Step S100: at every interval $T_0$, re-execute steps S101 to S104 to update the industrial control network vulnerability prediction model.
In the first embodiment, a reasonable training period is set for the training data set, the Internet and industrial control network vulnerability characteristic parameters are obtained within that period, corresponding parameter values are determined from each, these values are converted into input vectors, and the industrial control network vulnerability prediction model is trained. The industrial control network vulnerability outbreak probability can thus be predicted quickly and accurately from multi-dimensional Internet and industrial control network vulnerability characteristic parameters, and reasonable defense measures can be set on that basis, improving the security and stability of the industrial control network. The first embodiment is particularly suitable for application scenarios in which both Internet and industrial control network vulnerability characteristic parameters can be obtained.
Embodiment 2
Internet vulnerability characteristic parameters are numerous and easy to obtain. In some application scenarios, however, limited by factors such as the scale of the industrial control network, sufficient industrial control network vulnerability characteristic parameters may not be available to train the industrial control network vulnerability prediction model. Because the outbreak trend of a vulnerability in the industrial control network is consistent with, and correlated to, its overall outbreak trend on the Internet, the industrial control network vulnerability outbreak probability can be predicted from the association between industrial control network and Internet vulnerability outbreaks combined with the Internet vulnerability characteristic parameters.
Specifically, the computer programs stored in the storage medium include a second computer program which, when executed by the processor, implements the following steps:
Step S201: obtain from the database, for each sample vulnerability id within a preset training period, the $P_m$ parameter value list, the list of actual Internet vulnerability outbreak probabilities, the list of actual industrial control network vulnerability outbreak probabilities, and the ground-truth industrial control network vulnerability outbreak probability.
The preset training period can be set according to the training requirements, or determined directly as the least common multiple of the update periods of the Internet vulnerability characteristic parameters, as in the first embodiment. The actual Internet vulnerability outbreak probability is also a parameter updated in its own update period: it is the number of Internet host devices reporting an outbreak of the vulnerability divided by the number of all monitored Internet host devices during that update period. The ground-truth industrial control network vulnerability outbreak probability and the actual industrial control network vulnerability outbreak probability are computed as described in the first embodiment and are not repeated here.
Step S202: determine the correction parameter $P_{CVE}$ of each sample vulnerability id from its list of actual Internet vulnerability outbreak probabilities and its list of actual industrial control network vulnerability outbreak probabilities, determine the corresponding training parameter value $PC_m$ from its $P_m$ parameter value list, and generate the model input vector of each sample vulnerability id from $PC_m$.
The correction parameter $P_{CVE}$ expresses the association between the actual Internet vulnerability outbreak probability and the actual industrial control network vulnerability outbreak probability of each sample vulnerability id. Each sample vulnerability id has its own correction parameter $P_{CVE}$, which changes dynamically.
Step S203: train on the correction parameters $P_{CVE}$ and model input vectors of all sample vulnerability ids, together with the ground-truth industrial control network vulnerability outbreak probabilities, to obtain the industrial control network vulnerability prediction model:
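$$f(x) = b_0 \cdot P_{CVE} + b_1 x_1 + b_2 x_2 + \dots + b_M x_M$$
where x_j is the training parameter value PC_m obtained from the j-th parameter P_m of the sample vulnerability id, b_0 is the coefficient of P_CVE, b_j is the coefficient of x_j, and j ranges from 1 to M.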
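The following is a minimal sketch of evaluating the trained model f(x). It assumes the coefficients b_0 … b_M have already been learned in step S203. The text does not name a fitting method, so none is shown. The function name `predict_outbreak_probability` is hypothetical.

```python
from typing import Sequence


def predict_outbreak_probability(p_cve: float, xs: Sequence[float],
                                 b: Sequence[float]) -> float:
    """Evaluate f(x) = b0*P_CVE + b1*x1 + ... + bM*xM.

    b[0] multiplies the correction parameter P_CVE; b[1:] multiply the
    training parameter values x1..xM in order.
    """
    if len(b) != len(xs) + 1:
        raise ValueError("expected one coefficient per x_j plus b0 for P_CVE")
    return b[0] * p_cve + sum(bj * xj for bj, xj in zip(b[1:], xs))


# Usage with illustrative numbers: P_CVE = 2.0, x = [1.0, 3.0], b = [0.5, 2.0, 1.0]
print(predict_outbreak_probability(2.0, [1.0, 3.0], [0.5, 2.0, 1.0]))  # 6.0
```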
Note that the model builds its input vector from the internet vulnerability characteristic parameters. It also uses the correction parameter P_CVE, which captures the association between the actual internet vulnerability outbreak probability and the actual industrial control network vulnerability outbreak probability. The outbreak probability of industrial control network vulnerabilities is then predicted with the internet vulnerability characteristic parameters as model input.
Step S204: predict the outbreak probability of industrial control network vulnerabilities with the industrial control network vulnerability prediction model.
Once the vulnerability prediction model has been trained, it can predict the outbreak probability of a vulnerability in the industrial control network at any time from that vulnerability's input parameters.
The outbreak probability of industrial control network vulnerabilities is predicted from the internet characteristic parameters together with the correction parameter P_CVE. Determining a reasonably accurate P_CVE during model training is therefore particularly important. As an example, in step S202, determining the correction parameter P_CVE of each sample vulnerability id from the internet vulnerability actual outbreak probability list and the industrial control network vulnerability actual outbreak probability list includes:
Step S212: from the internet and industrial control network actual outbreak probability lists of each sample vulnerability id in the preset training period, obtain the internet/industrial control network vulnerability outbreak association parameter list {R_1, R_2, …, R_C}. Here R_c is the c-th association parameter, c ranges from 1 to C, and C is the total number of association parameters obtained for the sample vulnerability id in the preset training period. R_c = CH_c1 / CH_c2, where CH_c1 is the actual internet vulnerability outbreak probability at the c-th moment and CH_c2 is the actual industrial control network vulnerability outbreak probability at the c-th moment.
Step S222: determine the correction parameter P_CVE of the sample vulnerability id from {R_1, R_2, …, R_C}.
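The sketch below builds the association parameter list {R_1, …, R_C} of step S212 from the two per-moment probability lists of one sample vulnerability id. The function name is hypothetical. The source does not say what happens when CH_c2 is 0, so this sketch raises an error in that case. That behavior is an assumption.

```python
from typing import List, Sequence


def association_parameters(internet_probs: Sequence[float],
                           ics_probs: Sequence[float]) -> List[float]:
    """Return [R_1, ..., R_C] with R_c = CH_c1 / CH_c2.

    internet_probs[c] is CH_c1 (actual internet outbreak probability at moment c);
    ics_probs[c] is CH_c2 (actual industrial control network outbreak probability).
    """
    if len(internet_probs) != len(ics_probs):
        raise ValueError("both lists must cover the same C moments")
    ratios = []
    for ch1, ch2 in zip(internet_probs, ics_probs):
        if ch2 == 0:
            # Assumption: the source does not define R_c when CH_c2 is 0.
            raise ZeroDivisionError("CH_c2 is 0; R_c is undefined")
        ratios.append(ch1 / ch2)
    return ratios


print(association_parameters([0.2, 0.3], [0.4, 0.3]))  # [0.5, 1.0]
```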
As an example, the step S222 includes:
Step S232: obtain the mean value R_AVG of {R_1, R_2, …, R_C}, and obtain the first variation parameter SR1 from R_AVG and {R_1, R_2, …, R_C}:
[Formula image: definition of the first variation parameter SR1; not reproduced in the source text]
Step S242: if SR1 is greater than or equal to the preset second threshold D2, obtain the maximum value R_max of {R_1, R_2, …, R_C} and set P_CVE = R_max; otherwise, set P_CVE = R_AVG.
Note that SR1 ≥ D2 means the current training period contains a sudden outbreak. The maximum of {R_1, R_2, …, R_C} is then the more accurate choice for P_CVE. If SR1 < D2, the vulnerability was relatively stable during the training period, so the mean R_AVG is the more accurate choice. Choosing P_CVE in this way improves the accuracy of the industrial control network vulnerability prediction model. It also makes the model highly sensitive to newly emerging vulnerabilities, so it is particularly suitable for predicting them.
To further refine the choice of P_CVE, the following steps may also be performed when SR1 < D2 in step S242:
Step S252: obtain the minimum value R_min of {R_1, R_2, …, R_C}, and obtain the second variation parameter SR2 from R_min, R_AVG and R_max:
[Formula image: definition of the second variation parameter SR2; not reproduced in the source text]
Step S262: if SR2 ≥ 1, set P_CVE = R_min; otherwise, set P_CVE = R_max.
When SR2 ≥ 1, R_min has the greater influence, so P_CVE = R_min is preferred. This case typically arises when an existing vulnerability is suddenly fixed. R_min is then more influential, and choosing P_CVE = R_min improves model accuracy. When SR2 < 1, R_max has the greater influence, so P_CVE = R_max is preferred. This keeps the model highly sensitive to newly emerging vulnerabilities and makes it particularly suitable for predicting them.
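The sketch below shows how steps S242, S252 and S262 choose P_CVE. The formulas for SR1 and SR2 appear only as images in the source, so the caller supplies both values here. Passing `sr2=None` skips the optional refinement. The function name is hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def select_correction_parameter(rs: Sequence[float], sr1: float, d2: float,
                                sr2: Optional[float] = None) -> float:
    """Pick P_CVE from {R_1, ..., R_C} following steps S242, S252 and S262.

    sr1 and sr2 are precomputed by the caller; sr2=None skips S252/S262.
    """
    r_max, r_min, r_avg = max(rs), min(rs), mean(rs)
    if sr1 >= d2:
        return r_max                      # sudden outbreak in the training period
    if sr2 is None:
        return r_avg                      # stable period, no further refinement
    return r_min if sr2 >= 1 else r_max   # S262


rs = [1.0, 2.0, 3.0]
print(select_correction_parameter(rs, sr1=0.5, d2=0.3))           # 3.0
print(select_correction_parameter(rs, sr1=0.1, d2=0.3))           # 2.0
print(select_correction_parameter(rs, sr1=0.1, d2=0.3, sr2=1.5))  # 1.0
```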
As an example, in step S202, determining the training parameter value PC_m from the P_m values of each sample vulnerability id includes:
Step S272: take the maximum, minimum, or mean of all values in the P_m parameter value list of each sample vulnerability id as the corresponding training parameter value PC_m.
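A minimal sketch of step S272 follows. It assumes each characteristic parameter P_m of one sample vulnerability id is stored as a list of its values within the training period, keyed by a hypothetical parameter name. Each list is reduced to a single PC_m.

```python
from statistics import mean
from typing import Dict, List


def training_parameter_values(param_lists: Dict[str, List[float]],
                              mode: str = "max") -> Dict[str, float]:
    """Reduce each P_m value list to one training value PC_m (step S272)."""
    reducers = {"max": max, "min": min, "mean": mean}
    if mode not in reducers:
        raise ValueError(f"mode must be one of {sorted(reducers)}")
    reducer = reducers[mode]
    return {name: reducer(values) for name, values in param_lists.items()}


# Hypothetical parameter names and values within one training period.
samples = {"P_1": [1.0, 3.0, 2.0], "P_2": [4.0, 6.0]}
print(training_parameter_values(samples, "max"))   # {'P_1': 3.0, 'P_2': 6.0}
print(training_parameter_values(samples, "mean"))  # {'P_1': 2.0, 'P_2': 5.0}
```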
In the system of the second embodiment, the model input parameters are built from the internet vulnerability characteristic parameters and the correction parameter P_CVE, which represents the association between the actual internet and industrial control network vulnerability outbreak probabilities. The industrial control network vulnerability prediction model is then trained on these inputs. Using multi-dimensional internet vulnerability characteristic parameters and P_CVE, the system predicts the outbreak probability of industrial control network vulnerabilities quickly and accurately. Reasonable defense measures can be set on that basis, improving the security and stability of the industrial control network. The second embodiment is particularly suitable for scenarios in which the vulnerability characteristic parameters of the industrial control network are hard to obtain.
The Summary parameter is an authority's text description of a vulnerability. It reflects vulnerability characteristics accurately and reliably, so characteristic parameters for predicting industrial control network vulnerabilities can be built from it. Because Summary is unstructured, its characteristic parameter value has to be built from its text features. In the prior art, for example, the value is based directly on the length of Summary: a longer Summary indicates a more harmful vulnerability that needs handling more urgently. However, the authority updates Summaries periodically, and only the Summaries of vulnerabilities that changed during a period are modified. In other periods the Summary is carried over unchanged from the previous period. Consider a serious vulnerability that broke out three years ago and has a long text description. If nothing has changed in three years, its Summary still reflects the state of three years ago. A characteristic parameter value built directly from that Summary text alone would very likely be inaccurate. A feature weight therefore needs to be assigned according to how the Summary changes, which makes the Summary characteristic parameter more accurate. Determining the feature weight of each Summary is the key step, and the following embodiments describe it in detail.
Example III
The computer program stored in the storage medium comprises a third computer program that, when executed by the processor, performs the steps of:
Step S300: obtain from the database the Summary text sequence {Str_1, Str_2, …} of each sample vulnerability id. Str_e is the Summary text for the e-th update period, and e ranges from 1 to infinity.
Step S301: when e = 1, determine the feature weight w_e of Str_e from the length of Str_e.
Step S301 sets the initial feature weight of each text sequence.
Step S302: when e > 1, compare the texts of Str_{e-1} and Str_e.
If they are identical, check whether z * w_{e-1} is greater than the preset first feature weight threshold w_emin, where z is a preset weight adjustment coefficient with 0 < z < 1. If it is, set w_e = z * w_{e-1}. If z * w_{e-1} ≤ w_emin, set w_e = w_emin.
If Str_{e-1} and Str_e differ, determine w_e from the length of Str_e.
Note that identical texts in Str_{e-1} and Str_e mean the Summary has not been updated, so the feature weight is reduced by multiplying it by z. Preferably, z = 1/2. Some Summaries go without updates for a long time, and their weight should not shrink indefinitely. The first feature weight threshold w_emin is therefore set, and once w_e has decayed far enough, this minimum is used instead. When Str_{e-1} and Str_e differ, the Summary has changed, so w_e is determined directly from the length of the current Str_e.
Step S303: from the feature weight w_e and the text Str_e, determine the Summary characteristic parameter value PCS_e = w_e * g(Str_e), where g(Str_e) is the original characteristic parameter value determined from Str_e.
g(Str_e) can be obtained directly with an existing algorithm that determines a parameter value from text features; such algorithms are not described here. w_e is determined from the characteristic parameters of the Summary and from its changes over consecutive periods. This makes the Summary characteristic parameter value PCS_e more accurate and reliable, which improves the accuracy of the model.
S304, constructing a model input vector based on the Summary characteristic parameter values corresponding to the sample vulnerability ids, training to obtain an industrial control network vulnerability prediction model, and predicting the industrial control network vulnerability outbreak probability based on the industrial control network vulnerability prediction model.
It can be understood that other required internet vulnerability characteristic parameters and industrial control network vulnerability characteristic parameters may be introduced when the input vector is constructed, and the specific parameter processing may be based on the methods described in the first embodiment and the second embodiment, or may adopt the existing data processing method, which is not described herein again.
As an example, in steps S301 and S302, determining the feature weight w_e of Str_e from the length of Str_e includes:
Step S311: compare the length L_e of Str_e with a preset first length threshold L_min and a preset second length threshold L_max, where L_min < L_max.
If L_e < L_min, set w_e = w_emin.
If L_e > L_max, set w_e = w_emax. Here w_emax is a preset second feature weight threshold, greater than the first feature weight threshold.
If L_e is within [L_min, L_max], set w_e = k_1 * L_e, where k_1 is a preset first linear change coefficient.
Step S311 determines an accurate and reliable initial feature weight from the length L_e of Str_e. Preferably, k_1 is set to (w_emax - w_emin) / w_emax.
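The length rule of step S311 is sketched below with the preferred k_1 = (w_emax - w_emin) / w_emax. The text does not say whether k_1 * L_e should be clamped to [w_emin, w_emax], so the sketch follows it literally and does not clamp. The function name is hypothetical.

```python
def length_weight(length: int, l_min: int, l_max: int,
                  w_emin: float, w_emax: float) -> float:
    """Initial feature weight from the Summary length L_e (step S311).

    Uses the preferred k1 = (w_emax - w_emin) / w_emax and, like the text,
    does not clamp k1 * L_e to [w_emin, w_emax].
    """
    if l_min >= l_max or w_emin >= w_emax:
        raise ValueError("need L_min < L_max and w_emin < w_emax")
    if length < l_min:
        return w_emin
    if length > l_max:
        return w_emax
    k1 = (w_emax - w_emin) / w_emax
    return k1 * length


print(length_weight(5, 10, 100, 0.5, 1.0))    # 0.5 (shorter than L_min)
print(length_weight(150, 10, 100, 0.5, 1.0))  # 1.0 (longer than L_max)
print(length_weight(20, 10, 100, 0.5, 1.0))   # 10.0 (k1 = 0.5)
```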
As a preferred example, when e >1, before performing step S311, the method further includes:
Step S310: determine whether w_{e-1} was set as w_{e-1} = z * w_{e-2}. If it was, and the texts of Str_{e-1} and Str_e differ, set w_emin = w_{e-1}.
If w_{e-1} = z * w_{e-2}, the Summary did not change between the two preceding periods and its weight was reduced in the previous period. The current Summary, however, has changed relative to the previous one. The weight of the current period should therefore be greater than that of the previous period. The current period's w_emin can then be set to w_{e-1}, which makes this period's feature weight more accurate. If the condition of step S310 does not hold, w_emin keeps its original preset value.
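Steps S301, S302 and S310 can be combined into one pass over a vulnerability id's Summary sequence, as sketched below. Step S311 is passed in as a callable `initial_weight(text, floor)`. Here `floor` is the w_emin that S311 should use, which S310 may raise to w_{e-1}. Three details are assumptions: k_1 and the S302 threshold keep using the preset w_emin, and the names are hypothetical.

```python
from typing import Callable, List


def summary_weights(texts: List[str], z: float, w_emin: float,
                    initial_weight: Callable[[str, float], float]) -> List[float]:
    """Feature weights w_1..w_n for one vulnerability id (steps S301, S302, S310).

    initial_weight(text, floor) stands in for step S311; `floor` is the w_emin
    value S311 should use, which S310 may raise to w_{e-1}.
    """
    if not 0 < z < 1:
        raise ValueError("z must satisfy 0 < z < 1")
    weights: List[float] = []
    decayed_prev = False  # True if w_{e-1} was set as z * w_{e-2}
    for e, text in enumerate(texts):
        if e == 0:                                   # S301: first period
            weights.append(initial_weight(text, w_emin))
            decayed_prev = False
        elif text == texts[e - 1]:                   # S302: unchanged Summary
            candidate = z * weights[-1]
            decayed_prev = candidate > w_emin
            weights.append(candidate if decayed_prev else w_emin)
        else:                                        # changed Summary
            floor = weights[-1] if decayed_prev else w_emin  # S310
            weights.append(initial_weight(text, floor))
            decayed_prev = False
    return weights


def demo_initial(text: str, floor: float) -> float:
    """Hypothetical S311 stand-in: long texts get 1.0, short ones the floor."""
    return 1.0 if len(text) > 100 else floor


print(summary_weights(["x" * 200] * 3 + ["y" * 5], 0.5, 0.1, demo_initial))
# [1.0, 0.5, 0.25, 0.25]
```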
As an example, in step S304, a model input vector is constructed based on the Summary feature parameter value corresponding to the sample vulnerability id, and the industrial control network vulnerability prediction model is obtained through training, which includes:
step S314, determining a model input vector of each sample vulnerability id according to the Summary characteristic parameter value corresponding to the sample vulnerability id, a preset internet vulnerability characteristic parameter and a preset industrial control network vulnerability characteristic parameter;
step S324, training based on model input vectors corresponding to sample vulnerability ids and the industrial control network vulnerability outbreak probability truth value to obtain the industrial control network vulnerability prediction model.
Once the model sample inputs are determined, the selected artificial intelligence model can be trained against the corresponding sample true values. The inputs can be taken over a preset training period. They can be processed as in the first and second embodiments or with existing processing methods, which are not described here.
Note that running step S301 directly after step S300 applies to vulnerability ids that have a Summary text from e = 1. Some vulnerability ids are added later, and a separate feature weight policy can be set for them. As an example, step S300 further includes:
Step S311: if the first nr consecutive Str_e in {Str_1, Str_2, …} are empty and Str_{nr+1} is not, set the feature weight of Str_{nr+1} to w_{nr+1} = w_emax, where w_emax is the preset second feature weight threshold. Then initialize e = nr + 2 and perform step S302.
Note that when the first nr consecutive Str_e in {Str_1, Str_2, …} are empty and Str_{nr+1} is not, the vulnerability id was newly added in period nr + 1. In that case, w_{nr+1} can be set directly to the maximum, the second feature weight threshold. This reduces the amount of data processing and improves efficiency without sacrificing accuracy.
The third embodiment adjusts the Summary feature weight according to how the Summary text and length change over consecutive periods. Text changes are easy to detect and the length is easy to obtain. Both properties make the Summary characteristic parameter values more accurate and faster to obtain. That, in turn, improves the accuracy and efficiency of training the industrial control network vulnerability prediction model and of predicting the industrial control network vulnerability outbreak probability. Reasonable defense measures can be set based on these predictions, improving the security and stability of the industrial control network.
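The sketch below detects a newly added vulnerability id for this variant of step S311. It finds nr, the number of leading empty Summary texts, and gives period nr + 1 the weight w_emax. Step S302 would then resume at e = nr + 2. Returning `None` when the Summary exists from e = 1, or is empty throughout, is an assumption, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def first_nonempty_period(texts: List[str], w_emax: float) -> Optional[Tuple[int, float]]:
    """Handle a vulnerability id whose Summary first appears in period nr + 1.

    Returns (nr, w_{nr+1}) with w_{nr+1} = w_emax when the first nr >= 1 texts
    are empty. Returns None when the Summary exists from e = 1 (step S301
    applies) or is empty in every period.
    """
    nr = 0
    while nr < len(texts) and not texts[nr]:
        nr += 1
    if nr == 0 or nr == len(texts):
        return None
    return nr, w_emax


print(first_nonempty_period(["", "", "Buffer overflow in the web server"], 1.0))  # (2, 1.0)
```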
Example IV
The fourth embodiment is suited to scenarios with a high Summary update frequency, that is, where the Summary update frequency exceeds a preset update frequency threshold.
The computer program stored in the storage medium comprises a fourth computer program that, when executed by the processor, performs the steps of:
Step S400: obtain from the database the Summary text sequence {Str_1, Str_2, …} of each sample vulnerability id. Str_e is the Summary text for the e-th update period, and e ranges from 1 to infinity.
Step S401: perform word segmentation on Str_e and remove stop words using a preset stop-word library to obtain the segmented word set A_e of Str_e.
Note that the preset stop-word library may be built with existing techniques and may change over time according to application requirements. The industrial internet stop-word library described in a subsequent embodiment may also be used. That library may be updated with the method described in the sixth embodiment, which is not repeated here.
Step S402: when e = 1, determine the feature weight w_e of Str_e from the number of words in A_e.
Step S402 sets the initial feature weight of each text sequence from the number of words in A_e.
Step S403: when e > 1, compare the texts of Str_{e-1} and Str_e.
If they are identical, set w_e = w_{e-1}.
If they differ, apply set-difference operations to the word sets A_e and A_{e-1}. This gives |A_e - A_{e-1}|, the number of words in A_e but not in A_{e-1}, and |A_{e-1} - A_e|, the number of words in A_{e-1} but not in A_e. Then set w_e = (|A_e - A_{e-1}| / |A_{e-1} - A_e|) * w_{e-1}.
Note that identical texts in Str_{e-1} and Str_e mean the Summary has not been updated. Summaries are updated frequently in this scenario, so w_e can simply be set to w_{e-1}. If the texts differ, a feature weight change coefficient |A_e - A_{e-1}| / |A_{e-1} - A_e| is determined from how A_e changed relative to A_{e-1}. Multiplying this coefficient by the previous period's weight w_{e-1} gives w_e.
If |A_e - A_{e-1}| > |A_{e-1} - A_e|, A_e added more words to A_{e-1} than it removed, and the feature weight increases.
If |A_e - A_{e-1}| < |A_{e-1} - A_e|, A_e removed more words than it added, and the feature weight decreases.
This makes the determined feature weight w_e more accurate.
Step S404: from the feature weight w_e and the text Str_e, determine the Summary characteristic parameter value PCS_e = w_e * g(Str_e), where g(Str_e) is the original characteristic parameter value determined from Str_e.
g(Str_e) can be obtained directly with an existing algorithm that determines a parameter value from text features; such algorithms are not described here. w_e is determined from the characteristic parameters of the Summary and from its changes over consecutive periods. This makes the Summary characteristic parameter value PCS_e more accurate and reliable, which improves the accuracy of the model.
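The step S403 update is sketched below. The inputs are the two texts, their segmented word lists, and the previous weight. The ratio is undefined when no words were removed (|A_{e-1} - A_e| = 0), and the text does not cover that case. This sketch therefore raises an error rather than guess a value. That behavior is an assumption, and the function name is hypothetical.

```python
from typing import Iterable


def next_weight_by_segments(prev_text: str, text: str,
                            prev_words: Iterable[str], words: Iterable[str],
                            prev_weight: float) -> float:
    """Step S403: w_e from w_{e-1} and the set differences of A_e and A_{e-1}."""
    if text == prev_text:
        return prev_weight                      # Summary unchanged
    a_prev, a_cur = set(prev_words), set(words)
    added = len(a_cur - a_prev)                 # |A_e - A_{e-1}|
    removed = len(a_prev - a_cur)               # |A_{e-1} - A_e|
    if removed == 0:
        # Assumption: the text leaves this ratio undefined; fail loudly.
        raise ZeroDivisionError("no words were removed; the S403 ratio is undefined")
    return added / removed * prev_weight


prev = ["buffer", "overflow", "plc"]
cur = ["buffer", "overflow", "remote", "code"]
print(next_weight_by_segments("old text", "new text", prev, cur, 0.5))  # 1.0
```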
S405, constructing a model input vector based on the Summary characteristic parameter values corresponding to the sample vulnerability ids, training to obtain an industrial control network vulnerability prediction model, and predicting the industrial control network vulnerability outbreak probability based on the industrial control network vulnerability prediction model.
It can be understood that other required internet vulnerability characteristic parameters and industrial control network vulnerability characteristic parameters may be introduced when the input vector is constructed, and the specific parameter processing may be based on the methods described in the first embodiment and the second embodiment, or may adopt the existing data processing method, which is not described herein again.
As an example, the step S402 includes:
Step S412: compare the number of words SA_e in A_e with a preset first threshold SU_min and a preset second threshold SU_max, where SU_min < SU_max.
If SA_e < SU_min, set w_e = ws_min, where ws_min is a preset third feature weight threshold.
If SA_e > SU_max, set w_e = ws_max, where ws_max is a preset fourth feature weight threshold.
If SA_e is within [SU_min, SU_max], set w_e = k_2 * SA_e, where k_2 is a preset second linear change coefficient.
Preferably, ws_min is set to 0 and ws_max to 1, which simplifies calculation.
Step S412 determines an accurate and reliable initial feature weight from the number of words SA_e in A_e. Preferably, k_2 is set to (ws_max - ws_min) / ws_max.
Note that running step S402 directly after step S401 applies to vulnerability ids that have a Summary text from e = 1. Some vulnerability ids are added later, and a separate feature weight policy can be set for them. As an example, step S401 further includes:
Step S422: if A_e is empty for the first ns consecutive Str_e in {Str_1, Str_2, …} and A_{ns+1} is not, set the feature weight of Str_{ns+1} to w_{ns+1} = ws_max, where ws_max is the preset fourth feature weight threshold. Then initialize e = ns + 2 and perform step S403.
Note that when A_e is empty for the first ns consecutive Str_e in {Str_1, Str_2, …} and A_{ns+1} is not, the vulnerability id was newly added in period ns + 1. In that case, w_{ns+1} can be set directly to the maximum, the fourth feature weight threshold. This reduces the amount of data processing and improves efficiency without sacrificing accuracy.
As an example, in step S405, a model input vector is constructed based on the Summary feature parameter value corresponding to the sample vulnerability id, and an industrial control network vulnerability prediction model is obtained through training, including:
step S415, determining a model input vector of each sample vulnerability id according to the Summary characteristic parameter value corresponding to the sample vulnerability id, a preset internet vulnerability characteristic parameter and a preset industrial control network vulnerability characteristic parameter;
and S425, training based on model input vectors corresponding to sample vulnerability ids and the true value of the vulnerability outbreak probability of the industrial control network to obtain the industrial control network vulnerability prediction model.
The fourth embodiment is particularly suitable for scenarios in which the Summary update frequency is above a preset update frequency threshold. It adjusts the Summary feature weight according to how the segmented words of the Summary text change over consecutive periods. This makes the Summary characteristic parameter values more accurate and faster to obtain. That, in turn, improves the accuracy and efficiency of training the industrial control network vulnerability prediction model and of predicting the industrial control network vulnerability outbreak probability. Reasonable defense measures can be set based on these predictions, improving the security and stability of the industrial control network.
Example V
The fifth embodiment is particularly suitable for scenarios with a low Summary update frequency, that is, below a preset update frequency threshold.
The system also records, for each vulnerability id, how its Summary text changes across update periods, and stores this record as a bitmap to save storage space. If the Summary text of the current period is unchanged from the previous period, the bit for the current period is set to 0; otherwise it is set to 1. B_e is the bitmap value for the e-th update period, B_e is 0 or 1, and e ranges from 1 to infinity.
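The following minimal sketch packs the change bits into a Python int, with the newest period B_n in the lowest bit. This matches the later steps, which treat B_e as the last bit and shift right. Period 1 has no predecessor, so setting its bit to 1 is an assumption. The `detection_window` helper extracts the a-bit window TK_e ending at period e. Both names are hypothetical.

```python
from typing import List


def summary_change_bitmap(texts: List[str]) -> int:
    """Pack the change bits B_1..B_n into an int, with B_n in the lowest bit.

    Assumption: period 1 has no predecessor, so its bit is set to 1.
    """
    bits = 0
    for e, text in enumerate(texts):
        changed = e == 0 or text != texts[e - 1]
        bits = (bits << 1) | int(changed)
    return bits


def detection_window(bits: int, n: int, e: int, a: int) -> int:
    """TK_e: the a bits ending at period e (1-based), with B_e as the last bit."""
    return (bits >> (n - e)) & ((1 << a) - 1)


texts = ["v1", "v1", "v2", "v2"]
bm = summary_change_bitmap(texts)
print(bin(bm))                                        # 0b1010
print(bin(detection_window(bm, len(texts), 3, 3)))    # 0b101
```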
The computer program stored in the storage medium includes a fifth computer program that, when executed by the processor, implements the steps of:
Step S501: determine a Summary period detection window TK from the preset training period, where TK = a * Te, a is a positive integer greater than 1, and Te is the Summary update period;
The period detection window TK contains a bits, each corresponding to one update period. Preferably, a is 8.
Step S502: take B_e as the a-th bit of TK to obtain the e-th period detection window TK_e. Then determine the feature weight w_e of B_e from B_e and the bit changes in TK_e.
Here B_e is the a-th bit of TK, that is, the last bit of TK. The changes across the a bits of TK_e correspond to the changes of the Summary over a consecutive periods. The following cases illustrate this.
If all a bits are 0, the Summary did not change during those a consecutive periods.
If B_e is 1, the Summary for B_e changed relative to the previous period.
If B_e is 0, B_{e-a1} is 1, and every bit between B_{e-a1} and B_e is 0, the Summary for B_e has gone a1 periods without change.
The feature weight w_e of B_e can therefore be determined from B_e and the bit changes in TK_e.
Step S503: from the feature weight w_e of each B_e and the Summary text Str_e, determine the Summary characteristic parameter value PCS_e = w_e * g(Str_e), where g(Str_e) is the original characteristic parameter value determined from Str_e.
g(Str_e) can be obtained directly with an existing algorithm that determines a parameter value from text features; such algorithms are not described here. w_e is determined from the characteristic parameters of the Summary and from its changes over consecutive periods. This makes the Summary characteristic parameter value PCS_e more accurate and reliable, which improves the accuracy of the model.
Step S504, a model input vector is constructed based on the Summary characteristic parameter values corresponding to the sample vulnerability ids, an industrial control network vulnerability prediction model is obtained through training, and the industrial control network vulnerability outbreak probability is predicted based on the industrial control network vulnerability prediction model.
It can be understood that other required internet vulnerability characteristic parameters and industrial control network vulnerability characteristic parameters may be introduced when the input vector is constructed, and the specific parameter processing may be based on the methods described in the first embodiment and the second embodiment, or may adopt the existing data processing method, which is not described herein again.
As an example, in step S502, determining the feature weight w_e of B_e from B_e and the bit changes in TK_e includes:
Step S512: determine whether B_e is 1. If so, set w_e = wb_max, where wb_max is a preset fifth feature weight threshold; otherwise, perform step S522;
Step S522: obtain the number of bits d between B_e and the nearest bit in TK_e whose value is 1. Then check whether (wb_max - wb_min) / d is less than a preset sixth feature weight threshold wb_min, where wb_min < wb_max. If it is, set w_e = wb_min; otherwise, set w_e = (wb_max - wb_min) / d.
The Summary is updated infrequently in this scenario, so a Summary that has changed since the previous period should carry a high weight. Steps S512 and S522 set w_e directly to wb_max when the current period has changed relative to the previous one. This keeps accuracy while reducing computation. If a more accurate result is required, the weight can instead be computed from the length or the word segmentation results as in the third and fourth embodiments, which is not detailed here. When the Summary has not changed since the previous period, the weight is determined from the distance between the current period and the most recent update period. Reading from the bitmap the number of bits d between B_e and the nearest 1 bit in TK_e requires little computation and is efficient.
As an example, to further improve the accuracy of the feature weights, wb_max and wb_min can be adjusted dynamically based on the results of the most recent period detection window. After step S522, the method further includes:
Step S532: obtain the maximum max(w_e) and the minimum min(w_e) of all w_e obtained in the current Summary period detection window TK, and update wb_max = max(w_e) and wb_min = min(w_e).
To further speed up obtaining the feature weight, bit operations can be performed directly on the bitmap. As an example, step S502 includes:
Step S542: take B_e as the a-th bit of TK to obtain bitmap_e in the e-th period detection window, and initialize a binary number WK whose initial decimal value is 2^(a-1);
Step S552: determine whether the last bit of the current bitmap_e is 0. If so, perform step S562; otherwise, perform step S572;
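Steps S512 and S522 are sketched below. The input is an a-bit window TK_e held in an int, with B_e as the lowest bit. The function name is hypothetical. The text does not say what to do when the window contains no 1 bit at all. Returning wb_min in that case is an assumption.

```python
def bitmap_weight(window: int, a: int, wb_min: float, wb_max: float) -> float:
    """Steps S512/S522: weight of B_e from its a-bit window TK_e (B_e = lowest bit)."""
    if wb_min >= wb_max:
        raise ValueError("need wb_min < wb_max")
    if window & 1:                 # S512: B_e == 1, Summary changed this period
        return wb_max
    for d in range(1, a):          # S522: distance to the nearest earlier 1 bit
        if (window >> d) & 1:
            w = (wb_max - wb_min) / d
            return wb_min if w < wb_min else w
    # Assumption: no change anywhere in the window, so use the floor.
    return wb_min


print(bitmap_weight(0b00110000, 8, 0.0, 1.0))  # 0.25 (nearest 1 bit is 4 bits away)
```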
Step S562: shift bitmap_e (the e-th period detection window) right by one bit, shift WK right by one bit, and return to step S552;
Step S572: take the current WK as the feature weight w_e of B_e.
Take a = 8 as an example, with B_e as the a-th bit of TK. Suppose bitmap_e in the e-th period detection window is 00110000, and WK is initialized to 10000000. The last bit of bitmap_e is 0, so bitmap_e is shifted right by one bit to 00011000 and WK to 01000000. This repeats until the last bit of bitmap_e is 1. After four shifts, bitmap_e = 00000011 and WK = 00001000, and this value of WK is the feature weight w_e of B_e. Computing w_e with the bit operations of steps S542 to S572 is more efficient.
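The shift loop of steps S542 to S572 maps directly onto Python integer shifts, as sketched below. The window is masked to a bits, and WK starts at 2^(a-1). As written, the steps would loop forever on an all-zero window. Returning 0 in that case is an assumption, and the function name is hypothetical.

```python
def bitmap_shift_weight(window: int, a: int) -> int:
    """Steps S542-S572: shift the window and WK right together until the last bit is 1.

    WK starts at 2**(a-1). Assumption: an all-zero window returns 0.
    """
    window &= (1 << a) - 1     # keep only the a bits of TK_e
    if window == 0:
        return 0
    wk = 1 << (a - 1)          # S542: WK = 2^(a-1)
    while not window & 1:      # S552: last bit of bitmap_e is 0
        window >>= 1           # S562: shift the window right by one bit
        wk >>= 1               #       and shift WK right by one bit
    return wk                  # S572: current WK is w_e


print(format(bitmap_shift_weight(0b00110000, 8), "08b"))  # 00001000
```

This reproduces the worked example above: four right shifts take WK from 10000000 to 00001000.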
The fifth embodiment is particularly suitable for scenarios in which the Summary is updated infrequently, that is, where the Summary update frequency is below a preset update frequency threshold. It stores the periodic change rate of the Summary in a bitmap, which greatly reduces the storage space required. Obtaining the feature weight w_e from the bitmap is fast and accurate, which improves both the accuracy and the efficiency of determining w_e. This in turn improves the accuracy and efficiency of training the industrial control network vulnerability prediction model and of predicting the industrial control network vulnerability outbreak probability. Reasonable defense measures can be set on this basis, improving the security and stability of the industrial control network.
Embodiments three to five describe three methods for determining the Summary feature weight corresponding to each Summary text. Embodiment six further describes a method for determining the original characteristic parameter value g(Str_e). The Summary characteristic parameter value PCS_e of each Summary text is then determined from g(Str_e) and the corresponding feature weight. g(Str_e) can be obtained from the Summary text features using an existing feature processing algorithm, or by the sub-scheme described in embodiment six.
Embodiment six
The system includes a preset industrial internet stop word bank that stores stop words commonly used in the industrial internet field. The Summary text sequence corresponding to each sample vulnerability id is {Str_1, Str_2, …}, where Str_e is the Summary text corresponding to the e-th update period and e ranges from 1 to infinity.
The computer program stored in the storage medium includes a sixth computer program that, when executed by the processor, implements the steps of:
Step S601, removing the industrial internet stop words from Str_e based on the industrial internet stop word bank, and splitting Str_e at the positions of those stop words to generate the corresponding text segment sequence {Fr_e1, Fr_e2, …, Fr_eI}, where Fr_ei is the i-th text segment of Str_e, i ranges from 1 to I, and I is the total number of text segments in Str_e.
For example, take the text ABCDEFG, in which each letter represents a word. If C and E are stop words in the industrial internet stop word bank, C and E are removed and the remaining text is split into three text segments: AB, D and FG.
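Step S601 amounts to cutting a word sequence at every stop word and discarding the stop words. The following is a minimal Python sketch. It assumes the Summary has already been tokenized into a list of words and that the stop word bank is a plain set of single words. The name `split_on_stop_words` is hypothetical.

```python
from typing import Iterable, List, Set


def split_on_stop_words(words: Iterable[str], stop_words: Set[str]) -> List[List[str]]:
    """Remove stop words and split the word sequence at their positions (step S601).

    Assumption: the Summary text is already tokenized into words.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    for word in words:
        if word in stop_words:
            # A stop word closes the current text segment (if any) and is dropped.
            if current:
                segments.append(current)
                current = []
        else:
            current.append(word)
    if current:
        segments.append(current)
    return segments


# The ABCDEFG example from the text, with C and E as stop words.
print(split_on_stop_words(list("ABCDEFG"), {"C", "E"}))  # [['A', 'B'], ['D'], ['F', 'G']]
```

Consecutive stop words do not produce empty segments in this sketch. The text does not address that case.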
Step S602, performing the preset N-gram word segmentation on each Fr_ei of each Str_e, where N is a positive integer in the range [Kn1, Kn2], and merging and de-duplicating the segmented words of all Fr_ei of each Str_e to obtain the corresponding segmented-word vector FB_e.
Note that applying N-gram word segmentation directly to each whole Summary is inefficient. The number of Summary texts is huge, so one-hot encoding the N-gram segmentation results of all Summary texts directly would produce an excessively large vector dimension, require a large amount of computation, and slow down data processing. In this embodiment, each Summary is first split at its stop words in step S601, and the resulting text segments are then N-gram segmented one by one. Because no N-gram can then span a stop word, this greatly reduces the vector dimension and improves data processing efficiency. The N-gram word segmentation process itself is prior art and is not described here. Preferably, Kn1 is 3 and Kn2 is 6.
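The per-segment N-gram step (S602) can be sketched as follows. The sketch assumes each text segment Fr_ei is a list of tokens and that N runs over every integer in [Kn1, Kn2], which is 3 to 6 in the preferred setting. The text does not say whether the N-gram units are words or characters, so the sketch accepts any token sequence. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(segment: Sequence[str], n: int) -> List[NGram]:
    """All contiguous n-token windows of one text segment."""
    return [tuple(segment[i:i + n]) for i in range(len(segment) - n + 1)]


def segmented_word_vector(segments: Sequence[Sequence[str]], kn1: int = 3, kn2: int = 6) -> List[NGram]:
    """FB_e: N-grams (Kn1 <= N <= Kn2) of every segment Fr_ei, merged and de-duplicated.

    Assumption: tokens may be words or characters; the text does not fix the unit.
    """
    merged: Dict[NGram, None] = {}  # dict keys drop duplicates and keep first-seen order
    for segment in segments:
        for n in range(kn1, kn2 + 1):
            for gram in ngrams(segment, n):
                merged.setdefault(gram, None)
    return list(merged)


print(segmented_word_vector([list("ABCD"), list("EF")]))
# [('A', 'B', 'C'), ('B', 'C', 'D'), ('A', 'B', 'C', 'D')]
```

Because N-grams are taken inside each segment only, nothing that would span a removed stop word is generated. Segments shorter than Kn1 (such as EF above) contribute no N-grams. This choice is left open by the text.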
Step S603, merging and de-duplicating the segmented words in all FB_e to obtain a segmented-word set FC, and taking the number of segmented words in FC as the dimension of the one-hot encoding.
Step S604, one-hot encoding each segmented-word vector FB_e with that dimension to obtain the original characteristic parameter value of each Str_e.
One-hot encoding itself is prior art and is not described here. Once a segmented-word vector FB_e has been one-hot encoded, the original characteristic parameter value of the corresponding Str_e is obtained from the encoding result.
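Steps S603 and S604 can be sketched as building one index over all segmented words (the set FC) and then turning each FB_e into a 0/1 vector of that dimension. This sketch reads "one-hot encoding FB_e" as setting a 1 at the position of every segmented word that FB_e contains, which is strictly a multi-hot vector. That reading, the sorted order of FC, and the function names are hypothetical choices.

```python
from typing import Dict, Iterable, List


def build_index(all_fb: Iterable[Iterable[str]]) -> Dict[str, int]:
    """S603: merge and de-duplicate the segmented words of all FB_e into FC; len(FC) is the dimension."""
    fc = sorted({word for fb in all_fb for word in fb})
    return {word: position for position, word in enumerate(fc)}


def one_hot(fb: Iterable[str], index: Dict[str, int]) -> List[int]:
    """S604: encode FB_e as a 0/1 vector over FC (assumed multi-hot reading)."""
    vector = [0] * len(index)
    for word in fb:
        vector[index[word]] = 1
    return vector


fbs = [["abc", "bcd"], ["bcd", "cde"]]
index = build_index(fbs)  # {'abc': 0, 'bcd': 1, 'cde': 2}
print([one_hot(fb, index) for fb in fbs])  # [[1, 1, 0], [0, 1, 1]]
```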
Step S605, constructing model input vectors from the original characteristic parameter values of the Str_e corresponding to the sample vulnerability ids, training the industrial control network vulnerability prediction model, and predicting the industrial control network vulnerability outbreak probability with that model.
Step S605 can build the model input vectors directly from the original characteristic parameter values of Str_e combined with other vulnerability characteristic parameters. To further improve the accuracy of the Summary characteristic parameter values, each Str_e can also be given a corresponding weight. As an example, in step S605, constructing the model input vectors from the original characteristic parameter values of the Str_e corresponding to the sample vulnerability ids includes:
Step S615, determining, from the original characteristic parameter value g(Str_e) of each Str_e corresponding to the sample vulnerability id and the corresponding feature weight w_e, the Summary characteristic parameter value PCS_e = w_e * g(Str_e) of each Str_e, and constructing the model input vector from the Summary characteristic parameter values corresponding to the sample vulnerability id.
Here, the feature weight w_e of Str_e is determined from how the current Summary text Str_e has changed relative to the historical Summary texts. Specifically, at least one of embodiments three, four and five may be used to determine w_e; details are not repeated here.
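The weighting in step S615, PCS_e = w_e * g(Str_e), scales each original characteristic parameter vector element-wise by its scalar feature weight. The following is a minimal sketch. It assumes g(Str_e) is a numeric vector such as the one produced in step S604 and that w_e is a scalar. The text does not specify how the PCS_e of different update periods are then combined into one model input vector, so that part is left out. The function name is hypothetical.

```python
from typing import List, Sequence


def summary_feature_values(g_values: Sequence[Sequence[float]], weights: Sequence[float]) -> List[List[float]]:
    """Return PCS_e = w_e * g(Str_e) for every update period e.

    Assumption: g(Str_e) is a numeric vector and w_e is a scalar.
    """
    if len(g_values) != len(weights):
        raise ValueError("expected exactly one feature weight w_e per Str_e")
    return [[w_e * x for x in g_e] for g_e, w_e in zip(g_values, weights)]


print(summary_feature_values([[1, 0, 1], [0, 1, 1]], [0.5, 2.0]))
# [[0.5, 0.0, 0.5], [0.0, 2.0, 2.0]]
```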
To further improve the efficiency and accuracy of processing the Summary parameter values, the industrial internet stop word bank may be updated. For example, when executing the sixth computer program, the processor further performs an update process for the industrial internet stop word bank, comprising the following steps:
Step S600, initializing Kn = Kn2, where Kn is the value of N in the N-gram;
Step S610, splitting the Summary texts corresponding to all vulnerability ids into text segments based on the industrial internet stop word bank, removing the industrial internet stop words, and performing N-gram word segmentation (with N = Kn) on each text segment to obtain a list of the counts of the resulting N-gram segmented words;
Step S620, adding the N-gram segmented words whose count exceeds a preset segmented-word count threshold D3 to the industrial internet stop word bank, and judging whether Kn is greater than Kn1; if so, setting Kn = Kn - 1 and returning to step S610; if Kn is equal to Kn1, ending the update process of the industrial internet stop word bank.
Through steps S600 to S620, the industrial internet stop word bank is updated by applying N-gram processing to all Summary texts, so that the stop word bank is kept in step with changes to the Summary texts. This improves the efficiency and accuracy of obtaining the Summary parameter values.
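The update loop in steps S600 to S620 can be sketched in Python as below. The sketch assumes that Summaries are already tokenized into word lists. It also stores every stop word bank entry as a tuple of tokens, with a single stop word as a 1-tuple, so that the N-grams added in step S620 act as stop phrases when the loop returns to step S610. The threshold D3 is passed in as a plain number because its function f is not recoverable from the text. All names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

Phrase = Tuple[str, ...]  # assumption: bank entries are token tuples


def split_on_stop_phrases(tokens: Sequence[str], bank: Set[Phrase]) -> List[List[str]]:
    """Drop every stop-bank entry found in the token list and split at its position."""
    max_len = max((len(p) for p in bank), default=0)
    segments: List[List[str]] = []
    current: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest stop phrase first at position i.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + length]) in bank:
                if current:
                    segments.append(current)
                    current = []
                i += length
                break
        else:  # no stop phrase starts at position i
            current.append(tokens[i])
            i += 1
    if current:
        segments.append(current)
    return segments


def update_stop_word_bank(summaries: Iterable[Sequence[str]], bank: Set[Phrase],
                          kn1: int, kn2: int, d3: int) -> Set[Phrase]:
    """Steps S600-S620: add frequent Kn-grams to the bank for Kn = Kn2, Kn2 - 1, ..., Kn1."""
    summaries = list(summaries)  # iterated once per Kn value
    bank = set(bank)  # work on a copy of the bank
    for kn in range(kn2, kn1 - 1, -1):  # S600 starts at Kn2; S620 decrements down to Kn1
        counts: Counter = Counter()  # S610: count of each Kn-gram over all Summary texts
        for tokens in summaries:
            for segment in split_on_stop_phrases(tokens, bank):
                for i in range(len(segment) - kn + 1):
                    counts[tuple(segment[i:i + kn])] += 1
        bank.update(gram for gram, count in counts.items() if count > d3)  # S620
    return bank


summaries = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["a", "b", "c"]]
print(update_stop_word_bank(summaries, set(), kn1=2, kn2=3, d3=2))  # {('a', 'b', 'c')}
```

Because Kn counts down from Kn2, a frequent long phrase is added to the bank first and is removed before the shorter N-grams inside it are counted. In the example, ('a', 'b') is therefore never added.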
Preferably, D3 = f[Φ, SN, avg(Kn)], where Φ stands for a term given only as a formula image in the original (it involves L_j, whose definition is not recoverable). D3 is positively correlated with Φ and with SN and negatively correlated with avg(Kn), where SN is the total number of Summary texts of all vulnerabilities and avg(Kn) is the average of all values of N used in the N-gram.
By using the industrial internet stop word bank to remove stop words from the Summary texts and split them into segments, the sixth embodiment reduces the number of segmented words produced by N-gram processing of all Summary texts. This reduces the dimension of the one-hot encoding of the segmented-word vectors FB_e and improves the efficiency and accuracy of obtaining the original characteristic parameter values of Str_e. In turn, it improves the accuracy and efficiency of training the industrial control network vulnerability prediction model and of predicting the industrial control network vulnerability outbreak probability. Reasonable defense measures can be set on this basis, improving the security and stability of the industrial control network.
Embodiment seven
A server comprising the system of at least one of embodiments one through six.
The server can quickly and accurately train the industrial control network vulnerability prediction model based on the internet vulnerability characteristic parameters and the industrial control network vulnerability characteristic parameters. It can therefore quickly and accurately predict the industrial control network vulnerability outbreak probability with that model. Reasonable defense measures can be set on this basis, improving the security and stability of the industrial control network.
It should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, some of the steps may be performed in parallel, concurrently or at the same time. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
It is to be understood that the invention is not limited to the specific embodiments disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A system for predicting industrial control network vulnerabilities based on Summary word segmentation features, characterized in that,
the system comprises a processor, a database and a storage medium which stores a computer program, wherein the processor is in communication connection with the database, an internet vulnerability characteristic parameter list and an industrial control network vulnerability characteristic parameter list which correspond to all internet vulnerability ids are stored in the database, the internet vulnerability characteristic parameters which correspond to the internet vulnerability ids comprise Summary, and the Summary represents a vulnerability description text; the computer program stored in the storage medium includes a fourth computer program that, when executed by the processor, performs the steps of:
step S400, obtaining from the database the Summary text sequence {Str_1, Str_2, …} corresponding to each sample vulnerability id, where Str_e is the Summary text corresponding to the e-th update period and e ranges from 1 to infinity;
step S401, performing word segmentation on Str_e and removing stop words using a preset stop word bank to obtain the segmented-word set A_e corresponding to Str_e;
step S402, when e is equal to 1, determining the feature weight w_e of Str_e according to the number of segmented words in A_e;
step S403, when e > 1, comparing the text of Str_{e-1} with that of Str_e: if they are completely identical, setting w_e = w_{e-1}; otherwise, performing set-difference operations on the segmented-word sets A_e and A_{e-1} to obtain the number of segmented words |A_e - A_{e-1}| in the difference of A_e relative to A_{e-1} and the number |A_{e-1} - A_e| in the difference of A_{e-1} relative to A_e, and setting w_e = [|A_e - A_{e-1}| / |A_{e-1} - A_e|] * w_{e-1};
step S404, determining, based on the feature weight w_e of each Str_e and on Str_e itself, the Summary characteristic parameter value PCS_e = w_e * g(Str_e) corresponding to each Str_e, where g(Str_e) is the original characteristic parameter value determined from Str_e;
step S405, constructing a model input vector based on the Summary characteristic parameter values corresponding to the sample vulnerability ids, training to obtain an industrial control network vulnerability prediction model, and predicting the industrial control network vulnerability outbreak probability based on that model.
2. The system of claim 1,
the step S402 includes:
step S412, comparing the number SA_e of segmented words in A_e with a preset first segmented-word count threshold SU_min and a preset second segmented-word count threshold SU_max, where SU_min < SU_max: if SA_e < SU_min, setting w_e = ws_min, where ws_min is a preset third feature weight threshold; if SA_e > SU_max, setting w_e = ws_max, where ws_max is a preset fourth feature weight threshold; and if SA_e lies in [SU_min, SU_max], setting w_e = k_2 * SA_e, where k_2 is a preset second linear change coefficient.
3. The system of claim 2,
k2is set to (ws)max-wsmin)/wsmax
4. The system of claim 1,
the step S401 further includes:
step S422, if, in {Str_1, Str_2, …}, the sets A_e of the first ns consecutive Str_e are empty and A_{ns+1} is not empty, setting the feature weight of Str_{ns+1} to w_{ns+1} = ws_max, where ws_max is a preset third feature weight threshold; then initializing e = ns + 2 and performing step S403.
5. The system of claim 2,
ws_min is set to 0 and ws_max is set to 1.
6. The system of claim 1,
in step S405, a model input vector is constructed based on the Summary characteristic parameter values corresponding to the sample vulnerability ids, and an industrial control network vulnerability prediction model is obtained through training, including:
step S415, determining a model input vector of each sample vulnerability id according to the Summary characteristic parameter value corresponding to the sample vulnerability id, a preset internet vulnerability characteristic parameter and a preset industrial control network vulnerability characteristic parameter;
step S425, training on the model input vectors corresponding to the sample vulnerability ids and the true values of the industrial control network vulnerability outbreak probability to obtain the industrial control network vulnerability prediction model.
7. A server, characterized in that it comprises a system according to any one of claims 1 to 6.
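The feature weight rules in claims 1 to 5 (steps S412 and S403) can be sketched in Python as below. The sketch follows the claims literally, including k_2 = (ws_max - ws_min)/ws_max and the defaults ws_min = 0 and ws_max = 1 from claim 5. Segmented-word sets are plain Python sets, and the function names are hypothetical. The claims do not say what happens when A_{e-1} - A_e is empty, so the sketch raises an error in that case rather than guess.

```python
from typing import AbstractSet


def initial_weight(sa_e: int, su_min: int, su_max: int,
                   ws_min: float = 0.0, ws_max: float = 1.0) -> float:
    """Step S412 (e = 1): weight from the number SA_e of segmented words in A_e."""
    if su_min >= su_max:
        raise ValueError("SU_min must be smaller than SU_max")
    k2 = (ws_max - ws_min) / ws_max  # second linear change coefficient, as in claim 3
    if sa_e < su_min:
        return ws_min
    if sa_e > su_max:
        return ws_max
    return k2 * sa_e


def next_weight(str_prev: str, str_cur: str, a_prev: AbstractSet[str],
                a_cur: AbstractSet[str], w_prev: float) -> float:
    """Step S403 (e > 1): w_e from the change of A_e relative to A_{e-1}."""
    if str_prev == str_cur:
        return w_prev
    added = len(a_cur - a_prev)    # |A_e - A_{e-1}|
    removed = len(a_prev - a_cur)  # |A_{e-1} - A_e|
    if removed == 0:
        # Not covered by the claims; refuse instead of dividing by zero.
        raise ValueError("A_{e-1} - A_e is empty; the claims leave this case undefined")
    return added / removed * w_prev


print(initial_weight(10, 5, 20))  # 10.0 with the claim-5 defaults (k_2 = 1)
print(next_weight("old", "new", {"a", "b", "c"}, {"b", "c", "d", "e"}, 0.5))  # 1.0
```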
CN202111358158.2A 2021-11-17 2021-11-17 System for predicting industrial control network bugs based on Summary word segmentation characteristics Active CN114021148B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111358158.2A CN114021148B (en) 2021-11-17 2021-11-17 System for predicting industrial control network bugs based on Summary word segmentation characteristics

Publications (2)

Publication Number Publication Date
CN114021148A CN114021148A (en) 2022-02-08
CN114021148B true CN114021148B (en) 2022-07-01

Family

ID=80065022

Country Status (1)

Country Link
CN (1) CN114021148B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704763A (en) * 2017-09-04 2018-02-16 中国移动通信集团广东有限公司 Multi-source heterogeneous leak information De-weight method, stage division and device
CN109886020A (en) * 2019-01-24 2019-06-14 燕山大学 Software vulnerability automatic classification method based on deep neural network
WO2020114429A1 (en) * 2018-12-07 2020-06-11 腾讯科技(深圳)有限公司 Keyword extraction model training method, keyword extraction method, and computer device
CN112035846A (en) * 2020-09-07 2020-12-04 江苏开博科技有限公司 Unknown vulnerability risk assessment method based on text analysis

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US10733302B2 (en) * 2017-12-15 2020-08-04 Mastercard International Incorporated Security vulnerability analytics engine
CN108259482B (en) * 2018-01-04 2019-05-28 平安科技(深圳)有限公司 Network Abnormal data detection method, device, computer equipment and storage medium
CN113591093B (en) * 2021-07-22 2023-05-16 燕山大学 Industrial software vulnerability detection method based on self-attention mechanism

Similar Documents

Publication Publication Date Title
US11416268B2 (en) Aggregate features for machine learning
US12014282B2 (en) Data processing method and apparatus, electronic device, and storage medium
TW202004497A (en) Computer system prediction machine learning models
CN110677433A (en) Method, system, equipment and readable storage medium for predicting network attack
US10713140B2 (en) Identifying latent states of machines based on machine logs
CN109213476B (en) Installation package generation method, computer readable storage medium and terminal equipment
Zhao et al. Reliability analysis using chaotic particle swarm optimization
Custode et al. A co-evolutionary approach to interpretable reinforcement learning in environments with continuous action spaces
Dong et al. Fully convolutional spatio-temporal models for representation learning in plasma science
CN114021148B (en) System for predicting industrial control network bugs based on Summary word segmentation characteristics
CN113792300B (en) System for predicting industrial control network bugs based on internet and industrial control network bug parameters
CN114021149B (en) System for predicting industrial control network bugs based on correction parameters
CN114021151B (en) System for predicting industrial control network bugs based on Summary length features
CN114021150B (en) System for predicting industrial control network bugs based on N-gram
CN113537614A (en) Construction method, system, equipment and medium of power grid engineering cost prediction model
CN114021147B (en) System for predicting industrial control network vulnerability based on bitmap
Wei et al. Smart contract fuzzing based on taint analysis and genetic algorithms
Cao et al. A boundary identification approach for the feasible space of structural optimization using a virtual sampling technique-based support vector machine
JP7420244B2 (en) Learning device, learning method, estimation device, estimation method and program
WO2022070105A1 (en) Systems and methods for enforcing constraints to predictions
US11201874B2 (en) Information processing apparatus, control method, and program
Barbuti et al. Encoding threshold Boolean networks into reaction systems for the analysis of gene regulatory networks
Shaik et al. Integrating Random Forest and Support Vector Regression Models for Optimized Energy Consumption Evaluation in Cloud Computing Data Centers
CN113554184A (en) Model training method and device, electronic equipment and storage medium
Barbuti et al. Encoding threshold boolean networks into reaction systems for the analysis of gene regulatory networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant