CN116257455A - Method, device, equipment and storage medium for generating fuzzy test case - Google Patents


Info

Publication number
CN116257455A
Authority
CN
China
Prior art keywords
industrial protocol
protocol
training
generating
industrial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310519301.4A
Other languages
Chinese (zh)
Inventor
肖棋元
于佳文
朱强
王峥瀛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gezhouba Electric Power Rest House
China Three Gorges Corp
Original Assignee
Beijing Gezhouba Electric Power Rest House
China Three Gorges Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gezhouba Electric Power Rest House, China Three Gorges Corp filed Critical Beijing Gezhouba Electric Power Rest House
Priority to CN202310519301.4A priority Critical patent/CN116257455A/en
Publication of CN116257455A publication Critical patent/CN116257455A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3684Test management for test design, e.g. generating new test cases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Communication Control (AREA)

Abstract

The invention provides a method, a device, equipment and a storage medium for generating a fuzzy test case, wherein the method for generating the fuzzy test case comprises the following steps: collecting an industrial protocol data packet, preprocessing the industrial protocol data packet, and generating a protocol entry; constructing an industrial protocol task set based on the protocol entry, training the industrial protocol task set, and generating optimal initialization parameters; constructing a hierarchical clustering model based on the optimal initialization parameters, inputting the acquired unknown industrial protocol into the hierarchical clustering model, and generating industrial protocol characteristics; and carrying out genetic variation on the industrial protocol characteristics to generate a fuzzy test case. Under the condition that the number of the industrial protocols for learning and training is small, the method ensures the learning capability of the industrial protocol features, reduces the threshold of a tester for testing the industrial protocols by using a fuzzy testing tool, can adapt to the unknown industrial protocol structure type, and improves the pertinence of the fuzzy test on the industrial protocols.

Description

Method, device, equipment and storage medium for generating fuzzy test case
Technical Field
The present invention relates to the field of fuzzy test technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a fuzzy test case.
Background
Fuzzy testing (fuzzing) is an automatic or semi-automatic testing technique commonly used to discover errors and security problems in software, operating systems, and network code. The random or illegal data fed to the target is called "fuzz".
In an industrial control system, the control network has its own particularities. Industrial communication protocols are the cornerstone of industrial control network communication, yet most of them make no provision for network security. A security fuzzy test must therefore be performed on industrial communication protocols to find potential security risks, so that countermeasures can be taken in time and the probability that those risks are exploited in the industrial control system is reduced.
Current fuzzing techniques focus mainly on test case generation, and fuzzers fall into two categories. A mutation-based fuzzer creates test cases by mutating existing data samples. A generation-based fuzzer models the protocol or file format used by the system under test and generates inputs, and thus test cases, from that model. Existing fuzzing techniques that use machine learning have several shortcomings. The training model must be rebuilt for each new situation, or the possible situations must be enumerated manually as far as possible before training, so testers who fuzz industrial protocols need a comprehensive and deep understanding of the industrial protocols on the market. The amount of valid data available for fuzzing an industrial protocol is limited, which makes the approach unsuitable for machine learning models that require large amounts of training data. Finally, the time and space overhead of the model is large, so the model cannot be migrated quickly to a new protocol scenario and does not adapt to a variety of protocols.
Disclosure of Invention
Therefore, the technical solution of the invention mainly addresses two defects of the prior art: the amount of valid data available for fuzzy testing of industrial protocols is limited and thus unsuitable for machine learning models that need a large amount of training data, and current models cannot adapt to a variety of protocols. To this end, the invention provides a method, a device, equipment and a storage medium for generating a fuzzy test case.
In a first aspect, an embodiment of the present invention provides a method for generating a fuzzy test case, including:
collecting an industrial protocol data packet, preprocessing the industrial protocol data packet, and generating a protocol entry;
constructing an industrial protocol task set based on the protocol entry, training the industrial protocol task set, and generating optimal initialization parameters;
constructing a hierarchical clustering model based on the optimal initialization parameters, inputting the acquired unknown industrial protocol into the hierarchical clustering model, and generating industrial protocol characteristics;
and carrying out genetic variation on the industrial protocol characteristics to generate a fuzzy test case.
According to the method for generating the fuzzy test case, provided by the embodiment of the invention, the industrial protocol task set is trained to generate the optimal initialization parameters, the hierarchical clustering model constructed by the optimal initialization parameters is used to generate the industrial protocol characteristics, and finally the industrial protocol characteristics are subjected to genetic variation to generate the fuzzy test case, so that the learning capacity of the industrial protocol characteristics is ensured under the condition that the number of the industrial protocols for learning and training is small, the threshold of testing the industrial protocol by using a fuzzy test tool by a tester is reduced, the unknown industrial protocol structure type can be adapted, and the pertinence of the fuzzy test on the industrial protocol is improved.
With reference to the first aspect, in a possible implementation manner, the preprocessing the industrial protocol data packet to generate a protocol entry includes:
unpacking the industrial protocol data packet to generate an industrial protocol message;
dividing the industrial protocol message according to bytes to generate an entry list;
determining weights of all the entries in the entry list based on the entry list and the industrial protocol message;
and sorting weights of all the entries in the entry list from big to small, and selecting the protocol entries based on a sorting result.
With reference to the first aspect, in another possible implementation manner, the determining weights of the terms in the term list based on the term list and the industrial protocol message includes:
obtaining the total number of industrial protocol messages, the total number of terms in the industrial protocol messages, and the number of industrial protocol messages containing each term; and calculating the weight of each term in the term list based on the number of times the term appears in the industrial protocol messages, the total number of industrial protocol messages, the total number of terms in the industrial protocol messages, and the number of industrial protocol messages containing the term.
With reference to the first aspect, in another possible implementation manner, the building an industrial protocol task set based on the protocol entry, and training the industrial protocol task set, generating an optimal initialization parameter includes:
constructing an industrial protocol task set based on the protocol entry; wherein the data in each task in the industrial protocol task set comprises a training data set and a testing data set;
acquiring initialization model parameters, and performing an inner loop training update on an initial model based on the initialization model parameters, the training data set and the test data set;
and after the inner loop training update, performing an outer loop training update on the initial model to generate the optimal initialization parameters.
With reference to the first aspect, in another possible implementation manner, the performing, based on the initialization model parameters, the training data set, and the test data set, an inner loop training update on the initial model includes:
training: acquiring initialization model parameters, taking the training data set as input of an initial model, performing forward propagation on the initialization model parameters, and calculating a current loss function based on a prediction result of the forward propagation and a training label;
a first updating step: updating the initialization model parameters by using a gradient descent method based on the current loss function;
an inner loop step: incrementing the inner loop iteration count by one to obtain the current iteration count, and comparing it with the maximum number of inner loop updates; when the current iteration count has not reached the maximum number of inner loop updates, repeating the training step and the first updating step with the updated initialization model parameters as the initial parameters; otherwise, performing the outer loop training update.
With reference to the first aspect, in another possible implementation manner, after the inner loop training update, performing an outer loop training update on the initial model to generate the optimal initialization parameter includes:
a second updating step: taking the test data set as input of the initial model, generating a loss function on the test data set of each task, summing these loss functions over the tasks to generate an outer loop loss function, and updating the parameters of the initial model by gradient descent based on the outer loop loss function;
an outer loop step: taking the updated initialization model parameters as the initialization model parameters of the next inner loop, and repeating the training step, the first updating step, the inner loop step and the second updating step until the number of loops reaches the maximum number of outer loop training epochs, so as to generate the optimal initialization parameters.
With reference to the first aspect, in another possible implementation manner, the performing genetic variation on the industrial protocol feature to generate a fuzzy test case includes:
taking the industrial protocol characteristic as a variation bit of an unknown industrial protocol;
and taking the variation position of the unknown industrial protocol as a basic position, and utilizing a genetic variation algorithm to perform variation on the unknown industrial protocol to generate the fuzzy test case.
In a second aspect, an embodiment of the present invention further provides a device for generating a fuzzy test case, including:
the acquisition module is used for acquiring an industrial protocol data packet, preprocessing the industrial protocol data packet and generating a protocol entry;
the training module is used for constructing an industrial protocol task set based on the protocol entry, training the industrial protocol task set and generating an optimal initialization parameter;
the construction module is used for constructing a hierarchical clustering model based on the optimal initialization parameters, inputting the acquired unknown industrial protocol into the hierarchical clustering model and generating industrial protocol characteristics;
and the genetic variation module is used for carrying out genetic variation on the industrial protocol characteristics to generate a fuzzy test case.
In a third aspect, an embodiment of the present invention further discloses an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to cause the at least one processor to perform steps of a method for generating fuzzy test cases according to the first aspect or any optional implementation manner of the first aspect.
In a fourth aspect, the present invention further discloses a computer readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the method for generating a fuzzy test case according to the first aspect or any optional embodiment of the first aspect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for generating a fuzzy test case according to an embodiment of the present invention;
FIG. 2 is a flowchart of S101 provided in an embodiment of the present invention;
FIG. 3 is a flowchart of S102 provided in an embodiment of the present invention;
FIG. 4 is a flowchart of S1022 provided in an embodiment of the present invention;
FIG. 5 is a flowchart of S1023 provided in an embodiment of the present invention;
FIG. 6 is a flowchart of S104 provided in an embodiment of the present invention;
FIG. 7 is a block diagram of a device for generating a fuzzy test case according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating an embodiment of an electronic device according to the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be noted that the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, unless explicitly stated or limited otherwise, the terms "mounted," "connected," "coupled," and "connected" are to be construed broadly, and may be, for example, fixedly connected, mechanically connected, or electrically connected; or can be directly connected, or can be indirectly connected through an intermediate medium, or can be communication between the two elements, or can be wireless connection or wired connection. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
The embodiment of the invention provides a method for generating a fuzzy test case, which is shown in fig. 1 and comprises the following steps:
S101, collecting an industrial protocol data packet, preprocessing the industrial protocol data packet, and generating a protocol entry.
The industrial protocol data packets, which contain messages of various industrial protocols, are obtained by capturing live traffic on site and by downloading captures from the Internet.
S102, constructing an industrial protocol task set based on the protocol entry, and training the industrial protocol task set to generate an optimal initialization parameter.
The optimal initialization parameters include the number of clusters, the affinity algorithm, the linkage algorithm, and the like.
S103, constructing a hierarchical clustering model based on the optimal initialization parameters, and inputting the acquired unknown industrial protocol into the hierarchical clustering model to generate industrial protocol characteristics.
Specifically, the similarity between the optimal initialization parameters is calculated, two optimal initialization parameters with higher similarity are combined, a new class is established, the steps are repeated until the stopping condition is met, the hierarchical class is obtained, and then the hierarchical clustering model is generated.
S104, carrying out genetic variation on the industrial protocol characteristics to generate a fuzzy test case.
According to the method for generating the fuzzy test case, the industrial protocol task set is trained to generate the optimal initialization parameters, the hierarchical clustering model constructed by the optimal initialization parameters is used to generate the industrial protocol features, and finally the industrial protocol features are subjected to genetic variation to generate the fuzzy test case.
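The merge procedure described for S103 (repeatedly combining the two most similar classes until a stopping condition is met) can be illustrated with a minimal sketch. This is not the patent's implementation: the feature vectors, the Euclidean affinity, the average linkage, and the function names are all assumptions chosen for illustration, and the stopping condition is assumed to be a target number of clusters.

```python
def euclidean(a, b):
    """Assumed affinity: Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(c1, c2, points, affinity):
    """Assumed linkage: mean pairwise distance between two clusters."""
    total = sum(affinity(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, n_clusters, affinity=euclidean, linkage=average_linkage):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b], points, affinity)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # establish the new class
        del clusters[b]
    return clusters


# Hypothetical 2-D feature vectors; real features would come from the messages.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
clusters = agglomerative(points, n_clusters=2)
```

The three arguments mirror the parameters the text names for the clustering model: the number of clusters, the affinity, and the linkage.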
As an optional embodiment of the present invention, as shown in fig. 2, S101, that is, the preprocessing the industrial protocol data packet, generates a protocol entry, and includes:
S1011, unpacking the industrial protocol data packet to generate an industrial protocol message.
Specifically, the captured protocol data packet is unpacked to extract the message of the specific application layer protocol, namely the industrial protocol message.
S1012, the industrial protocol message is segmented according to bytes, and an entry list is generated.
S1013, determining weights of all the entries in the entry list based on the entry list and the industrial protocol message.
Specifically, the total number of industrial protocol messages, the total number of terms in the industrial protocol messages, and the number of industrial protocol messages containing each term are obtained; the weight of each term in the term list is then calculated from the number of times the term appears in the industrial protocol messages, the total number of industrial protocol messages, the total number of terms in the industrial protocol messages, and the number of industrial protocol messages containing the term.
Further, the weight of each term in the term list is given by
$$\text{weight}(t) = \frac{\text{number of times term } t \text{ appears in the industrial protocol messages}}{\text{total number of terms in the industrial protocol messages}} \times \log\frac{\text{total number of industrial protocol messages}}{\text{number of industrial protocol messages containing term } t + 1}.$$
S1014, sorting weights of the terms in the term list from big to small, and selecting the protocol terms based on the sorting result.
Specifically, a preset number of top-ranked protocol entries is selected from the sorting result. The protocol features and data types of these entries are known, so they can be used to define categories and label data.
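Steps S1012 to S1014 can be sketched as follows. The sketch assumes that each byte of a message is one term and that the term counts are taken over the whole set of messages, which is one reading of the weight formula; the function names and sample messages are hypothetical.

```python
import math
from collections import Counter


def term_weights(messages):
    """Weight every byte term using the frequency and log-ratio formula above."""
    total_messages = len(messages)
    total_terms = sum(len(m) for m in messages)
    # Number of times each term appears across all messages.
    occurrences = Counter(b for m in messages for b in m)
    # Number of messages that contain each term at least once.
    containing = Counter(b for m in messages for b in set(m))
    return {
        term: (count / total_terms)
        * math.log(total_messages / (containing[term] + 1))
        for term, count in occurrences.items()
    }


def select_protocol_entries(messages, top_k):
    """Sort terms by weight, largest first, and keep the first top_k."""
    weights = term_weights(messages)
    ranked = sorted(weights, key=lambda t: weights[t], reverse=True)
    return ranked[:top_k]


# Hypothetical two-byte messages, already split into byte terms by iteration.
messages = [b"\x01\x02", b"\x01\x03", b"\x01\x04", b"\x05\x05"]
entries = select_protocol_entries(messages, top_k=2)
```

With the "+1" in the denominator, a term that appears in all but one message gets a weight of zero, and a term that appears in every message gets a negative weight, so ubiquitous bytes such as a shared device address rank last.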
As an optional embodiment of the present invention, as shown in fig. 3, S102, that is, the step of constructing an industrial protocol task set based on the protocol entry, and training the industrial protocol task set to generate the optimal initialization parameters includes:
S1021, constructing an industrial protocol task set based on the protocol entry; wherein the data within each task in the industrial protocol task set includes a training data set and a test data set.
The training data set is used to train a recognition model specific to the classification task. The test data set is used to measure how well the model adapts to the classification task: a smaller classification loss indicates better adaptation, while a larger loss indicates poor adaptation that cannot meet the data recognition requirement.
S1022, acquiring initialization model parameters, and performing inner loop training update on the initial model based on the initialization model parameters, the training data set and the test data set.
Specifically, a plurality of tasks are arbitrarily selected from the industrial protocol task set, and training data sets in the plurality of tasks are input into the initial model to realize learning and training processes of the plurality of tasks.
S1023, after the inner loop training update, performing an outer loop training update on the initial model to generate the optimal initialization parameters.
As an alternative embodiment of the present invention, as shown in fig. 4, the step S1022 of performing an inner loop training update on the initial model based on the initialized model parameters, the training data set and the test data set includes:
S10221, training step: acquiring initialization model parameters, taking the training data set as input of the initial model, performing forward propagation with the initialization model parameters, and calculating the current loss function based on the forward-propagation prediction result and the training labels.
Specifically, any one task is selected from the industrial protocol task set, and a training data set in the task is input into the initial model.
S10222, a first updating step: and updating the initialization model parameters by using a gradient descent method based on the current loss function.
Specifically, the current loss function is differentiated to obtain the parameter gradients, and the gradients are back-propagated to each network layer of the initial model to obtain updated parameters specific to the current task, namely the updated initialization model parameters.
S10223, inner loop step: incrementing the inner loop iteration count by one to obtain the current iteration count, and comparing it with the maximum number of inner loop updates; when the current iteration count has not reached the maximum number of inner loop updates, repeating the training step and the first updating step with the updated initialization model parameters as the initial parameters; otherwise, performing the outer loop training update.
As an alternative embodiment of the present invention, as shown in fig. 5, the step S1023 of performing the outer loop training update on the initial model after the inner loop training update to generate the optimal initialization parameters includes:
S10231, second updating step: taking the test data set as input of the initial model, generating a loss function on the test data set of each task, summing these loss functions over the tasks to generate an outer loop loss function, and updating the parameters of the initial model by gradient descent based on the outer loop loss function.
Specifically, the test data set is used as the input of the initial model, and the loss function on the test data set of each task is generated in the same way as in the training step.
Further, the initialization model parameters are updated with a gradient descent algorithm so as to minimize the overall loss value. Because this update takes into account how well the model fits every task, the updated parameters can be regarded as general model parameters for all tasks in the inner loop.
S10232, outer loop step: taking the updated initialization model parameters as the initialization model parameters of the next inner loop, and repeating the training step, the first updating step, the inner loop step and the second updating step until the number of loops reaches the maximum number of outer loop training epochs, thereby generating the optimal initialization parameters.
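The inner and outer loops of S1022 and S1023 resemble a meta-learning scheme of the MAML family. The sketch below is a simplified illustration under stated assumptions, not the patent's model: it uses a one-parameter regression model y = w * x instead of a neural network, and a first-order approximation in which the outer gradient is evaluated at the task-adapted parameters. All names, learning rates, and data are hypothetical.

```python
def mse_grad(w, data):
    """Gradient of the mean squared error of the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def meta_train(tasks, w_init=0.0, inner_lr=0.05, outer_lr=0.01,
               max_inner_updates=2, max_outer_epochs=50):
    """Return an initialization that adapts well to every task."""
    w = w_init
    for _ in range(max_outer_epochs):              # outer loop step
        outer_grad = 0.0
        for train_set, test_set in tasks:
            w_task = w                              # start from the shared init
            for _ in range(max_inner_updates):      # inner loop step
                # Training step and first updating step on the training set.
                w_task -= inner_lr * mse_grad(w_task, train_set)
            # Second updating step: sum test-set gradients over the tasks
            # (first-order approximation, evaluated at the adapted w_task).
            outer_grad += mse_grad(w_task, test_set)
        w -= outer_lr * outer_grad
    return w


def make_task(slope):
    """Hypothetical task: fit y = slope * x, with a train and a test split."""
    train = [(1.0, slope * 1.0), (2.0, slope * 2.0)]
    test = [(3.0, slope * 3.0)]
    return train, test


tasks = [make_task(s) for s in (1.0, 2.0, 3.0)]
w_star = meta_train(tasks)
```

In this toy setting the learned initialization settles between the task optima, so a few inner updates suffice to reach any single task. That is the property the text relies on when only a small number of protocol samples is available.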
As an alternative embodiment of the present invention, as shown in fig. 6, the generating a fuzzy test case in S104, that is, the genetic variation of the industrial protocol feature, includes:
S1041, taking the industrial protocol characteristic as a variation bit of an unknown industrial protocol.
Specifically, a typical communication message consists of several fields. For example, a Modbus RTU data request message is 09 03 00 00 00 01 85 42, where 09 is the device address, 03 is the function code, 00 00 is the start address, 00 01 is the number of registers, and 85 42 is the CRC check field; the message structures of other protocols are similar. The industrial protocol features determine which bits in the various messages are the most representative, and the bits so determined are used as variation bits.
S1042, using the variation position of the unknown industrial protocol as a basic position, and utilizing a genetic variation algorithm to perform variation on the unknown industrial protocol to generate the fuzzy test case.
Specifically, the mutation operation in a genetic algorithm replaces the gene values at some loci of an individual's chromosome coding string with other alleles of those loci, thereby forming a new individual. For example, the binary code 101101001011001 may become the new code 001101011011001 after mutation. Mutation operations in genetic algorithms include basic position mutation, uniform mutation, boundary mutation, and the like. Basic position mutation randomly selects one or more loci in the individual coding string and mutates their values according to the mutation probability. The embodiment of the invention uses the determined variation bits as the basic positions of the genetic algorithm, performs the mutation operation on the message samples in the optimal message sample set, and thereby generates more targeted fuzzy test cases. Compared with prior-art schemes that apply a genetic mutation algorithm to mutate at random, this reduces the generation of redundant test cases.
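A minimal sketch of basic position mutation restricted to chosen variation bits, applied to the Modbus RTU request from the example above. The choice of positions, the mutation rate, and the decision to recompute the CRC after mutation are assumptions made for illustration; the text does not say whether the check field is repaired.

```python
import random


def crc16_modbus(data: bytes) -> bytes:
    """CRC-16/MODBUS of data, returned low byte first as sent on the wire."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return bytes([crc & 0xFF, crc >> 8])


def mutate_at_positions(frame: bytes, positions, rate: float,
                        rng: random.Random) -> bytes:
    """Mutate only the bytes at `positions`, each with probability `rate`."""
    body = bytearray(frame[:-2])           # drop the old two-byte CRC
    for pos in positions:
        if rng.random() < rate:
            body[pos] = rng.randrange(256)  # replace the byte with a random value
    # Assumption: append a fresh CRC so the target does not reject the frame
    # before it parses the mutated fields.
    return bytes(body) + crc16_modbus(bytes(body))


request = bytes.fromhex("0903000000018542")
variation_bits = [1, 2, 3]   # hypothetical: function code and start address
rng = random.Random(0)       # seeded so that runs are repeatable
test_case = mutate_at_positions(request, variation_bits, rate=0.5, rng=rng)
```

Bytes outside `variation_bits` (here the device address and the register count) are never touched, which is what separates this from mutating the whole message at random.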
Further, a large number of data packets of the unknown protocol are captured and parsed to obtain message samples. The captured message samples are learned to construct an initial seed pool of fuzzy test seeds, and a set cover algorithm is applied to the initial seed pool to obtain the optimal message sample set Si with minimum cost, which serves as the initial population of test cases. The fitness of the individuals in the optimal message sample set is calculated. Taking the variation bits of the unknown industrial protocol as basic positions, new individuals are generated in the optimal message sample set Si to obtain Si'. The fitness of the individuals in Si' is calculated and checked against a preset index: if the index is met, the fuzzy test cases are output to obtain a test case set; if not, the calculation of individual fitness in the optimal message sample set Si is repeated until the preset index is met. Finally, the union of the fuzzy test case sets of the subsets is taken to obtain the fuzzy test case set for all unknown protocols.
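The seed-pool reduction step can be sketched with the classic greedy heuristic for set cover. The text does not specify which set cover algorithm is used or what a message "covers", so both are assumptions here: each message is taken to cover the (position, value) pairs at its variation bits, and the greedy heuristic yields a small cover that is not guaranteed to be minimal.

```python
def greedy_set_cover(seed_pool, features_of):
    """Pick seeds until every feature seen in the pool is covered."""
    uncovered = set().union(*(features_of(s) for s in seed_pool))
    chosen = []
    while uncovered:
        # Take the seed that covers the most still-uncovered features.
        best = max(seed_pool, key=lambda s: len(features_of(s) & uncovered))
        chosen.append(best)
        uncovered -= features_of(best)
    return chosen


def make_feature_fn(positions):
    """Hypothetical feature extractor: (position, byte value) pairs."""
    return lambda msg: {(i, msg[i]) for i in positions}


# Hypothetical three-byte messages; position 1 plays the role of a function code.
seed_pool = [b"\x09\x03\x00", b"\x09\x03\x01", b"\x09\x06\x00", b"\x09\x10\x00"]
initial_population = greedy_set_cover(seed_pool, make_feature_fn([1]))
```

Here the second message adds no new function code, so it is dropped from the initial population. In a full pipeline, the selected seeds would then be mutated at the variation bits and scored by a fitness function.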
In the above optional implementation manner, the obtained fuzzy test case can adapt to an unknown industrial protocol structure type, so that the situation that the repeatability of the randomly generated test case is too high and the coverage is not targeted is avoided, the redundancy of the test case is avoided, and the pertinence of the fuzzy test on the industrial communication protocol is improved.
The embodiment of the invention also discloses a device for generating a fuzzy test case which, as shown in fig. 7, comprises the following modules:
the collection module 71 is configured to collect an industrial protocol data packet, and perform preprocessing on the industrial protocol data packet to generate a protocol entry.
The industrial protocol data packets, which contain messages of various industrial protocols, are obtained by capturing live traffic on site and by downloading captures from the Internet.
The training module 72 is configured to construct an industrial protocol task set based on the protocol entry, and train the industrial protocol task set to generate an optimal initialization parameter.
Wherein, the optimal initialization parameters include parameters such as the number of clusters, the affinity algorithm, and the linkage algorithm.
The construction module 73 is configured to construct a hierarchical clustering model based on the optimal initialization parameter, and input the acquired unknown industrial protocol into the hierarchical clustering model to generate an industrial protocol feature.
Specifically, the similarity between the optimal initialization parameters is calculated, and the two optimal initialization parameters with the highest similarity are merged into a new class; these steps are repeated until the stopping condition is met, yielding the hierarchy of classes from which the hierarchical clustering model is generated.
The genetic variation module 74 is configured to perform genetic variation on the industrial protocol feature to generate a fuzzy test case.
According to the device for generating fuzzy test cases provided by this embodiment, the industrial protocol task set is trained to generate the optimal initialization parameters, a hierarchical clustering model built from those parameters generates the industrial protocol features, and the features then undergo genetic variation to generate the fuzzy test cases. This preserves the ability to learn industrial protocol features even when few industrial protocols are available for training, lowers the threshold for testers to test industrial protocols with a fuzzy test tool, adapts to the structure types of unknown industrial protocols, and makes the fuzzy test of industrial protocols more targeted.
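The merge-until-stop procedure described for the construction module can be sketched as a small agglomerative clustering routine. This is an illustration under assumptions rather than the patented model: the items being merged here are raw messages, the similarity measure is a normalized Hamming distance, and the stopping condition is a target cluster count. The names `hamming` and `agglomerate` are hypothetical; `n_clusters`, `distance`, and `linkage` stand in for the number of clusters, affinity algorithm, and linkage algorithm mentioned above.

```python
from typing import Callable, Iterable, List, Sequence


def hamming(a: bytes, b: bytes) -> float:
    """Fraction of differing byte positions; a length mismatch counts as a difference."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    diff = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return diff / longest


def agglomerate(
    messages: Sequence[bytes],
    n_clusters: int,
    distance: Callable[[bytes, bytes], float] = hamming,
    linkage: Callable[[Iterable[float]], float] = min,  # min = single, max = complete
) -> List[List[int]]:
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    n_clusters = max(1, n_clusters)
    clusters = [[i] for i in range(len(messages))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(
                    distance(messages[a], messages[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the pair into a new class
        del clusters[j]
    return clusters


samples = [b"\x01\x03\x00\x00", b"\x01\x03\x00\x01", b"\x0f\x10\xff\xff", b"\x0f\x10\xff\xfe"]
groups = agglomerate(samples, n_clusters=2)
```

Each returned group is a list of indices into `samples`; passing `linkage=max` switches from single to complete linkage.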
As an alternative embodiment of the present invention, the acquisition module 71 includes:
the unpacking sub-module, configured to unpack the industrial protocol data packet to generate an industrial protocol message.
Specifically, the captured protocol data packet is unpacked to extract the message of the specific application-layer protocol, namely the industrial protocol message.
the segmentation sub-module, configured to split the industrial protocol message byte by byte to generate an entry list.
the determining sub-module, configured to determine the weight of each entry in the entry list based on the entry list and the industrial protocol message.
Specifically, the total number of industrial protocol messages, the total number of entries in the industrial protocol message, and the number of industrial protocol messages containing each entry are obtained; the weight of each entry in the entry list is then calculated from these values together with the number of times the entry appears in the industrial protocol message.
Further, the weight of each entry in the entry list is calculated as:
$$\text{weight} = \frac{\text{number of times the entry appears in the industrial protocol message}}{\text{total number of entries in the industrial protocol message}} \times \log\frac{\text{total number of industrial protocol messages}}{\text{number of industrial protocol messages containing the entry} + 1}$$
the sorting sub-module, configured to sort the weights of the entries in the entry list in descending order and select the protocol entries based on the sorting result.
Specifically, a preset number of protocol entries are selected from the sorting result; the protocol features and data types of these protocol entries are known, so they can be used to define categories and label data.
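The byte-level splitting, weighting, and ranking performed by these sub-modules can be sketched as follows. The weighting follows the formula given for the determining sub-module; the rest is assumed for illustration: the logarithm is taken as the natural log, each byte value is one entry, and an entry that occurs in several messages keeps its highest per-message weight. The names `entry_weights` and `top_entries` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def entry_weights(messages: List[bytes]) -> Dict[int, float]:
    """Weight per byte entry: (count / message length) * log(N / (containing + 1))."""
    total_messages = len(messages)
    containing: Counter = Counter()
    for msg in messages:
        containing.update(set(msg))  # count each entry once per message
    weights: Dict[int, float] = {}
    for msg in messages:
        for entry, count in Counter(msg).items():
            tf = count / len(msg)
            idf = math.log(total_messages / (containing[entry] + 1))
            # Assumption: keep the highest weight the entry reaches in any message.
            weights[entry] = max(weights.get(entry, float("-inf")), tf * idf)
    return weights


def top_entries(messages: List[bytes], k: int) -> List[int]:
    """Sort entries by weight in descending order and keep the first k."""
    weights = entry_weights(messages)
    ranked = sorted(weights, key=lambda e: (-weights[e], e))
    return ranked[:k]


captured = [b"\x01\x03\x00", b"\x01\x03\x01", b"\x01\x10\xff", b"\x02\x10\xff"]
selected = top_entries(captured, k=2)
```

Because of the `+ 1` in the denominator, an entry that appears in all but one message gets a weight of zero, and one that appears in every message gets a negative weight, so ubiquitous bytes fall to the bottom of the ranking.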
As an alternative embodiment of the present invention, the training module 72 includes:
the construction sub-module, configured to construct an industrial protocol task set based on the protocol entries; wherein the data within each task in the industrial protocol task set includes a training data set and a test data set.
Wherein the training data set is used to train a recognition model specific to the classification task; the test data set is used to measure how well the model fits the classification task: a smaller classification loss indicates a better fit, while a larger one indicates a poor fit that cannot meet the recognition requirements of the data.
the inner-loop sub-module, configured to acquire the initialization model parameters and perform an inner-loop training update on the initial model based on the initialization model parameters, the training data set, and the test data set.
Specifically, a plurality of tasks are arbitrarily selected from the industrial protocol task set, and the training data sets of those tasks are input into the initial model to carry out the learning and training process for the tasks.
the outer-loop sub-module, configured to perform an outer-loop training update on the initial model after the inner-loop training update to generate the optimal initialization parameters.
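How a task set with per-task training and test data might be assembled can be sketched as below. The source does not specify how tasks are sampled, so everything here is an assumption: each task draws a random, non-overlapping training and test split from a pool of labelled protocol entries, and the names `build_task_set`, `train_size`, and `test_size` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[bytes, str]  # (protocol entry bytes, category label)


def build_task_set(
    samples: List[Sample],
    n_tasks: int,
    train_size: int,
    test_size: int,
    seed: int = 0,
) -> List[Dict[str, List[Sample]]]:
    """Build tasks that each hold a training data set and a test data set."""
    if train_size + test_size > len(samples):
        raise ValueError("not enough labelled samples for one task")
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        # Sampling without replacement keeps train and test disjoint within a task.
        picked = rng.sample(samples, train_size + test_size)
        tasks.append({"train": picked[:train_size], "test": picked[train_size:]})
    return tasks


labelled = [(bytes([i]), "read" if i % 2 else "write") for i in range(10)]
task_set = build_task_set(labelled, n_tasks=3, train_size=4, test_size=2)
```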
As an alternative embodiment of the present invention, the above inner-loop sub-module includes:
the training unit, configured to acquire the initialization model parameters, take the training data set as the input of the initial model, perform forward propagation with the initialization model parameters, and calculate the current loss function from the forward-propagation prediction and the training labels.
Specifically, any one task is selected from the industrial protocol task set, and the training data set of that task is input into the initial model.
the first updating unit, configured to update the initialization model parameters by gradient descent based on the current loss function.
Specifically, the current loss function is differentiated to obtain the parameter gradient, which is propagated back to each network layer of the initial model to obtain updated parameters specific to the current task, namely the updated initialization model parameters.
the inner-loop unit, configured to increment the inner-loop iteration count by one to obtain the current iteration count and compare it with the maximum number of inner-loop updates; when the current iteration count has not reached the maximum, the steps of the training unit and the first updating unit are repeated with the updated initialization model parameters as the initial parameters; otherwise, the outer-loop training update is performed.
As an alternative embodiment of the present invention, the outer-loop sub-module includes:
the second updating unit, configured to take the test data set as the input of the initial model, generate the loss function on the test data set of each task, sum these loss functions to generate the outer-loop loss function, and update the initialization model parameters by gradient descent based on the outer-loop loss function.
Specifically, generating the loss function on the test data set of each task, with the test data set as the input of the initial model, follows the same steps as in the training unit.
Further, the initialization model parameters are updated by gradient descent to minimize the overall loss value. Because this update accounts for how well the model fits every task, the updated parameters can be regarded as general model parameters for all tasks in the inner loop.
the outer-loop unit, configured to take the updated initialization model parameters as the initialization model parameters of the next inner loop, and repeat the steps of the training unit, the first updating unit, the inner-loop unit, and the second updating unit until the number of loops reaches the maximum outer-loop training period, thereby generating the optimal initialization parameters.
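The inner-loop and outer-loop updates described above have the shape of a meta-learning procedure and can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the patented training method: the model is a one-parameter linear model `y = w * x` with a mean-squared-error loss, and the outer update uses a first-order approximation (the test-set gradient is evaluated at the task-adapted parameter and applied directly to the initialization, ignoring second derivatives). The names `loss_and_grad`, `meta_train`, and all hyperparameter values are hypothetical.

```python
from typing import List, Tuple

Data = List[Tuple[float, float]]  # (input x, training label y)


def loss_and_grad(w: float, data: Data) -> Tuple[float, float]:
    """Forward pass of y = w * x, mean squared error, and its derivative w.r.t. w."""
    n = len(data)
    loss = sum((w * x - y) ** 2 for x, y in data) / n
    grad = sum(2 * (w * x - y) * x for x, y in data) / n
    return loss, grad


def meta_train(
    tasks: List[Tuple[Data, Data]],  # (training data set, test data set) per task
    w_init: float = 0.0,
    inner_lr: float = 0.1,
    outer_lr: float = 0.1,
    max_inner_steps: int = 3,
    max_outer_epochs: int = 50,
) -> float:
    """Return an initialization that adapts well to every task."""
    for _ in range(max_outer_epochs):
        outer_grad = 0.0
        for train, test in tasks:
            w_task = w_init  # each task starts from the shared initialization
            for _ in range(max_inner_steps):  # inner loop: task-specific updates
                _, grad = loss_and_grad(w_task, train)
                w_task -= inner_lr * grad
            # Outer loss: test-set loss of the adapted model, summed over tasks.
            _, test_grad = loss_and_grad(w_task, test)
            outer_grad += test_grad
        w_init -= outer_lr * outer_grad  # outer loop: update the initialization
    return w_init


def make_task(slope: float) -> Tuple[Data, Data]:
    train = [(x, slope * x) for x in (0.5, 1.0, 1.5)]
    test = [(x, slope * x) for x in (0.75, 1.25)]
    return train, test


toy_tasks = [make_task(s) for s in (1.0, 2.0, 3.0)]
w_star = meta_train(toy_tasks)
```

On these three toy tasks the learned initialization settles between the task slopes, so a few inner-loop steps are enough to fit any one of them.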
As an alternative embodiment of the present invention, the genetic variation module 74 includes:
the generation unit, configured to take the industrial protocol features as the variation bits of the unknown industrial protocol.
the mutation unit, configured to take the variation bits of the unknown industrial protocol as basic positions and mutate the unknown industrial protocol with a genetic mutation algorithm to generate the fuzzy test cases.
Specifically, the mutation operation in a genetic algorithm replaces the gene values at certain loci of an individual's chromosome coding string with other alleles of those loci, thereby forming a new individual. For example, with binary coding, 101101001011001 may become the new code 001101011011001 after genetic mutation. The mutation operations of a genetic algorithm include basic position mutation, uniform mutation, boundary mutation, and the like. Basic position mutation applies the mutation operation, according to the mutation probability, to the gene values at one or several randomly designated loci of the individual's coding string. The embodiment of the invention uses the determined variation bits as the basic positions of the genetic algorithm, thereby mutating the message samples in the optimal message sample set and generating more targeted fuzzy test cases. Compared with prior-art schemes that apply a genetic mutation algorithm with purely random mutation, this reduces the generation of redundant test cases.
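Basic position mutation restricted to the identified variation bits can be sketched as follows. Several details are assumptions made for illustration: positions are treated as byte offsets into the message rather than individual bits, each listed position mutates independently with probability `rate`, and the new value is produced by XOR with a random non-zero mask. The name `mutate_at_positions` is hypothetical.

```python
import random
from typing import Iterable


def mutate_at_positions(
    message: bytes,
    positions: Iterable[int],
    rate: float,
    rng: random.Random,
) -> bytes:
    """Mutate only the given byte offsets, each with probability `rate`."""
    out = bytearray(message)
    for pos in positions:
        if 0 <= pos < len(out) and rng.random() < rate:
            # XOR with a non-zero mask guarantees the byte actually changes.
            out[pos] ^= rng.randrange(1, 256)
    return bytes(out)


seed_message = b"\x01\x03\x00\x00\x00\x0a"
variation_positions = [2, 3]  # hypothetical offsets flagged by the feature step
mutated = mutate_at_positions(seed_message, variation_positions, 1.0, random.Random(0))
```

Bytes outside `variation_positions` are left untouched, which is what keeps the generated cases structurally close to valid messages.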
In addition, an electronic device is provided in an embodiment of the present invention, as shown in fig. 8, where the electronic device may include a processor 110 and a memory 120, where the processor 110 and the memory 120 may be connected by a bus or other manner, and in fig. 8, the connection is exemplified by a bus. In addition, the electronic device further includes at least one interface 130, where the at least one interface 130 may be a communication interface or other interfaces, and the embodiment is not limited thereto.
The processor 110 may be a central processing unit (Central Processing Unit, CPU). The processor 110 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), field programmable gate arrays (Field-Programmable Gate Array, FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or a combination of the above.
The memory 120, as a non-transitory computer-readable storage medium, is used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for generating fuzzy test cases according to the embodiments of the present invention. The processor 110 executes its various functional applications and data processing by running the non-transitory software programs, instructions, and modules stored in the memory 120, that is, it implements the method for generating fuzzy test cases in the above method embodiments.
Memory 120 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by the processor 110, etc. In addition, memory 120 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 120 may optionally include memory located remotely from processor 110, which may be connected to processor 110 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In addition, at least one interface 130 is used for communication of the electronic device with external devices, such as with a server or the like. Optionally, at least one interface 130 may also be used to connect peripheral input, output devices, such as a keyboard, display screen, etc.
The one or more modules are stored in the memory 120 and when executed by the processor 110, perform a method of generating fuzzy test cases as in the embodiment of FIG. 1.
The specific details of the electronic device may be understood correspondingly with respect to the corresponding related descriptions and effects in the embodiment shown in fig. 1, which are not repeated herein.
Those skilled in the art will appreciate that all or part of the flows of the above method embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the above kinds.
Obviously, the above embodiments are merely examples given for clarity of illustration and do not limit the implementations. Those of ordinary skill in the art may make other variations or modifications in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here, and obvious variations or modifications derived therefrom remain within the scope of protection of the invention.

Claims (10)

1. A method for generating a fuzzy test case, characterized by comprising the following steps:
collecting an industrial protocol data packet, preprocessing the industrial protocol data packet, and generating a protocol entry;
constructing an industrial protocol task set based on the protocol entry, training the industrial protocol task set, and generating optimal initialization parameters;
constructing a hierarchical clustering model based on the optimal initialization parameters, inputting the acquired unknown industrial protocol into the hierarchical clustering model, and generating industrial protocol characteristics;
and carrying out genetic variation on the industrial protocol characteristics to generate a fuzzy test case.
2. The method for generating a fuzzy test case according to claim 1, wherein preprocessing the industrial protocol data packet to generate a protocol entry comprises:
unpacking the industrial protocol data packet to generate an industrial protocol message;
splitting the industrial protocol message byte by byte to generate an entry list;
determining the weight of each entry in the entry list based on the entry list and the industrial protocol message;
and sorting the weights of the entries in the entry list in descending order, and selecting the protocol entries based on the sorting result.
3. The method for generating a fuzzy test case according to claim 2, wherein determining the weight of each entry in the entry list based on the entry list and the industrial protocol message comprises:
obtaining the total number of industrial protocol messages, the total number of entries in the industrial protocol message, and the number of industrial protocol messages containing each entry, and calculating the weight of each entry in the entry list based on the number of times the entry appears in the industrial protocol message, the total number of industrial protocol messages, the total number of entries in the industrial protocol message, and the number of industrial protocol messages containing the entry.
4. The method for generating a fuzzy test case according to claim 1, wherein the constructing an industrial protocol task set based on the protocol entry and training the industrial protocol task set to generate the optimal initialization parameter includes:
constructing an industrial protocol task set based on the protocol entry; wherein the data in each task in the industrial protocol task set comprises a training data set and a testing data set;
acquiring initialization model parameters, and performing an inner-loop training update on an initial model based on the initialization model parameters, the training data set and the test data set;
and performing an outer-loop training update on the initial model after the inner-loop training update to generate the optimal initialization parameters.
5. The method for generating a fuzzy test case according to claim 4, wherein performing the inner-loop training update on the initial model based on the initialization model parameters, the training data set, and the test data set comprises:
a training step: acquiring initialization model parameters, taking the training data set as the input of the initial model, performing forward propagation with the initialization model parameters, and calculating a current loss function based on the prediction result of the forward propagation and the training labels;
a first updating step: updating the initialization model parameters by gradient descent based on the current loss function;
an inner-loop step: incrementing the inner-loop iteration count by one to obtain a current iteration count, comparing the current iteration count with the maximum number of inner-loop updates, and, when the current iteration count has not reached the maximum number of inner-loop updates, repeating the training step and the first updating step with the updated initialization model parameters as the initial parameters; otherwise, performing the outer-loop training update.
6. The method for generating a fuzzy test case according to claim 5, wherein performing the outer-loop training update on the initial model after the inner-loop training update to generate the optimal initialization parameters comprises:
a second updating step: taking the test data set as the input of the initial model, generating a loss function on the test data set of each task, summing the loss functions on the test data sets of the tasks to generate an outer-loop loss function, and updating the initialization model parameters by gradient descent based on the outer-loop loss function;
an outer-loop step: taking the updated initialization model parameters as the initialization model parameters of the next inner loop, and repeating the training step, the first updating step, the inner-loop step and the second updating step until the number of loops reaches the maximum outer-loop training period, so as to generate the optimal initialization parameters.
7. The method for generating a fuzzy test case according to claim 1, wherein the generating the fuzzy test case by performing genetic variation on the industrial protocol features comprises:
taking the industrial protocol characteristic as a variation bit of an unknown industrial protocol;
and taking the variation position of the unknown industrial protocol as a basic position, and utilizing a genetic variation algorithm to perform variation on the unknown industrial protocol to generate the fuzzy test case.
8. A device for generating a fuzzy test case, characterized by comprising:
the acquisition module is used for acquiring an industrial protocol data packet, preprocessing the industrial protocol data packet and generating a protocol entry;
the training module is used for constructing an industrial protocol task set based on the protocol entry, training the industrial protocol task set and generating an optimal initialization parameter;
the construction module is used for constructing a hierarchical clustering model based on the optimal initialization parameters, inputting the acquired unknown industrial protocol into the hierarchical clustering model and generating industrial protocol characteristics;
and the genetic variation module is used for carrying out genetic variation on the industrial protocol characteristics to generate a fuzzy test case.
9. An electronic device comprising a processor and a memory, the memory coupled to the processor;
the memory has stored thereon computer readable program instructions which, when executed by the processor, implement the method of any of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1 to 7.
CN202310519301.4A 2023-05-10 2023-05-10 Method, device, equipment and storage medium for generating fuzzy test case Pending CN116257455A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310519301.4A CN116257455A (en) 2023-05-10 2023-05-10 Method, device, equipment and storage medium for generating fuzzy test case

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310519301.4A CN116257455A (en) 2023-05-10 2023-05-10 Method, device, equipment and storage medium for generating fuzzy test case

Publications (1)

Publication Number Publication Date
CN116257455A (en) 2023-06-13

Family

ID=86688268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310519301.4A Pending CN116257455A (en) 2023-05-10 2023-05-10 Method, device, equipment and storage medium for generating fuzzy test case

Country Status (1)

Country Link
CN (1) CN116257455A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837000A (en) * 2021-08-16 2021-12-24 天津大学 Small sample fault diagnosis method based on task sequencing meta-learning
CN114122563A (en) * 2022-01-26 2022-03-01 中国长江三峡集团有限公司 Temperature control method and temperature control system for energy storage system
CN114708637A (en) * 2022-04-02 2022-07-05 天津大学 Face action unit detection method based on meta-learning
CN115729825A (en) * 2022-11-25 2023-03-03 中国长江三峡集团有限公司 Fuzzy test case generation method and device of industrial protocol and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117827685A (en) * 2024-03-05 2024-04-05 国网浙江省电力有限公司丽水供电公司 Fuzzy test input generation method, device, terminal and medium
CN117827685B (en) * 2024-03-05 2024-04-30 国网浙江省电力有限公司丽水供电公司 Fuzzy test input generation method, device, terminal and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230613