CN113821840A

CN113821840A - Bagging-based hardware Trojan detection method, medium and computer

Info

Publication number: CN113821840A
Application number: CN202110935464.1A
Authority: CN
Inventors: 李康; 陈嘉伟; 潘伟涛; 史江义; 董勐; 王杰; 温聪; 张焱; 高一鸣
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-08-16
Filing date: 2021-08-16
Publication date: 2021-12-21

Abstract

The invention belongs to the field of circuit design and discloses a Bagging-based hardware Trojan detection method, medium and computer. An effective feature matrix of the hardware Trojan circuit is extracted from a Verilog gate-level netlist file. The feature data set is preprocessed with the SMOTE technique to balance the number of Trojan circuit features and normal circuit features. A detection model is built from LSTM neural networks using the Bagging ensemble method. The constructed Bagging ensemble Trojan detection model is then used to inspect the gate-level netlist. Because the invention extracts features for, and makes a prediction on, each logic gate and register of the gate-level netlist, it can accurately locate the Trojan circuit within the netlist, which makes it convenient for designers to inspect and modify the netlist. In addition, the feature extraction method does not require information about the whole netlist, so Trojan detection can also be performed on a partial netlist circuit.

Description

Bagging-based hardware Trojan detection method, medium and computer

Technical Field

The invention belongs to the field of circuit design, and particularly relates to a Bagging-based hardware Trojan detection method, medium and computer.

Background

Currently, the globalization of the integrated circuit industry poses an increasing challenge to hardware security. For example, Intellectual Property (IP) cores and EDA tools provided by third parties are widely used in IC design to reduce development costs and shorten time to market. Because a third-party IP core is not designed in-house, its internal circuitry is not visible to the company that uses it. A competitor can therefore easily insert malicious logic, namely a hardware Trojan, into the IP core.

A hardware Trojan is typically a very small logic structure within a large-scale integrated circuit design and generally consists of two parts: a Trojan trigger and a Trojan payload. The trigger monitors signals to determine whether the trigger condition has occurred. While the trigger is not activated, the hardware Trojan stays dormant and does not affect the original circuit. Once the trigger is activated, the payload performs malicious operations such as changing functionality, degrading performance or leaking secret information. Trigger conditions are usually made extremely rare so that the Trojan escapes discovery, which makes detecting Trojan structures in a circuit very challenging. In addition, the use of chips in military, financial and transportation infrastructure has increased the threat posed by hardware Trojans. For these reasons, Trojan detection methods are becoming increasingly important.

At present, machine learning is mainly applied to hardware Trojan detection in six directions, including Reverse Engineering Improvement, Real-Time HT Detection, Golden Model-Free Methods, Gate-Level Netlist Trojan Detection and Classification Approaches.

In recent years, machine learning has been widely applied to Trojan detection in gate-level netlists; common models include SVM, K-means and random forest. Although machine learning has greatly improved the accuracy of gate-level netlist Trojan detection, the miss rate and false-alarm rate of traditional machine learning models remain high.

Through the above analysis, the problems and defects of the prior art are as follows: the miss rate and false-alarm rate of existing hardware Trojan detection techniques are too high.

The difficulty in solving the above problems and defects is as follows: a hardware Trojan is usually hidden within a normal circuit and evades traditional detection methods by using nodes with a low trigger probability. Although existing machine learning detection methods perform better, it is still difficult for them to achieve an average TPR above 80% together with an average TNR above 95%.

The significance of solving the above problems and defects is as follows: the invention effectively improves hardware Trojan detection capability and thereby improves the security of using third-party IP.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a Bagging-based hardware Trojan detection method, medium and computer.

The invention is implemented as follows: a Bagging-based hardware Trojan detection method comprises the following steps:

Step one: extract the effective feature matrix of the hardware Trojan circuit from a Verilog gate-level netlist file;

Function: the feature matrix of each node, which serves as the model input, is obtained by traversing the edges around the node in both directions. Classifying the node functions keeps the input feature matrix of the model to a reasonable size, which reduces the required storage space and improves the efficiency of the traversal computation.

Step two: preprocess the feature data set with the SMOTE technique to balance the number of Trojan circuit features and normal circuit features;

Function: because hardware Trojan circuits are small, Trojan nodes are far outnumbered by normal nodes, so the directly extracted data set is severely class-imbalanced. Training the model directly on such a data set can cause overfitting and a loss of detection capability. SMOTE effectively alleviates this overfitting and improves the detection capability of the model.

Step three: establish a detection model based on the Bagging ensemble method using LSTM neural network models;

Function: the model is trained on the prepared data set so that it acquires hardware Trojan detection capability.

Step four: inspect the gate-level netlist with the constructed Bagging ensemble Trojan detection model.

Further, in step one, the specific process of extracting the effective feature matrix of the hardware Trojan circuit is as follows:

convert the circuit into a directed graph according to the circuit netlist structure, and classify the vertices of the directed graph by function according to the logic gate and register types of the circuit; then traverse the graph bidirectionally from each vertex to extract its feature matrix.

Further, the specific process of step two is as follows:

preprocess the feature data set with the SMOTE technique to balance the number of Trojan circuit features and normal circuit features;

for each Trojan circuit feature matrix X, compute the Euclidean distance from X to every other Trojan circuit feature matrix X_i to obtain its k nearest neighbors;

set a sampling ratio according to the imbalance between the number of Trojan circuit feature matrices and the number of normal circuit feature matrices.

Further, for each Trojan circuit feature matrix X, several Trojan circuit feature matrices X_n are randomly selected from the k nearest neighbors of X.

Further, for each randomly selected neighbor X_n, a new synthetic Trojan feature matrix is computed from X_n and the Trojan circuit feature matrix X as follows:

$$X_{new} = X + \mathrm{rand}(0,1) \times (X_n - X)$$

Further, in step three, the detection model based on the Bagging ensemble method is established using LSTM neural network models, as follows:

six LSTM models are constructed to form a Bagging ensemble framework, and a summation-plus-decision-tree scheme is constructed for Trojan classification.

Further, the specific process of constructing the six LSTM models is as follows:

divide the data set into a training set and a test set, and draw six training subsets from the training set, each one third the size of the training set; train one LSTM model on each training subset to obtain six LSTM models, and integrate them into a Bagging model;

input the whole training set into the trained Bagging model to obtain six groups of probability outputs; sum the six groups of probability outputs and use the sums as the training set of a decision tree to train the decision tree; a complete Trojan detection model is finally obtained.

Further, in step four, the gate-level netlist is inspected with the constructed Bagging ensemble Trojan detection model, as follows:

if the output is 1, the node is regarded as part of the Trojan circuit; if the output is 0, the node is regarded as part of the normal circuit.

Another object of the invention is to provide a program storage medium for receiving user input, the stored computer program causing an electronic device to execute the Bagging-based hardware Trojan detection method, which comprises the following steps:

Another object of the invention is to provide a computer program product stored on a computer-readable medium, comprising a computer-readable program that provides a user input interface to implement the Bagging-based hardware Trojan detection method when the computer program product is executed on an electronic device.

In summary, the advantages and positive effects of the invention are as follows: because features are extracted for, and predictions made on, each logic gate and register of the gate-level netlist, the invention can accurately locate the Trojan circuit within the gate-level netlist, which makes it convenient for designers to inspect and modify the netlist. In addition, the feature extraction method does not require information about the whole netlist, so Trojan detection can also be performed on a partial netlist circuit.

The invention uses a Bagging ensemble model and SMOTE preprocessing, which reduces overfitting of the LSTM model and improves its prediction accuracy. In addition, a summation-plus-decision-tree scheme replaces the traditional voting classification method, which effectively reduces the miss rate of the Bagging model. Overall, compared with traditional Trojan detection methods, the method of the invention has better detection capability, higher accuracy and a lower miss rate.

Drawings

Fig. 1 is a flowchart of a Bagging-based hardware Trojan detection method according to an embodiment of the present invention.

Fig. 2 is a schematic flowchart of a hardware Trojan detection method according to an embodiment of the present invention.

Fig. 3 is an exemplary diagram illustrating the conversion of a gate-level netlist into a directed graph during feature extraction in the hardware Trojan detection method according to an embodiment of the present invention;

In Fig. 3: (a) original circuit diagram; (b) directed graph.

Fig. 4 is a functional classification diagram of the ensemble-based hardware Trojan detection method according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of the model of the ensemble-based hardware Trojan detection method according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In view of the problems in the prior art, the invention provides a Bagging-based hardware Trojan detection method, medium and computer, which are described in detail below with reference to the accompanying drawings.

Persons of ordinary skill in the art may also implement the Bagging-based hardware Trojan detection method of the invention with other steps; the method shown in Fig. 1 is only one specific embodiment.

As shown in Fig. 1, the Bagging-based hardware Trojan detection method provided by the embodiment of the present invention includes the following steps:

S101: Extract the effective feature matrix of the hardware Trojan circuit from the Verilog gate-level netlist file.

S102: Preprocess the feature data set with SMOTE (Synthetic Minority Over-sampling Technique) to balance the number of Trojan circuit features and normal circuit features.

S103: Establish a detection model based on the Bagging ensemble method using LSTM neural network models.

S104: Inspect the gate-level netlist with the constructed Bagging ensemble Trojan detection model.

In S101 of the embodiment, the specific process of extracting the effective feature matrix of the hardware Trojan circuit is as follows:

Convert the circuit into a directed graph according to the circuit netlist structure, and classify the vertices of the directed graph by function according to the logic gate and register types of the circuit.

Traverse the graph bidirectionally from each vertex to extract its feature matrix.

In S102 of the embodiment, the specific process is as follows:

The feature data set is preprocessed with SMOTE (Synthetic Minority Over-sampling Technique) to balance the number of Trojan circuit features and normal circuit features.

For each Trojan circuit feature matrix X, compute the Euclidean distance from X to every other Trojan circuit feature matrix X_i to obtain its k nearest neighbors.

Set a sampling ratio according to the imbalance between the number of Trojan circuit feature matrices and the number of normal circuit feature matrices. For each Trojan circuit feature matrix X, randomly select several Trojan circuit feature matrices X_n from its k nearest neighbors.

For each randomly selected neighbor X_n, compute a new synthetic Trojan feature matrix from X_n and the Trojan circuit feature matrix X as follows:

$$X_{new} = X + \mathrm{rand}(0,1) \times (X_n - X)$$

In S103 of the embodiment, the detection model based on the Bagging ensemble method is established using LSTM neural network models, as follows:

Six LSTM models are constructed to form a Bagging ensemble framework.

A summation-plus-decision-tree scheme is constructed to classify Trojan nodes.

To build the whole model, the data set is first divided into a training set and a test set. Six training subsets, each one third the size of the training set, are drawn from the training set. One LSTM model is trained on each subset, giving six LSTM models in total, which are integrated into a Bagging model. The whole training set is then input into the trained Bagging model to obtain six groups of probability outputs. These six groups of probability outputs are summed, and the sums are used as the training set of a decision tree. A complete Trojan detection model is finally obtained.

In S104 of the embodiment, the gate-level netlist is inspected with the constructed Bagging ensemble Trojan detection model, as follows:

Ideally, an output of 1 indicates that the node is part of the Trojan circuit, and an output of 0 indicates that the node is part of the normal circuit.

The technical solution of the present invention will be described in detail with reference to the following specific examples.

This embodiment provides a hardware Trojan detection method based on an ensemble method, which includes but is not limited to the following steps:

S1: Extract the effective feature matrix of the hardware Trojan circuit from the Verilog gate-level netlist file.

S11: The circuit is converted into a directed graph according to the circuit netlist structure, as shown in Fig. 3, and the vertices of the directed graph are classified by function according to the logic gate and register types of the circuit. First, the logic gates, registers and similar elements of the netlist become the vertices of the directed graph, the circuit connections become its edges, and the direction from circuit inputs to circuit outputs gives the edge direction. Second, how a hardware Trojan is implemented in a gate-level netlist depends mainly on the logic functions of the gates that make up the Trojan, and only weakly on information such as the gates' configuration and process library. The invention therefore classifies the vertex functions into 11 functional classes plus an "others" class according to the main logic gate and register types, as shown in Fig. 4: "dff", "mux", "xnor", "oai", "nand", "nor", "and", "or", "buf", "inv", "ao" and "others", i.e., registers (D flip-flops), multiplexers, XNOR gates, OR-AND-Invert gates, NAND gates, NOR gates, AND gates, OR gates, buffers, inverters, AND-OR gates and other types of functional logic gates. For convenience of description, these classes are denoted by the symbols "A", "B", "C" and so on. In Fig. 3, the original circuit of Fig. 3(a) is converted by the above operations into the directed graph of Fig. 3(b).
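The graph construction and functional classification in S11 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the Verilog netlist has already been parsed into hypothetical `(instance, cell_type, input_nets, output_nets)` tuples, and the prefix rule in the hypothetical `classify_cell` that maps library cell names to the 12 classes is also an assumption, since real cell libraries name their cells differently.

```python
from collections import defaultdict

# The 12 functional classes from the description, in order A, B, C, ...
CLASSES = ["dff", "mux", "xnor", "oai", "nand", "nor",
           "and", "or", "buf", "inv", "ao", "others"]


def classify_cell(cell_type):
    """Map a library cell name to a class index (hypothetical prefix rule)."""
    name = cell_type.lower()
    for idx, prefix in enumerate(CLASSES[:-1]):
        if name.startswith(prefix):
            return idx
    return len(CLASSES) - 1  # no prefix matched: "others"


def build_graph(instances):
    """Build the directed graph from parsed instances.

    instances: iterable of (instance, cell_type, input_nets, output_nets).
    Returns (labels, succ, pred): class index per vertex, plus successor and
    predecessor sets so the graph can be walked in both directions.
    """
    instances = list(instances)
    driver = {}  # net name -> instance driving that net
    for inst, _cell, _ins, outs in instances:
        for net in outs:
            driver[net] = inst
    labels = {inst: classify_cell(cell) for inst, cell, _ins, _outs in instances}
    succ, pred = defaultdict(set), defaultdict(set)
    for inst, _cell, ins, _outs in instances:
        for net in ins:
            src = driver.get(net)  # primary inputs have no driving instance
            if src is not None:
                succ[src].add(inst)  # edge direction: driver -> load
                pred[inst].add(src)
    return labels, succ, pred


# Usage with a three-cell toy netlist (hypothetical cell names).
netlist = [
    ("U1", "NAND2X1", ["a", "b"], ["n1"]),
    ("U2", "INVX1", ["n1"], ["n2"]),
    ("R1", "DFFX1", ["n2"], ["q"]),
]
labels, succ, pred = build_graph(netlist)
print(labels)  # {'U1': 4, 'U2': 9, 'R1': 0}
```

With this prefix rule, an XOR cell such as `XOR2X1` falls into "others"; a practical version would use a mapping table written for the target cell library.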

S12: Each vertex of the directed graph is traversed bidirectionally to extract its feature matrix. First, a vertex is taken as the center; here the circled type-B gate in Fig. 3(b) is used as an example. A traversal of depth 2 is performed in the output direction and in the input direction. The number of times each logic gate type connects to a gate of the same or a different type is counted and recorded in a co-occurrence matrix. The co-occurrence matrix is a 12 × 12 matrix in which a row vector gives the number of times the row type receives the output of each type as an input, and a column vector gives the number of times the output of the column type is received as an input by each type. Taking the column vector of A as an example, the neighborhood of the center gate contains one A->C structure and two A->D structures. This representation reflects the circuit structure around each gate well, which benefits hardware Trojan detection. Finally, the matrix is transposed and appended after its rows, i.e., the row-vector and column-vector data of each type are merged; for example, the final vector of type A is [001200000000000001100000]. Feature extraction is performed on every vertex of the directed graph, i.e., on every gate in the netlist, to obtain the feature set of the whole netlist.

Table 1 co-occurrence matrix of circuit configurations
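One reading of the S12 traversal, consistent with the type-A example above, is sketched below with NumPy (function names are hypothetical). Edges reached within depth 2 in the output and input directions are oriented driver to load and counted into a 12 × 12 matrix `m` as `m[load_type, driver_type]`, so column t records what type t drives and row t records what type t receives. Each type's final vector is its column followed by its row, giving a 12 × 24 node feature. The patent does not say how repeated or reconvergent edges are handled; this sketch assumes each distinct edge is counted once.

```python
import numpy as np

N_CLASSES = 12  # 11 functional classes + "others"


def edges_within_depth(center, adj, depth=2):
    """Return the directed edges (u, v) met by a breadth-first walk of `depth` hops along adj."""
    edges, frontier, seen = set(), {center}, {center}
    for _ in range(depth):
        nxt = set()
        for u in frontier:
            for v in adj.get(u, ()):
                edges.add((u, v))
                if v not in seen:
                    seen.add(v)
                    nxt.add(v)
        frontier = nxt
    return edges


def node_feature(center, labels, succ, pred, depth=2):
    """12 x 24 feature for one vertex: for each type t, [column t of m | row t of m]."""
    forward = edges_within_depth(center, succ, depth)  # already (driver, load)
    # Walking pred yields (load, driver); flip to (driver, load).
    backward = {(v, u) for u, v in edges_within_depth(center, pred, depth)}
    m = np.zeros((N_CLASSES, N_CLASSES), dtype=int)
    for driver, load in forward | backward:
        m[labels[load], labels[driver]] += 1  # row = receiving type, column = driving type
    # Row t of m.T is column t of m (what t drives); row t of m is what t receives.
    return np.hstack([m.T, m])
```

Stacking the 12 × 24 features of every vertex gives the netlist feature set; each row could then serve as one time step of the LSTM input, although the patent does not spell out that layout.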

S2: Preprocess the feature data set with SMOTE (Synthetic Minority Over-sampling Technique) to balance the number of Trojan circuit features and normal circuit features.

Because Trojan structures are very few compared with normal structures (the ratio is generally below 1/100), the neural network training samples are imbalanced. SMOTE (Synthetic Minority Over-sampling Technique) is therefore used to generate synthetic features similar to the Trojan features, which improves the generalization capability of the model and prevents overfitting. The specific process is as follows:

S21: For each Trojan circuit feature matrix X, compute the Euclidean distance from X to every other Trojan circuit feature matrix X_i to obtain its k nearest neighbors.

S22: Set a sampling ratio according to the imbalance between the number of Trojan circuit feature matrices and the number of normal circuit feature matrices. For each Trojan circuit feature matrix X, randomly select several Trojan circuit feature matrices X_n from its k nearest neighbors.

S23: For each randomly selected neighbor X_n, compute a new synthetic Trojan feature matrix from X_n and the Trojan circuit feature matrix X as follows:

$$X_{new} = X + \mathrm{rand}(0,1) \times (X_n - X)$$
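Steps S21 to S23 correspond to standard SMOTE oversampling. The NumPy sketch below assumes each Trojan feature matrix has been flattened into one row. The hypothetical `sampling_ratio` rule (synthesize enough samples per Trojan node to roughly equalize the two classes) is an assumption, since the patent only states that the ratio follows the class imbalance.

```python
import numpy as np


def sampling_ratio(n_trojan, n_normal):
    """Hypothetical rule: synthetic samples per Trojan sample to roughly balance the classes."""
    return max(0, n_normal // n_trojan - 1)


def smote(trojan, n_per_sample, k=5, seed=None):
    """SMOTE: interpolate each Trojan sample X toward randomly chosen k-nearest neighbours X_n.

    trojan: (n, d) array, one flattened Trojan feature matrix per row.
    Returns (n * n_per_sample, d) synthetic samples X_new = X + rand(0,1) * (X_n - X).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(trojan, dtype=float)
    n = len(x)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # S21: pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # S22: n_per_sample random neighbours X_n for every Trojan sample X.
    base = np.repeat(np.arange(n), n_per_sample)
    chosen = neighbours[base, rng.integers(0, k, size=base.size)]
    # S23: one random gap in [0, 1) per synthetic sample.
    gap = rng.random((base.size, 1))
    return x[base] + gap * (x[chosen] - x[base])
```

For example, with 10 Trojan nodes and 1,000 normal nodes, `sampling_ratio(10, 1000)` gives 99 synthetic samples per Trojan node. Every synthetic point lies on a segment between two real Trojan samples, which is why SMOTE adds variety without simply duplicating the minority class.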

S3: Establish a detection model based on the Bagging ensemble method using LSTM neural network models.

S31: Six LSTM models are constructed to form a Bagging ensemble framework. Six training subsets, each one third the size of the training set, are drawn from the training set. One LSTM model is trained on each subset, giving six LSTM models in total, which are integrated into a Bagging model. Each LSTM outputs the probability that a node is a Trojan node: the closer the output is to 1, the more likely the node belongs to a Trojan; the closer it is to 0, the more likely the node belongs to a normal circuit.

S32: A summation-plus-decision-tree scheme is constructed to classify Trojan nodes. The probability outputs of the six LSTM models are summed and then classified by a decision tree to obtain the final result. This avoids the missed detections that occur under majority voting when the few models that flag a Trojan node are outvoted by the rest.

S4: Inspect the gate-level netlist with the constructed Bagging ensemble Trojan detection model.

Referring to Fig. 5, the feature set of the netlist under test, also called the test set, is obtained by extracting features from the gate-level netlist. The test set is fed into each of the six LSTM models for prediction, yielding six groups of probability results, which are summed. The sums are then classified by the decision tree. If the result is 1, the node is regarded as part of the Trojan circuit; if the result is 0, the node is regarded as part of the normal circuit.
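The subset drawing and ensemble structure of S31 can be sketched as follows. Training a real LSTM is beyond a short example, so the base learner is passed in as a callback. The `centroid_model` stand-in is a hypothetical toy classifier, not an LSTM. Drawing each subset with replacement follows classic bootstrap aggregating and is an assumption; the patent fixes only the subset size (one third) and count (six).

```python
import numpy as np


def draw_subsets(n_train, n_models=6, fraction=1 / 3, seed=None):
    """Index arrays for n_models subsets, each `fraction` of the training set.

    Sampling with replacement (bootstrap) is an assumption, not stated in the patent.
    """
    rng = np.random.default_rng(seed)
    size = max(1, int(round(n_train * fraction)))
    return [rng.choice(n_train, size=size, replace=True) for _ in range(n_models)]


def fit_bagging(X, y, train_base_model, n_models=6, seed=None):
    """Train one base model per subset; train_base_model stands in for LSTM training."""
    return [train_base_model(X[idx], y[idx])
            for idx in draw_subsets(len(X), n_models, seed=seed)]


def predict_all(models, X):
    """Trojan probability from every model, shape (n_models, n_nodes)."""
    return np.stack([model(X) for model in models])


def centroid_model(X_sub, y_sub):
    """Hypothetical toy base learner (NOT an LSTM): scores by distance to class centroids."""
    c_trojan = X_sub[y_sub == 1].mean(axis=0)
    c_normal = X_sub[y_sub == 0].mean(axis=0)

    def predict(X):
        d_trojan = np.linalg.norm(X - c_trojan, axis=1)
        d_normal = np.linalg.norm(X - c_normal, axis=1)
        return d_normal / (d_normal + d_trojan + 1e-12)  # near 1 close to the Trojan centroid

    return predict
```

Replacing `centroid_model` with a function that trains an LSTM and returns its probability predictor would keep the rest of the structure unchanged.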
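Because the summed probability is a single number per node, the decision tree in S32 is, at its simplest, a learned threshold on that sum. The sketch below uses a depth-1 tree (a decision stump) as an assumed stand-in for the patent's decision tree, whose depth and training parameters are not given, and contrasts it with majority voting on toy numbers.

```python
import numpy as np


def fit_stump(summed, labels):
    """Depth-1 decision tree on the summed score: the threshold with the fewest errors."""
    s = np.asarray(summed, dtype=float)
    y = np.asarray(labels, dtype=int)
    v = np.unique(s)  # sorted distinct scores
    # Candidate splits: below everything, midpoints between neighbours, above everything.
    candidates = np.concatenate(([v[0] - 1.0], (v[:-1] + v[1:]) / 2, [v[-1] + 1.0]))
    errors = [int(np.sum((s > t).astype(int) != y)) for t in candidates]
    return float(candidates[int(np.argmin(errors))])


def classify(probs, threshold):
    """probs: (6, n_nodes) model outputs. Sum per node, then split: 1 = Trojan, 0 = normal."""
    return (np.asarray(probs).sum(axis=0) > threshold).astype(int)


# Toy outputs of six models for four nodes; node 2 is a Trojan flagged by only two models.
P = np.array([[0.9, 0.2, 0.40, 0.1],
              [0.8, 0.1, 0.45, 0.2],
              [0.7, 0.3, 0.40, 0.1],
              [0.9, 0.1, 0.30, 0.0],
              [0.6, 0.2, 0.80, 0.1],
              [0.95, 0.1, 0.90, 0.2]])
y = np.array([1, 0, 1, 0])
t = fit_stump(P.sum(axis=0), y)
print(classify(P, t))                           # [1 0 1 0]
print(((P > 0.5).sum(axis=0) > 3).astype(int))  # majority vote: [1 0 0 0]
```

Node 2 receives only two "Trojan" votes out of six, so majority voting misses it, yet its summed score (3.25) is well separated from the normal nodes (1.0 and 0.7), and the learned threshold catches it.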

The technical effects of the invention are described in detail below with reference to experiments.

Benchmark circuits from Trust-HUB are used to verify the effectiveness of the invention. Fourteen benchmark circuits were taken from Trust-HUB for testing; their information is given in Table 2 below:

Table 2 Trust-HUB benchmark circuit information

With 13 netlists used as the training set, each classification result falls into one of four categories: True Negative (TN), False Positive (FP), False Negative (FN) and True Positive (TP). TN is the number of normal circuits correctly identified as normal circuits; FP is the number of normal circuits misidentified as Trojan circuits; FN is the number of Trojan circuits misidentified as normal circuits; TP is the number of Trojan circuits correctly identified. From these four counts, two further metrics are calculated: the True Positive Rate, TPR = TP/(TP + FN), and the True Negative Rate, TNR = TN/(TN + FP).

The model was built with Python's scikit-learn library. Each netlist was tested in turn, with the remaining 13 netlists used as the training set. The final experimental results are as follows:
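The TPR/TNR definitions above, together with the test-one-train-on-the-rest protocol used for Table 3, can be sketched in plain Python as follows. The function names are hypothetical, and node labels are assumed to be 1 for Trojan and 0 for normal, matching the model output convention.

```python
def confusion_counts(y_true, y_pred):
    """(TP, TN, FP, FN) with 1 = Trojan node and 0 = normal node."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def tpr_tnr(y_true, y_pred):
    """TPR = TP / (TP + FN), TNR = TN / (TN + FP); NaN when a class is absent."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    tnr = tn / (tn + fp) if tn + fp else float("nan")
    return tpr, tnr


def leave_one_out(netlists):
    """Yield (training netlists, test netlist): each netlist is tested once."""
    for i, test in enumerate(netlists):
        yield netlists[:i] + netlists[i + 1:], test


print(tpr_tnr([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]))
# (0.75, 0.8333333333333334)
```

With the 14 Trust-HUB netlists, `leave_one_out` produces 14 splits of 13 training netlists and 1 test netlist, one per row of Table 3.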

Table 3 Experimental results

Netlist name	TPR (%)	TNR (%)
RS232-T1000	100	95.6
RS232-T1100	100	94.6
RS232-T1200	92.9	95.6
RS232-T1300	100	95.6
RS232-T1400	100	97
RS232-T1500	100	95.5
RS232-T1600	91.7	94.1
s15850-T100	70.4	99.3
s35932-T100	85.7	99.8
s35932-T200	100	99.9
s35932-T300	80.6	99.9
s38417-T100	50	99.9
s38417-T200	100	99.9
s38417-T300	70.5	99.5
Mean	88.7	97.6

As Table 3 shows, the TPR of the invention ranges from 50% to 100% and the TNR from 94.1% to 99.9%, with a mean TPR of 88.7% and a mean TNR of 97.6%. The method therefore detects most hardware Trojans well while keeping the false-alarm rate on normal circuits low.
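As an arithmetic check on the summary above, the following snippet recomputes the mean and range of each column of Table 3 from the reported per-netlist values.

```python
# Per-netlist (TPR %, TNR %) values copied from Table 3.
TABLE3 = {
    "RS232-T1000": (100, 95.6), "RS232-T1100": (100, 94.6),
    "RS232-T1200": (92.9, 95.6), "RS232-T1300": (100, 95.6),
    "RS232-T1400": (100, 97.0), "RS232-T1500": (100, 95.5),
    "RS232-T1600": (91.7, 94.1), "s15850-T100": (70.4, 99.3),
    "s35932-T100": (85.7, 99.8), "s35932-T200": (100, 99.9),
    "s35932-T300": (80.6, 99.9), "s38417-T100": (50, 99.9),
    "s38417-T200": (100, 99.9), "s38417-T300": (70.5, 99.5),
}

tpr = [v[0] for v in TABLE3.values()]
tnr = [v[1] for v in TABLE3.values()]
mean_tpr = sum(tpr) / len(tpr)
mean_tnr = sum(tnr) / len(tnr)
print(f"TPR: mean {mean_tpr:.1f}%, range {min(tpr)}% to {max(tpr)}%")
print(f"TNR: mean {mean_tnr:.1f}%, range {min(tnr)}% to {max(tnr)}%")
# TPR: mean 88.7%, range 50% to 100%
# TNR: mean 97.6%, range 94.1% to 99.9%
```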

It should be noted that the embodiments of the present invention can be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented with dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a disk, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of hardware circuits and software, e.g., firmware.

The above description is only a specific embodiment of the present invention, and the protection scope of the invention is not limited thereto; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the protection scope defined by the appended claims.

Claims

1. A Bagging-based hardware Trojan detection method, characterized by comprising the following steps:

2. The Bagging-based hardware Trojan detection method according to claim 1, wherein in step one, the specific process of extracting the effective feature matrix of the hardware Trojan circuit is as follows: converting the circuit into a directed graph according to the circuit netlist structure, and classifying the vertices of the directed graph by function according to the logic gate and register types of the circuit; and traversing the graph bidirectionally from each vertex to extract its feature matrix.

3. The Bagging-based hardware Trojan detection method according to claim 1, wherein the specific process of step two is as follows:

for each Trojan circuit feature matrix X, computing the Euclidean distance from X to every other Trojan circuit feature matrix X_i to obtain its k nearest neighbors;

4. The Bagging-based hardware Trojan detection method according to claim 3, wherein for each Trojan circuit feature matrix X, several Trojan circuit feature matrices X_n are randomly selected from the k nearest neighbors of X.

5. The Bagging-based hardware Trojan detection method according to claim 4, wherein for each randomly selected neighbor X_n, a new synthetic Trojan feature matrix is computed from X_n and the Trojan circuit feature matrix X as follows:

$$X_{new} = X + \mathrm{rand}(0,1) \times (X_n - X)$$

6. The Bagging-based hardware Trojan detection method according to claim 1, wherein in step three, the detection model based on the Bagging ensemble method is established using LSTM neural network models, as follows:

7. The Bagging-based hardware Trojan detection method according to claim 6, wherein the specific process of constructing the six LSTM models is as follows:

dividing the data set into a training set and a test set, and drawing six training subsets from the training set, each one third the size of the training set; training one LSTM model on each training subset to obtain six LSTM models in total, and integrating the six LSTM models into a Bagging model;

8. The Bagging-based hardware Trojan detection method according to claim 1, wherein in step four, the gate-level netlist is inspected with the constructed Bagging ensemble Trojan detection model, as follows:

9. A program storage medium for receiving user input, the stored computer program causing an electronic device to execute the Bagging-based hardware Trojan detection method according to any one of claims 1 to 8, comprising the following steps:

10. A computer program product stored on a computer-readable medium, comprising a computer-readable program that provides a user input interface to implement the Bagging-based hardware Trojan detection method according to any one of claims 1 to 8 when executed on an electronic device.