CN114065308A - Gate-level hardware Trojan horse positioning method and system based on deep learning - Google Patents


Info

Publication number
CN114065308A
Authority
CN
China
Prior art keywords
path
submodule
positioning
paths
gate
Prior art date
Legal status
Pending
Application number
CN202111412498.9A
Other languages
Chinese (zh)
Inventor
董晨
张媛媛
许熠
黄槟鸿
黄小刚
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202111412498.9A
Publication of CN114065308A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/71 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Geometry (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a gate-level hardware Trojan horse positioning method and system based on deep learning. The method first obtains seven public gate-level netlist files to form a training set and a test set; the netlist files are then preprocessed and converted into path statements by a depth-first search algorithm, completing path generation; a TextCNN model is then constructed and trained for detection and positioning; the path set of the test set is input into the model to obtain a pre-detection result; the paths of the pre-detection result are divided and virtual positioning coordinates are constructed to obtain a short path set SL for positioning; finally, SL is input into the TextCNN model to obtain a positioning result P. The invention realizes quick and effective evaluation of the security performance of an integrated circuit, and even the discovery and localization of threats.

Description

Gate-level hardware Trojan horse positioning method and system based on deep learning
Technical Field
The invention relates to the field of computer hardware protection and system-on-chip security, in particular to a gate-level hardware Trojan horse positioning method and system based on deep learning.
Background
Integrated Circuits (ICs) are the core components of computer hardware, and their design and manufacture are complex. To reduce costs, many manufacturers choose to outsource part of the IC manufacturing process to so-called third-party vendors, which undoubtedly introduces significant threats to hardware security. A Hardware Trojan (HT) is a small piece of circuitry that an attacker inserts into the original IC layout for some malicious purpose. An HT may be inserted at any stage of IC manufacture, and its security threats include changing circuit functionality, leaking information, denial of service, and so on. Current studies on HT detection can be roughly divided into pre-silicon detection and post-silicon detection: pre-silicon detection performs security checks before the IC chip is fabricated, while post-silicon detection performs them after fabrication. Pre-silicon detection reduces cost more, achieving a better balance between security and profit. Pre-silicon detection is mainly performed in the design stage of the IC; the gate level is the last link of the design stage, so detecting HTs at the gate level is very effective.
In IC design, the circuit is described at several levels of abstraction, from high to low: system level, algorithm level, register transfer level, gate level, and transistor level. Gate-level detection is a common static detection approach that explores new Trojan detection methods by analyzing the logic structure of the circuit through its gate-level netlist. The key to detecting HTs at the gate level is the netlist file describing this level, i.e., the gate-level netlist, which describes the interconnection relationships between circuit elements such as logic gates and other elements at the same level. To date, many works have proposed methods for preventing and detecting HTs at the gate level. The most common approach is to mine HT features from the gate-level netlist and feed them into a deep learning model for feature learning, so as to detect HTs effectively. Numerous detection studies have achieved considerable results, but detection alone cannot truly resist HTs: finding the specific location of an HT is a prerequisite for combating it more precisely. However, studies on locating HTs are still scarce.
Disclosure of Invention
In view of the above, the present invention provides a gate-level hardware trojan positioning method and system based on deep learning, which can position a hardware trojan at the gate level.
In order to achieve the purpose, the invention adopts the following technical scheme:
a gate-level hardware Trojan horse positioning method based on deep learning comprises the following steps:
Step A: acquiring seven public gate-level netlist files, and dividing the data set by using the leave-one-out method to obtain a training set Tr and a test set Ts;
Step B: preprocessing the gate-level netlist files of the training set Tr and the test set Ts obtained in the step A, and combining a depth-first search algorithm to obtain the path set of the training set Tr and the path set of the test set Ts;
And C: constructing and initializing a TextCNN model for detecting and positioning HT, and based on the path set of the training set Tr obtained in the step B
Figure BDA0003374690440000023
Training;
Step D: inputting the path set of the test set Ts obtained in the step B into the TextCNN model trained in the step C to obtain the pre-detection result;
Step E: dividing the paths of the pre-detection result obtained in the step D and constructing virtual positioning coordinates to obtain a short path set SL for positioning;
Step F: inputting the short path set SL obtained in the step E into the TextCNN model trained in the step D to obtain a positioning result P.
Further, the step B is specifically as follows:
step B1: traversing the netlist file by using a depth-first search algorithm, and taking a wire network as an intermediary to obtain a tree graph G representing the interconnection relation of different logic gates;
step B2: based on the tree diagram G obtained in the step B1, the situation of the real circuit can be restored and a plurality of label-free paths can be obtained, and then the label-free paths are combined into a label-free path set of the netlist;
Step B3: performing the operations of the step B1 and the step B2 on the gate-level netlist files of the training set Tr and the test set Ts obtained in the step A, finally obtaining the unlabeled path sets of the training set Tr and the test set Ts;
Step B4: labeling the unlabeled paths obtained in the step B3 based on the information of the gate-level netlists of the training set Tr and the test set Ts obtained in the step A, to obtain the labeled path sets of the training set Tr and the test set Ts.
further, the step C specifically includes:
Step C1: generating a vocabulary from the path set of the training set Tr obtained in the step B, so that the TextCNN model can extract features;
step C2: constructing and initializing a TextCNN model;
Step C3: based on the path set of the training set Tr obtained in the step B, the TextCNN model learns the characteristics of Trojan paths and Trojan-free paths respectively, completing the training of the model.
Further, the step C1 is specifically:
Step C11: firstly, converting the path set of the training set Tr obtained in the step B into text content;
step C12: reading the words one by one and calculating the frequency of each word based on the text content obtained in the step C11;
step C13: according to the frequency of the words, marking a sequence number for each word from high to low to finish the vectorization representation of the words;
step C14: and packaging the words and the corresponding serial numbers into a dictionary type, writing the dictionary type into a vocabulary file, and finishing the generation of the vocabulary.
Further, the step D specifically includes:
Step D1: based on the TextCNN model trained in the step C, adding a storage operation to the last fully connected layer of the model so as to conveniently record the pre-detection result;
Step D2: inputting the path set of the test set Ts into the TextCNN model trained in the step C to obtain a preliminary detection result set {P_TP, P_FP, P_TN, P_FN}, wherein P_TP is the set of correctly identified Trojan paths, P_FP is the set of Trojan-free paths identified as containing a Trojan, P_TN is the set of correctly identified Trojan-free paths, and P_FN is the set of Trojan paths identified as Trojan-free;
Step D3: based on the preliminary detection result set {P_TP, P_FP, P_TN, P_FN} obtained in the step D2, selecting the set P_TP of correctly identified Trojan paths as the pre-detection result.
Further, the step E specifically includes:
Step E1: numbering the paths in the pre-detection result obtained in the step D to obtain an original long path set LL = {LL_i | 1 ≤ i ≤ TP} for positioning, wherein TP is the number of paths contained in the set P_TP of correctly identified Trojan paths from the step D2;
Step E2: setting the division length cutlen, sequentially dividing the long path LL_i into groups of cutlen logic gates to obtain a plurality of short paths, and setting virtual positioning coordinates for the short paths;
Step E3: performing the step E2 on each long path in the original long path set LL obtained in the step E1 to obtain the short path set SL and a virtual positioning coordinate set, completing path division and virtual positioning coordinate construction.
Further, the step E2 specifically includes:
Step E21: setting the division length cutlen;
Step E22: for the long path LL_i, calculating the number num_i of short paths that can be generated after it is divided, according to the formula:
num_i = ⌈length_i / cutlen⌉
wherein length_i denotes the length of the long path LL_i;
Step E23: sequentially dividing the long path LL_i into groups of cutlen logic gates to obtain a plurality of short paths, wherein j is the index of a short path, indicating that the j-th short path is divided from the long path LL_i;
Step E24: according to the results of the step E22 and the step E23, setting a virtual positioning coordinate for each short path to record the possible Trojan location, the coordinate being calculated from t_i, wherein t_i denotes the t-th division of the original long path LL_i;
Step E25: repeating the step E24 until the virtual positioning coordinates of all num_i short paths have been set.
Further, the step F specifically includes:
Step F1: inputting a path in the short path set SL into the TextCNN model trained in the step D, and predicting its classification result;
Step F2: if the prediction result output by the TextCNN model is a Trojan path, recording the corresponding virtual positioning coordinate in the positioning result P;
Step F3: repeating the step F1 and the step F2 until all the short paths have been processed, and outputting the final positioning result P to complete positioning.
A gate-level hardware Trojan horse positioning system based on deep learning, comprising:
the path generation module is used for generating path statements representing circuit routing and comprises a search submodule, a temporary path submodule and a label submodule; firstly, the input gate-level netlist files of the training set Tr and the test set Ts are preprocessed, and a depth-first search is performed on them by the search submodule to obtain a tree diagram G representing the interconnection relations of different logic gates; the temporary path submodule then generates the unlabeled path sets of the training set Tr and the test set Ts; finally, the label submodule labels the unlabeled paths to generate the labeled path sets of the training set Tr and the test set Ts;
the model generation module is used for constructing and training a TextCNN model and comprises a vectorization submodule, a model construction submodule and a model training submodule; firstly, a vocabulary file is generated by the vectorization submodule from the path set of the training set Tr generated by the label submodule, a TextCNN model is then constructed and initialized by the model construction submodule, and finally the path set is input by the model training submodule to complete the training of the model;
the pre-detection module is used for obtaining the pre-detection result of the test set Ts and comprises a storage-adding submodule, a pre-detection submodule and an output submodule; firstly, a storage operation is added by the storage-adding submodule to the last fully connected layer of the TextCNN model constructed by the model construction submodule so as to record the pre-detection result; the pre-detection submodule then pre-detects the paths in the path set of the test set Ts to obtain the preliminary detection result set {P_TP, P_FP, P_TN, P_FN}; finally, the set P_TP of correctly identified Trojan paths is taken as the pre-detection result and output by the output submodule;
the path dividing module is used for dividing the result paths output by the output submodule into short paths to narrow the positioning range and comprises a sequencing submodule, a dividing submodule and a quasi-coordinate submodule; the paths in the pre-detection result P_TP output by the output submodule are numbered by the sequencing submodule, then divided into a plurality of short paths by the dividing submodule, and finally a virtual positioning coordinate is set for each short path by the quasi-coordinate submodule;
the positioning module is used for positioning the Trojan horse and comprises a loading submodule and an output submodule; firstly, a short path is loaded by the loading submodule into the TextCNN model trained by the model generation module, and the predicted result is output by the output submodule; the paths predicted to be Trojan paths are then selected and their corresponding virtual positioning coordinates are output, completing positioning.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention realizes the detection of hardware Trojans by applying a convolutional neural network to text classification;
2. the invention converts the detection problem of the hardware Trojan horse into a two-classification problem, and enables the convolutional neural network to learn the context characteristics of circuit path statements and autonomously explore the characteristics of a Trojan horse path and a Trojan-free path so as to classify. Secondly, on the basis of detection, the positioning of the hardware trojan is explored, the path segmentation technology is considered to be applied to the positioning problem, and the long path in the circuit is divided into a plurality of short paths, so that the positioning range of the hardware trojan is narrowed;
3. The invention can carry out further positioning on the basis of detection, breaking through the previous situation in which hardware Trojans could only be coarsely located from images of the manufactured integrated circuit; it realizes positioning at the gate level and thus resists hardware Trojans more effectively from the design stage of the integrated circuit;
4. The invention can be used in integrated circuit security detection systems to evaluate the security performance of an integrated circuit and even to discover and locate threats, so that designers can take measures against such threats.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
fig. 2 is a schematic diagram of a system structure according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
Referring to fig. 1, the present invention provides a gate-level hardware Trojan horse positioning method based on deep learning, which includes the following steps:
step A: firstly, acquiring seven public gate-level netlist files, and dividing a data set by using a leave-one-out method to obtain a training set Tr and a test set Ts;
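The data split of step A can be illustrated with a minimal Python sketch of a leave-one-out division over seven netlist files; the file names below are hypothetical placeholders, not the benchmarks used by the invention.

```python
# Minimal sketch of the leave-one-out split over seven gate-level netlists.
# The file names below are hypothetical placeholders.
netlists = [f"benchmark_{k}.v" for k in range(1, 8)]  # seven public netlist files

def leave_one_out_splits(files):
    """Yield (training set Tr, test set Ts) pairs: each file is the test set once."""
    for i, test_file in enumerate(files):
        train_files = files[:i] + files[i + 1:]   # the remaining six netlists
        yield train_files, [test_file]

for fold, (Tr, Ts) in enumerate(leave_one_out_splits(netlists)):
    print(f"fold {fold}: train on {len(Tr)} netlists, test on {Ts[0]}")
```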
and B: b, preprocessing the gate-level netlist files of the training set Tr and the test set Ts obtained in the step A, and combining a depth-first search algorithm to obtain a path set of the training set Tr and the test set Ts
Figure BDA0003374690440000091
And
Figure BDA0003374690440000092
completing the generation of the path;
Step B1: traversing the netlist file by using a depth-first search algorithm, taking the wire nets as intermediaries to obtain a tree diagram G representing the interconnection relations of different logic gates;
step B2: based on the tree diagram G obtained in the step B1, the situation of the real circuit can be restored and a plurality of label-free paths can be obtained, and then the label-free paths are combined into a label-free path set of the netlist;
Step B3: performing the operations of the step B1 and the step B2 on the gate-level netlist files of the training set Tr and the test set Ts obtained in the step A, finally obtaining the unlabeled path sets of the training set Tr and the test set Ts;
Step B4: labeling the unlabeled paths obtained in the step B3 based on the information of the gate-level netlists of the training set Tr and the test set Ts obtained in the step A, to obtain the labeled path sets of the training set Tr and the test set Ts.
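The path generation of steps B1 to B4 can be sketched as follows. This is a simplified illustration rather than the patented implementation: the pre-parsed netlist dictionary, the gate and net names, and the rule that a path is labeled as a Trojan path when it contains a known Trojan gate are all assumptions made for the example, and Verilog parsing is omitted.

```python
# Sketch of path generation by depth-first search over a gate-level netlist.
# The netlist is assumed to be pre-parsed into a dict:
#   gates = {gate_name: {"type": "AND2", "inputs": [net, ...], "output": net}}
# The netlist format and labeling rule are assumptions for this example.

def build_fanout(gates):
    """Map each wire net to the gates that read it (the net acts as intermediary)."""
    fanout = {}
    for name, g in gates.items():
        for net in g["inputs"]:
            fanout.setdefault(net, []).append(name)
    return fanout

def generate_paths(gates, primary_inputs, trojan_gates=frozenset()):
    """Return (path_statement, label) pairs; label 1 if the path touches a Trojan gate."""
    fanout = build_fanout(gates)
    paths = []

    def dfs(gate, trail):
        trail = trail + [gate]
        successors = fanout.get(gates[gate]["output"], [])
        if not successors:                       # reached a primary output / leaf
            statement = " ".join(gates[g]["type"] for g in trail)
            label = int(any(g in trojan_gates for g in trail))
            paths.append((statement, label))
            return
        for nxt in successors:
            if nxt not in trail:                 # avoid cycles through feedback nets
                dfs(nxt, trail)

    start_gates = {g for net in primary_inputs for g in fanout.get(net, [])}
    for gate in sorted(start_gates):
        dfs(gate, [])
    return paths

# Tiny hypothetical example: IN1 -> AND2 -> XOR2 -> OUT
gates = {
    "u1": {"type": "AND2", "inputs": ["IN1", "IN2"], "output": "n1"},
    "u2": {"type": "XOR2", "inputs": ["n1", "IN3"], "output": "OUT"},
}
print(generate_paths(gates, primary_inputs=["IN1", "IN2", "IN3"]))
```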
Step C: constructing and initializing a TextCNN model for detecting and positioning HT, and inputting the path set of the training set Tr obtained in the step B to complete the construction and training of the model;
Step C1: generating a vocabulary from the path set of the training set Tr obtained in the step B, so that the TextCNN model can extract features;
Step C11: firstly, converting the path set of the training set Tr obtained in the step B into text content;
Step C12: reading the words one by one and calculating the frequency of each word based on the text content obtained in the step C11;
step C13: according to the frequency of the words, marking a sequence number for each word from high to low to finish the vectorization representation of the words;
step C14: and packaging the words and the corresponding serial numbers into a dictionary type, writing the dictionary type into a vocabulary file, and finishing the generation of the vocabulary.
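A minimal sketch of the vocabulary generation in steps C11 to C14 is given below; the JSON file format and the choice to start numbering at 1 are assumptions, not requirements of the invention.

```python
# Sketch of vocabulary generation from the path statements (steps C11-C14).
# Each word is ranked by frequency and assigned a sequence number; the
# word -> number dictionary is written to a vocabulary file (name assumed).
import json
from collections import Counter

def build_vocabulary(path_statements, vocab_file="vocab.json"):
    counts = Counter()
    for statement in path_statements:        # steps C11/C12: read words, count frequency
        counts.update(statement.split())
    vocab = {word: idx for idx, (word, _) in
             enumerate(counts.most_common(), start=1)}   # step C13: number by frequency
    with open(vocab_file, "w") as f:         # step C14: write the dictionary to a file
        json.dump(vocab, f)
    return vocab

def vectorize(statement, vocab, unknown=0):
    """Turn one path statement into a sequence of word numbers."""
    return [vocab.get(word, unknown) for word in statement.split()]

vocab = build_vocabulary(["AND2 XOR2 DFF", "AND2 NOR2 DFF"])
print(vocab, vectorize("AND2 DFF", vocab))
```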
Step C2: constructing and initializing a TextCNN model;
Step C3: based on the path set of the training set Tr obtained in the step B, the TextCNN model learns the characteristics of Trojan paths and Trojan-free paths respectively, completing the training of the model.
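The TextCNN construction and training of steps C2 and C3 can be sketched in PyTorch as follows. The patent does not fix a framework or hyperparameters, so the embedding size, kernel sizes, filter count and optimizer below are assumptions.

```python
# Sketch of a TextCNN classifier for Trojan / Trojan-free path statements.
# PyTorch is used here; embedding size, kernel sizes and filter counts are
# assumptions, since the patent does not fix these hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, num_filters=64,
                 kernel_sizes=(2, 3, 4), num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, x):                     # x: (batch, sequence of word numbers)
        e = self.embedding(x).transpose(1, 2) # (batch, embed_dim, seq_len)
        pooled = [F.relu(conv(e)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # logits: Trojan vs. Trojan-free

# One hypothetical training step on padded word-number sequences.
model = TextCNN(vocab_size=100)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randint(1, 100, (8, 20))        # 8 paths, 20 gates each (toy data)
labels = torch.randint(0, 2, (8,))
loss = F.cross_entropy(model(batch), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```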
Step D: inputting the path set of the test set Ts obtained in the step B into the TextCNN model trained in the step C to obtain a pre-detection result;
Step D1: based on the TextCNN model trained in the step C, adding a storage operation to the last fully connected layer of the model so as to conveniently record the pre-detection result;
Step D2: inputting the path set of the test set Ts into the TextCNN model trained in the step C to obtain a preliminary detection result set {P_TP, P_FP, P_TN, P_FN}, wherein P_TP is the set of correctly identified Trojan paths, P_FP is the set of Trojan-free paths identified as containing a Trojan, P_TN is the set of correctly identified Trojan-free paths, and P_FN is the set of Trojan paths identified as Trojan-free;
Step D3: based on the preliminary detection result set {P_TP, P_FP, P_TN, P_FN} obtained in the step D2, selecting only the set P_TP of correctly identified Trojan paths therein as the pre-detection result for subsequent positioning.
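The pre-detection of step D can be sketched as follows, assuming the label convention 1 = Trojan path and 0 = Trojan-free path; the encode() helper that turns a path statement into a padded tensor of word numbers is a hypothetical stand-in.

```python
# Sketch of the pre-detection step: partition the test paths into
# {P_TP, P_FP, P_TN, P_FN} and keep P_TP for positioning.
# Assumes prediction/label 1 = Trojan path, 0 = Trojan-free path.
import torch

def pre_detect(model, test_paths, encode):
    """test_paths: list of (path_statement, true_label); encode(): words -> tensor."""
    buckets = {"P_TP": [], "P_FP": [], "P_TN": [], "P_FN": []}
    model.eval()
    with torch.no_grad():
        for statement, label in test_paths:
            pred = model(encode(statement)).argmax(dim=1).item()
            if pred == 1 and label == 1:
                buckets["P_TP"].append(statement)   # correctly identified Trojan path
            elif pred == 1 and label == 0:
                buckets["P_FP"].append(statement)   # Trojan-free path flagged as Trojan
            elif pred == 0 and label == 0:
                buckets["P_TN"].append(statement)   # correctly identified Trojan-free path
            else:
                buckets["P_FN"].append(statement)   # missed Trojan path
    return buckets, buckets["P_TP"]                 # P_TP is the pre-detection result
```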
Step E: d, dividing paths of the pre-detection result obtained in the step D and constructing virtual positioning coordinates to obtain a short path set SL for positioning;
Step E1: numbering the paths in the pre-detection result obtained in the step D to obtain an original long path set LL = {LL_i | 1 ≤ i ≤ TP} for positioning, wherein TP is the number of paths contained in the set P_TP of correctly identified Trojan paths from the step D2;
Step E2: setting the division length cutlen, sequentially dividing the long path LL_i into groups of cutlen logic gates to obtain a plurality of short paths, and setting virtual positioning coordinates for the short paths;
Step E21: setting the division length cutlen;
Step E22: for the long path LL_i, calculating the number num_i of short paths that can be generated after it is divided, according to the formula:
num_i = ⌈length_i / cutlen⌉
wherein length_i denotes the length of the long path LL_i;
Step E23: sequentially dividing the long path LL_i into groups of cutlen logic gates to obtain a plurality of short paths, wherein j is the index of a short path, indicating that the j-th short path is divided from the long path LL_i;
Step E24: according to the results of the step E22 and the step E23, setting a virtual positioning coordinate for each short path to record the possible Trojan location, the coordinate being calculated from t_i, wherein t_i denotes the t-th division of the original long path LL_i;
Step E25: repeating the step E24 until the virtual positioning coordinates of all num_i short paths have been set.
Step E3: performing the step E2 on each long path in the original long path set LL obtained in the step E1 to obtain the short path set SL and a virtual positioning coordinate set, completing path division and virtual positioning coordinate construction.
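The path division of step E can be sketched as follows. Because the patent's exact coordinate formula is only reproduced as a figure, the (long-path index i, division index t) pair used as the virtual positioning coordinate here is an assumption that follows the description of t_i; the ceiling-based count of short paths is likewise an inference from the division into groups of cutlen gates.

```python
# Sketch of step E: divide each pre-detected long path into groups of
# cutlen logic gates and attach a virtual positioning coordinate.
# The (long-path index i, division index t) coordinate is an assumption,
# since the patent's exact coordinate formula is only shown as a figure.
import math

def divide_paths(long_paths, cutlen):
    """long_paths: list of path statements (P_TP); returns (short paths SL, coords)."""
    short_paths, coords = [], []
    for i, statement in enumerate(long_paths, start=1):   # step E1: number the long paths
        gates = statement.split()
        num_i = math.ceil(len(gates) / cutlen)             # step E22: number of short paths
        for t in range(1, num_i + 1):                      # step E23: t-th division of LL_i
            group = gates[(t - 1) * cutlen : t * cutlen]
            short_paths.append(" ".join(group))
            coords.append((i, t))                          # step E24: virtual coordinate
    return short_paths, coords

SL, C = divide_paths(["AND2 XOR2 DFF NOR2 INV"], cutlen=2)
print(SL)   # ['AND2 XOR2', 'DFF NOR2', 'INV']
print(C)    # [(1, 1), (1, 2), (1, 3)]
```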
Step F: inputting the short path set SL obtained in the step E into the TextCNN model trained in the step D to obtain a positioning result P.
Step F1: one path in the short path set SL
Figure BDA0003374690440000122
Inputting the textCNN model trained in the step D, and predicting the classification result;
Step F2: if the prediction result output by the TextCNN model is a Trojan path, recording the corresponding virtual positioning coordinate in the positioning result P;
Step F3: repeating the step F1 and the step F2 until all the short paths have been processed, and outputting the final positioning result P to complete positioning.
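The positioning loop of step F can be sketched as follows, under the same assumptions as the previous sketches (prediction 1 = Trojan path, hypothetical encode() helper).

```python
# Sketch of step F: classify every short path and record the virtual
# coordinates of those predicted as Trojan paths (prediction 1 = Trojan).
import torch

def locate(model, short_paths, coords, encode):
    """Return the positioning result P: coordinates of short paths predicted as Trojan."""
    P = []
    model.eval()
    with torch.no_grad():
        for statement, coord in zip(short_paths, coords):
            pred = model(encode(statement)).argmax(dim=1).item()
            if pred == 1:                 # predicted to contain a Trojan
                P.append(coord)           # keep its virtual positioning coordinate
    return P
```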
The invention also provides a gate-level hardware Trojan horse positioning system based on deep learning, which comprises:
the path generation module is used for generating path statements representing circuit routing and comprises a search submodule, a temporary path submodule and a label submodule; firstly, the input gate-level netlist files of the training set Tr and the test set Ts are preprocessed, and a depth-first search is performed on them by the search submodule to obtain a tree diagram G representing the interconnection relations of different logic gates; the temporary path submodule then generates the unlabeled path sets of the training set Tr and the test set Ts; finally, the label submodule labels the unlabeled paths to generate the labeled path sets of the training set Tr and the test set Ts;
the model generation module is used for constructing and training a TextCNN model and comprises a vectorization submodule, a model construction submodule and a model training submodule; firstly, a vocabulary file is generated by the vectorization submodule from the path set of the training set Tr generated by the label submodule, a TextCNN model is then constructed and initialized by the model construction submodule, and finally the path set is input by the model training submodule to complete the training of the model;
the pre-detection module is used for obtaining the pre-detection result of the test set Ts and comprises a storage-adding submodule, a pre-detection submodule and an output submodule; firstly, a storage operation is added by the storage-adding submodule to the last fully connected layer of the TextCNN model constructed by the model construction submodule so as to record the pre-detection result; the pre-detection submodule then pre-detects the paths in the path set of the test set Ts to obtain the preliminary detection result set {P_TP, P_FP, P_TN, P_FN}; finally, the set P_TP of correctly identified Trojan paths is taken as the pre-detection result and output by the output submodule;
the path dividing module is used for dividing the result paths output by the output submodule into short paths to narrow the positioning range and comprises a sequencing submodule, a dividing submodule and a quasi-coordinate submodule; the paths in the pre-detection result P_TP output by the output submodule are numbered by the sequencing submodule, then divided into a plurality of short paths by the dividing submodule, and finally a virtual positioning coordinate is set for each short path by the quasi-coordinate submodule;
the positioning module is used for positioning the Trojan horse and comprises a loading submodule and an output submodule; firstly, a short path is loaded by the loading submodule into the TextCNN model trained by the model generation module, and the predicted result is output by the output submodule; the paths predicted to be Trojan paths are then selected and their corresponding virtual positioning coordinates are output, completing positioning.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims (9)

1. A gate-level hardware Trojan horse positioning method based on deep learning is characterized by comprising the following steps:
Step A: acquiring seven public gate-level netlist files, and dividing the data set by using the leave-one-out method to obtain a training set Tr and a test set Ts;
and B: b, preprocessing the gate-level netlist files of the training set Tr and the test set Ts obtained in the step A, and combining a depth-first search algorithm to obtain a training set Tr path set
Figure FDA0003374690430000011
And path set of test set Ts
Figure FDA0003374690430000012
And C: constructing and initializing a TextCNN model for detecting and positioning HT, and based on the path set of the training set Tr obtained in the step B
Figure FDA0003374690430000013
Training;
Step D: inputting the path set of the test set Ts obtained in the step B into the TextCNN model trained in the step C;
Step E: dividing the paths of the pre-detection result obtained in the step D and constructing virtual positioning coordinates to obtain a short path set SL for positioning;
Step F: inputting the short path set SL obtained in the step E into the TextCNN model trained in the step D to obtain a positioning result P.
2. The deep learning-based gate-level hardware Trojan horse positioning method according to claim 1, wherein the step B is specifically as follows:
step B1: traversing the netlist file by using a depth-first search algorithm, and taking a wire network as an intermediary to obtain a tree graph G representing the interconnection relation of different logic gates;
step B2: based on the tree diagram G obtained in the step B1, the situation of the real circuit can be restored and a plurality of label-free paths can be obtained, and then the label-free paths are combined into a label-free path set of the netlist;
Step B3: performing the operations of the step B1 and the step B2 on the gate-level netlist files of the training set Tr and the test set Ts obtained in the step A, finally obtaining the unlabeled path sets of the training set Tr and the test set Ts;
Step B4: labeling the unlabeled paths obtained in the step B3 based on the information of the gate-level netlists of the training set Tr and the test set Ts obtained in the step A, to obtain the labeled path sets of the training set Tr and the test set Ts.
3. the deep learning based gate-level hardware Trojan horse positioning method according to claim 1, wherein the step C is specifically as follows:
Step C1: generating a vocabulary from the path set of the training set Tr obtained in the step B, so that the TextCNN model can extract features;
step C2: constructing and initializing a TextCNN model;
Step C3: based on the path set of the training set Tr obtained in the step B, the TextCNN model learns the characteristics of Trojan paths and Trojan-free paths respectively, completing the training of the model.
4. The deep learning based gate-level hardware Trojan horse positioning method of claim 3, wherein: the step C1 specifically includes:
Step C11: firstly, converting the path set of the training set Tr obtained in the step B into text content;
step C12: reading the words one by one and calculating the frequency of each word based on the text content obtained in the step C11;
step C13: according to the frequency of the words, marking a sequence number for each word from high to low to finish the vectorization representation of the words;
step C14: and packaging the words and the corresponding serial numbers into a dictionary type, writing the dictionary type into a vocabulary file, and finishing the generation of the vocabulary.
5. The deep learning based gate-level hardware Trojan horse positioning method of claim 1, wherein: the step D is specifically as follows:
Step D1: based on the TextCNN model trained in the step C, adding a storage operation to the last fully connected layer of the model so as to conveniently record the pre-detection result;
Step D2: inputting the path set of the test set Ts into the TextCNN model trained in the step C to obtain a preliminary detection result set {P_TP, P_FP, P_TN, P_FN}, wherein P_TP is the set of correctly identified Trojan paths, P_FP is the set of Trojan-free paths identified as containing a Trojan, P_TN is the set of correctly identified Trojan-free paths, and P_FN is the set of Trojan paths identified as Trojan-free;
Step D3: based on the preliminary detection result set {P_TP, P_FP, P_TN, P_FN} obtained in the step D2, selecting the set P_TP of correctly identified Trojan paths as the pre-detection result.
6. The deep learning based gate-level hardware Trojan horse positioning method of claim 1, wherein: the step E specifically comprises the following steps:
Step E1: numbering the paths in the pre-detection result obtained in the step D to obtain an original long path set LL = {LL_i | 1 ≤ i ≤ TP} for positioning, wherein TP is the number of paths contained in the set P_TP of correctly identified Trojan paths from the step D2;
Step E2: setting the division length cutlen, sequentially dividing the long path LL_i into groups of cutlen logic gates to obtain a plurality of short paths, and setting virtual positioning coordinates for the short paths;
Step E3: performing the step E2 on each long path in the original long path set LL obtained in the step E1 to obtain the short path set SL and a virtual positioning coordinate set, completing path division and virtual positioning coordinate construction.
7. The deep learning based gate-level hardware Trojan horse positioning method of claim 6, wherein: the step E2 specifically includes:
Step E21: setting the division length cutlen;
Step E22: for the long path LL_i, calculating the number num_i of short paths that can be generated after it is divided, according to the formula:
num_i = ⌈length_i / cutlen⌉
wherein length_i denotes the length of the long path LL_i;
Step E23: sequentially dividing the long path LL_i into groups of cutlen logic gates to obtain a plurality of short paths, wherein j is the index of a short path, indicating that the j-th short path is divided from the long path LL_i;
Step E24: according to the results of the step E22 and the step E23, setting a virtual positioning coordinate for each short path to record the possible Trojan location, the coordinate being calculated from t_i, wherein t_i denotes the t-th division of the original long path LL_i;
Step E25: repeating the step E24 until the virtual positioning coordinates of all num_i short paths have been set.
8. The deep learning based gate-level hardware Trojan horse positioning method of claim 1, wherein: the step F specifically comprises the following steps:
Step F1: inputting a path in the short path set SL into the TextCNN model trained in the step D, and predicting its classification result;
Step F2: if the prediction result output by the TextCNN model is a Trojan path, recording the corresponding virtual positioning coordinate in the positioning result P;
Step F3: repeating the step F1 and the step F2 until all the short paths have been processed, and outputting the final positioning result P to complete positioning.
9. A gate-level hardware trojan positioning system based on deep learning, comprising:
a path generation module: used for generating path statements representing circuit routing, and comprising a search submodule, a temporary path submodule and a label submodule; firstly, the input gate-level netlist files of the training set Tr and the test set Ts are preprocessed, and a depth-first search is performed on them by the search submodule to obtain a tree diagram G representing the interconnection relations of different logic gates; the temporary path submodule then generates the unlabeled path sets of the training set Tr and the test set Ts; finally, the label submodule labels the unlabeled paths to generate the labeled path sets of the training set Tr and the test set Ts;
a model generation module: used for constructing and training a TextCNN model, and comprising a vectorization submodule, a model construction submodule and a model training submodule; firstly, a vocabulary file is generated by the vectorization submodule from the path set of the training set Tr generated by the label submodule, a TextCNN model is then constructed and initialized by the model construction submodule, and finally the path set is input by the model training submodule to complete the training of the model;
a pre-detection module: used for obtaining the pre-detection result of the test set Ts, and comprising a storage-adding submodule, a pre-detection submodule and an output submodule; firstly, a storage operation is added by the storage-adding submodule to the last fully connected layer of the TextCNN model constructed by the model construction submodule so as to record the pre-detection result; the pre-detection submodule then pre-detects the paths in the path set of the test set Ts to obtain the preliminary detection result set {P_TP, P_FP, P_TN, P_FN}; finally, the set P_TP of correctly identified Trojan paths is taken as the pre-detection result and output by the output submodule;
a path division module: used for dividing the result paths output by the output submodule into short paths to narrow the positioning range, and comprising a sequencing submodule, a dividing submodule and a quasi-coordinate submodule; the paths in the pre-detection result P_TP output by the output submodule are numbered by the sequencing submodule, then divided into a plurality of short paths by the dividing submodule, and finally a virtual positioning coordinate is set for each short path by the quasi-coordinate submodule;
a positioning module: used for completing the positioning of the Trojan horse, and comprising a loading submodule and an output submodule; firstly, a short path is loaded by the loading submodule into the TextCNN model trained by the model generation module, and the predicted result is output by the output submodule; the paths predicted to be Trojan paths are then selected and their corresponding virtual positioning coordinates are output, completing positioning.
CN202111412498.9A 2021-11-25 2021-11-25 Gate-level hardware Trojan horse positioning method and system based on deep learning Pending CN114065308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111412498.9A CN114065308A (en) 2021-11-25 2021-11-25 Gate-level hardware Trojan horse positioning method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111412498.9A CN114065308A (en) 2021-11-25 2021-11-25 Gate-level hardware Trojan horse positioning method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN114065308A true CN114065308A (en) 2022-02-18

Family

ID=80276358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111412498.9A Pending CN114065308A (en) 2021-11-25 2021-11-25 Gate-level hardware Trojan horse positioning method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN114065308A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684834A (en) * 2018-12-21 2019-04-26 福州大学 A kind of gate leve hardware Trojan horse recognition method based on XGBoost
US20190272375A1 (en) * 2019-03-28 2019-09-05 Intel Corporation Trust model for malware classification
CN113486347A (en) * 2021-06-30 2021-10-08 福州大学 Deep learning hardware Trojan horse detection method based on semantic understanding
CN113591084A (en) * 2021-07-26 2021-11-02 福州大学 Method and system for identifying transform malicious chip based on circuit path statement


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘志强; 张铭津; 池源; 李云松: "A hardware Trojan detection algorithm based on deep learning" (一种深度学习的硬件木马检测算法), 西安电子科技大学学报 (Journal of Xidian University), no. 06, pages 37-45 *

Similar Documents

Publication Publication Date Title
Chefer et al. Transformer interpretability beyond attention visualization
Baly et al. We can detect your bias: Predicting the political ideology of news articles
Pang et al. Predicting vulnerable software components through deep neural network
Yasaei et al. Gnn4tj: Graph neural networks for hardware trojan detection at register transfer level
CN112232058B (en) False news identification method and system based on deep learning three-layer semantic extraction framework
CN112215004A (en) Application method in extraction of text entities of military equipment based on transfer learning
CN107168992A (en) Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN109960727A (en) For the individual privacy information automatic testing method and system of non-structured text
CN109144879B (en) Test analysis method and device
US9703658B2 (en) Identifying failure mechanisms based on a population of scan diagnostic reports
Azriel et al. SoK: An overview of algorithmic methods in IC reverse engineering
Cruz et al. On document representations for detection of biased news articles
Gong et al. Zero-shot relation classification from side information
CN116150757A (en) Intelligent contract unknown vulnerability detection method based on CNN-LSTM multi-classification model
CN114239083A (en) Efficient state register identification method based on graph neural network
Zhu et al. Tag: Learning circuit spatial embedding from layouts
US6405351B1 (en) System for verifying leaf-cell circuit properties
CN112417147A (en) Method and device for selecting training samples
CN116522334A (en) RTL-level hardware Trojan detection method based on graph neural network and storage medium
CN111738290A (en) Image detection method, model construction and training method, device, equipment and medium
CN114065308A (en) Gate-level hardware Trojan horse positioning method and system based on deep learning
Rematska et al. A survey on reverse engineering of technical diagrams
CN115982388A (en) Case quality control map establishing method, case document quality testing method, case quality control map establishing equipment and storage medium
CN113836297B (en) Training method and device for text emotion analysis model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination