CN114722932A

CN114722932A - Commercial cipher algorithm identification method, system, medium, equipment and terminal

Info

Publication number: CN114722932A
Application number: CN202210346681.1A
Authority: CN
Inventors: 向广利; 施奕滨; 袁景凌; 张莎; 李承德; 张凯; 战炳全; 罗凯凡; 张力文
Original assignee: Wuhan University of Technology WUT
Current assignee: Wuhan University of Technology WUT
Priority date: 2022-04-02
Filing date: 2022-04-02
Publication date: 2022-07-08

Abstract

The invention belongs to the technical field of cryptosystem identification and discloses a commercial cryptographic algorithm identification method, system, medium, device and terminal. The method comprises the following steps: quantizing and mapping the ciphertext string to form a ciphertext mapping matrix; performing convolution and pooling operations on the ciphertext mapping matrix to obtain the final 64-dimensional, 75-dimensional and 192-dimensional ciphertext features; constructing a random forest (RF); constructing a LeNet5 neural network; and, after the ciphertext matrix is regularized, dividing the data set into a training set and a test set and sending the training set to a LeNet5-RF model for training. The invention combines a LeNet5 neural network model with a random forest model, uses a convolutional neural network to extract fine-grained ciphertext features and obtain a ciphertext embedding, and improves accuracy by about 15% compared with traditional classification based on randomness tests and ciphertext entropy features.

Description

Commercial cipher algorithm identification method, system, medium, equipment and terminal

Technical Field

The invention belongs to the technical field of cryptosystem identification, and particularly relates to a LeNet5-RF-based commercial cryptographic algorithm identification method, system, medium, device and terminal.

Background

At present, commercial cryptography refers to technology that uses commercial cryptographic algorithms to implement functions such as encryption, decryption and authentication, and includes cryptographic algorithm programming techniques as well as implementation technologies such as cryptographic chips and encryption cards. To guarantee the security of information transmission in fields such as finance and medical care, the Office of State Commercial Cryptography Administration has established a series of cryptographic standards, including SSF33, SM1 (SCB2), SM2, SM3, SM4, SM7, SM9 and the ZUC (Zu Chongzhi) algorithm, wherein SSF33, SM1, SM4 and SM7 are symmetric algorithms, SM2 and SM9 are asymmetric algorithms, and SM3 is a hash algorithm.

Cryptanalysis is the discipline that studies attacking, deciphering and forging encrypted messages; its main objects of study are encryption algorithms and ciphertexts. Since the birth of cryptography, cryptanalysis has improved continuously alongside cryptographic research, and a mature system has now been formed. To devise a targeted approach to ciphertext data analysis, the analyst's first task is to identify the encryption algorithm that produced the ciphertext. Research on encryption algorithm identification therefore has important theoretical significance and practical application value.

Machine learning is one of the important approaches to cryptosystem identification, and most current ciphertext algorithm identification schemes are based on machine learning or statistics. A machine learning method receives externally supplied data, forms specific attributes from it, learns correlations from a large amount of training data according to a given algorithm, and obtains deep, efficient and interpretable knowledge with which to predict or classify new samples. Cryptographic algorithm analysis involves complex search and optimization problems and is therefore closely related to machine learning. Some researchers have proposed using a support vector machine to identify five encryption algorithms (DES, AES, Blowfish, TDES and RC5) from the histogram features of text, image and audio files. Others have proposed a cryptographic algorithm identification technique based on Bayesian decision. Others have used the C4.5 decision tree algorithm to identify six commonly used block ciphers, two classical ciphers, RC4 and RSA. Others have proposed a hierarchical classification scheme for symmetric encryption algorithms that emphasizes dimensionality reduction and feature extraction and introduces a non-parametric view of feature extraction, which greatly inspired this patent. Some researchers have analyzed randomness metrics for five block ciphers and proposed a hierarchical identification scheme. Others have proposed a block cipher identification scheme based on randomness tests, which summarizes the randomness features used in earlier work and serves as a useful guide. There is also an image encryption patent that combines a convolutional neural network with a commercial cryptographic algorithm and handles part of the commercial cryptographic algorithm well.

Although more and more researchers apply machine learning to cryptosystem identification, current research merely fits the identification task mechanically into the framework of a machine learning classification task and ignores the particular nature of cryptosystems and ciphertexts; the task needs to be considered in two parts, ciphertext feature extraction and identification. Most research still approaches the problem from machine learning alone, so the accuracy of cryptosystem identification needs to be improved, and specific research on neural-network-based identification of commercial cryptographic algorithms is lacking. Therefore, a new commercial cryptographic algorithm identification method and system are needed to overcome the shortcomings of the prior art.

Through the above analysis, the problems and defects of the prior art are as follows:

(1) Current research merely fits the cryptosystem identification task mechanically into the framework of a machine learning classification task and ignores the particular nature of cryptosystems and ciphertexts.

(2) Most research still approaches the problem from machine learning alone, so the accuracy of cryptosystem identification needs to be improved.

(3) The existing commercial cipher algorithm recognition and analysis has the problems of low efficiency, low accuracy, less effective recognition information and the like, and specific research of commercial cipher algorithm recognition based on a neural network is lacked.

The difficulties in solving the above problems and defects are: building a model that can effectively classify commercial cryptographic algorithms; constructing the network model and planning the choice of feature extraction; and preprocessing the data.

The significance of solving the above problems and defects is that the accuracy and efficiency of commercial cryptographic algorithm identification are improved, and the range of data to which commercial cryptographic algorithm identification applies is broadened.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a commercial cryptographic algorithm identification method, system, medium, device and terminal, and in particular a LeNet5-RF-based method, system, medium, device and terminal for identifying the commercial cryptographic algorithm system in a ciphertext scenario, aiming to solve the problems of low efficiency, low accuracy and little effective identification information in existing commercial cryptographic algorithm identification and analysis.

The invention is realized as follows. A LeNet5-RF-based commercial cryptographic algorithm identification method comprises: in the security data detection of a bank electronic account system, training a LeNet5 neural network model and a random forest model to identify commercial cryptographic algorithms, and building a ciphertext classification platform on the security system, thereby enhancing the security of the various data that use digital signature schemes. The method specifically comprises the following steps:

step one, ciphertext preprocessing: after the ciphertext to be identified, submitted by a user, is converted into a 0/1 bit string, the ciphertext string is quantized and mapped to form a ciphertext mapping matrix;

step two, CnPo feature extraction: convolution and pooling operations are performed on the ciphertext mapping matrix obtained by preprocessing to obtain the final 64-dimensional CnPo_64, 75-dimensional CnPo_75 and 192-dimensional CnPo_192 ciphertext features;

step three, constructing a random forest (RF);

step four, constructing a LeNet5 neural network;

step five, data processing: after the ciphertext matrix is regularized, the data set is divided into a training set and a test set at a ratio of 8:2, and the training set is sent to a LeNet5-RF model for training.

Further, in step one, the commercial cryptographic algorithm system of the ciphertext file to be identified includes at least one of SM2, SM3 and SM4.

The ciphertext preprocessing comprises the following steps:

(1) using a simple linear transformation, partitioning the original ciphertext into blocks of 8 bits, 16 bits or 32 bits;

(2) quantizing the resulting ciphertext blocks by cumulative summation to obtain 1024-dimensional converted ciphertext data;

(3) dividing the ciphertext data sequentially to obtain a ciphertext mapping matrix of size 32 × 32.
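The three preprocessing steps above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the function name `ciphertext_to_matrix` is hypothetical, "cumulative summation" is read here as summing the bits inside each block, and the matrix is filled row by row. The source does not spell out these details.

```python
import numpy as np


def ciphertext_to_matrix(ciphertext: bytes, block_bits: int = 8) -> np.ndarray:
    """Map raw ciphertext bytes to a 32 x 32 matrix of per-block bit sums."""
    needed_bits = block_bits * 1024  # 1024 blocks are required
    bits = np.unpackbits(np.frombuffer(ciphertext, dtype=np.uint8))
    if bits.size < needed_bits:
        raise ValueError(f"need at least {needed_bits} bits, got {bits.size}")
    # Step (1): split the 0/1 string into 1024 blocks of block_bits bits each.
    blocks = bits[:needed_bits].reshape(1024, block_bits)
    # Step (2): quantize each block by summing its bits (an assumed reading
    # of "cumulative summation"), giving a 1024-dimensional vector.
    quantized = blocks.sum(axis=1)
    # Step (3): fill the 32 x 32 mapping matrix row by row (assumed order).
    return quantized.reshape(32, 32)


# Example: 1024 bytes alternating 0xFF and 0x00, with 8-bit blocks.
demo_matrix = ciphertext_to_matrix(bytes([0xFF, 0x00] * 512), block_bits=8)
```

With 8-bit blocks the function consumes 8 × 1024 = 8192 bits, that is, 1024 bytes of ciphertext; 16-bit and 32-bit blocks need two and four times as much input.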

Further, in step two, the CnPo feature extraction includes:

(1) for each ciphertext file, performing convolution once on a 32 × 32 ciphertext matrix obtained by preprocessing, wherein an input channel is 1, and an output channel is 3;

(2) obtaining three 8 × 8 matrices by two rounds of pooling downsampling;

(3) the matrix is flattened to obtain 192-dimensional feature vectors.
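A minimal NumPy sketch of the CnPo_192 path described above follows. Several details are assumptions, because the source does not state them: a 3 × 3 kernel with zero padding (so the convolution keeps the 32 × 32 size), 2 × 2 max pooling, and random kernel values in the demo. The function names are hypothetical, and the 64- and 75-dimensional variants are not covered.

```python
import numpy as np


def conv2d_same(image: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Cross-correlate a 2-D image with a stack of odd-sized k x k kernels,
    using zero padding so each output channel keeps the input size."""
    k = kernels.shape[-1]
    pad = k // 2
    padded = np.pad(image, pad)
    h, w = image.shape
    out = np.zeros((kernels.shape[0], h, w))
    for c, kernel in enumerate(kernels):
        for i in range(k):
            for j in range(k):
                out[c] += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """2 x 2 max pooling with stride 2 on a (channels, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def cnpo_192(matrix: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """One convolution (1 -> 3 channels), two poolings, then flatten."""
    feature_maps = conv2d_same(matrix.astype(float), kernels)  # (3, 32, 32)
    pooled = max_pool2x2(max_pool2x2(feature_maps))            # (3, 8, 8)
    return pooled.reshape(-1)                                  # 192 values


rng = np.random.default_rng(0)
demo_kernels = rng.normal(size=(3, 3, 3))      # assumed kernel values
demo_input = rng.integers(0, 9, size=(32, 32))  # stand-in mapping matrix
demo_features = cnpo_192(demo_input, demo_kernels)
```

The dimension count follows directly from the shapes: 3 channels × 8 × 8 = 192.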

Further, in step three, the constructing the random forest RF includes:

(1) training the model by using a training set, and testing the accuracy of model prediction by using a test set;

(2) optimizing parameters of the model, selecting parameters which can enable the accuracy rate to be high and the time performance to be good, and selecting proper feature dimensions according to different encryption algorithms;

(3) storing the trained model for later use.

Further, in step four, the constructing the LeNet5 neural network includes:

(1) inputting a 32 × 32 × 1 gray image, and performing a first convolution C1 by a 6 × 5 × 5 filter to obtain 28 × 28 × 6 output;

(2) sending the output to a pooling layer P1 for 2 × 2 down-sampling, and performing a second convolution C2 with a 16 × 5 × 5 filter to obtain a 10 × 10 × 16 output;

(3) sending the output to a pooling layer P2, which is the same as P1, and flattening the result to obtain a 400-dimensional feature vector;

(4) completing three fully connected operations to finally obtain a 10-dimensional output, and classifying with softmax to obtain the corresponding weights.
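The layer sizes quoted above can be checked with the standard output-size formula for an unpadded convolution, (n - k) / stride + 1, followed by halving for each 2 × 2 pooling. The short Python sketch below reproduces the 28, 14, 10, 5 and 400 figures; the helper names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, window: int = 2) -> int:
    """Spatial output size of non-overlapping pooling along one dimension."""
    return size // window


def lenet5_shapes(input_size: int = 32) -> dict:
    c1 = conv_out(input_size, 5)  # 32 -> 28, 6 channels
    p1 = pool_out(c1)             # 28 -> 14
    c2 = conv_out(p1, 5)          # 14 -> 10, 16 channels
    p2 = pool_out(c2)             # 10 -> 5
    return {"C1": (c1, c1, 6), "P1": (p1, p1, 6),
            "C2": (c2, c2, 16), "P2": (p2, p2, 16),
            "flat": p2 * p2 * 16}


shapes = lenet5_shapes()
```

The source does not give the widths of the first two fully connected layers; the classic LeNet-5 uses 120 and 84 units before the 10-dimensional output.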

Further, in step five, the minimum size x of a single ciphertext is x = Block × 1024 / 8000, where Block takes the value 8, 16, 32, and so on.
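As a worked example of this formula, the small Python sketch below evaluates x for the three block sizes used in the method. The function name is hypothetical. The source does not state the unit of x; the divisor 8000 would be consistent with converting bits to kilobytes (8 bits per byte, 1000 bytes per kilobyte), but that reading is an assumption.

```python
def min_ciphertext_size(block: int) -> float:
    """Evaluate x = Block * 1024 / 8000 for a given block size in bits."""
    return block * 1024 / 8000


# Block * 1024 is the number of bits needed to fill the 1024 blocks.
sizes = {block: min_ciphertext_size(block) for block in (8, 16, 32)}
```

For Block = 8 this gives x = 8192 / 8000 = 1.024, and the value doubles with each doubling of the block size.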

Another object of the present invention is to provide a LeNet5-RF based commercial cryptographic algorithm recognition system applying the LeNet5-RF based commercial cryptographic algorithm recognition method, wherein the LeNet5-RF based commercial cryptographic algorithm recognition system comprises:

the ciphertext preprocessing module is used for converting the ciphertext to be identified, submitted by a user, into a 0/1 bit string, and then quantizing and mapping the ciphertext string to form a ciphertext mapping matrix;

the CnPo feature extraction module is used for performing convolution and pooling operations on the ciphertext mapping matrix obtained by preprocessing to obtain the final 64-dimensional CnPo_64, 75-dimensional CnPo_75 and 192-dimensional CnPo_192 ciphertext features;

the random forest construction module is used for constructing a random forest (RF);

the neural network construction module is used for constructing a LeNet5 neural network;

and the data processing module is used for regularizing the ciphertext matrix, dividing the data set into a training set and a test set at a ratio of 8:2, and sending the training set to a LeNet5-RF model for training.

It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:

converting the ciphertext to be identified, submitted by a user, into a 0/1 bit string, and quantizing and mapping the ciphertext string to form a ciphertext mapping matrix; performing convolution and pooling operations on the ciphertext mapping matrix obtained by preprocessing to obtain the final 64-dimensional CnPo_64, 75-dimensional CnPo_75 and 192-dimensional CnPo_192 ciphertext features; constructing a random forest (RF); constructing a LeNet5 neural network; and, after the ciphertext matrix is regularized, dividing the data set into a training set and a test set at a ratio of 8:2 and sending the training set to a LeNet5-RF model for training.

It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:

converting the ciphertext to be identified, submitted by a user, into a 0/1 bit string, and quantizing and mapping the ciphertext string to form a ciphertext mapping matrix; performing convolution and pooling operations on the ciphertext mapping matrix obtained by preprocessing to obtain the final 64-dimensional CnPo_64, 75-dimensional CnPo_75 and 192-dimensional CnPo_192 ciphertext features; constructing a random forest (RF); constructing a LeNet5 neural network; and, after the ciphertext matrix is regularized, dividing the data set into a training set and a test set at a ratio of 8:2 and sending the training set to a LeNet5-RF model for training.

Another object of the present invention is to provide an information data processing terminal for implementing the commercial cryptographic algorithm identification system.

By combining all the technical schemes, the invention has the advantages and positive effects that:

First, the ciphertext data are preprocessed so that large volumes of ciphertext data meet the requirements of feature extraction and model training. Convolution and pooling in feature extraction not only extract features well but also reduce dimensionality and help prevent overfitting. The LeNet5 network structure and the random forest structure are combined so that the efficiency and accuracy advantages of both are retained. CnPo features, which are suited to commercial cipher ciphertexts, are used for feature extraction and offer higher discrimination and reliability than other features.

Second, compared with traditional block cipher ciphertext identification models, the commercial cryptographic algorithm identification method provided by the invention combines the LeNet5 neural network model with the random forest model. The method uses a convolutional neural network to extract fine-grained ciphertext features and obtain a ciphertext embedding. Compared with traditional classification based on randomness tests and ciphertext entropy features, accuracy is improved by about 15%. Compared with other ciphertext classification algorithms, binary classification accuracy is improved by about 15% and multi-class classification accuracy by 8%.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of a commercial cipher algorithm identification method based on LeNet5-RF according to an embodiment of the present invention;

FIG. 2 is a block diagram of a commercial cryptographic algorithm identification system based on LeNet5-RF according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating 4 processes for ciphertext preprocessing according to an embodiment of the present invention;

FIG. 4 is a graph of the rate of change of accuracy for different block sizes provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of a CnPo _192 extraction process provided by an embodiment of the invention;

FIG. 6 is a CnPo _192 eigenvalue distribution graph provided by an embodiment of the present invention;

FIG. 7 is a schematic diagram of a random forest parameter adaptive process provided in an embodiment of the present invention;

FIG. 8 is a flow chart of neural network training provided by an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a LeNet5 neural network provided by an embodiment of the present invention;

FIG. 10 is a flow chart of data processing provided by an embodiment of the present invention;

FIG. 11 is a schematic diagram of a process for generating a decision tree in a random forest according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of the ratio of each data set of the raw data according to an embodiment of the present invention;

In the figures: 1. ciphertext preprocessing module; 2. CnPo feature extraction module; 3. random forest construction module; 4. neural network construction module; 5. data processing module.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In view of the problems in the prior art, the present invention provides a commercial cipher algorithm identification method, system, medium, device and terminal, and the present invention is described in detail below with reference to the accompanying drawings.

As shown in FIG. 1, the LeNet5-RF-based commercial cryptographic algorithm identification method according to the embodiment of the present invention includes the following steps:

S101, ciphertext preprocessing: after the ciphertext to be identified, submitted by a user, is converted into a 0/1 bit string, the ciphertext string is quantized and mapped to form a ciphertext mapping matrix;

A ciphertext training set provided by the user is converted into a 0/1 bit stream by encoding and decoding, where the training set comprises ciphertexts produced from media such as text data, images, video and audio. The data are normalized, and a ciphertext mapping matrix is formed according to the mapping relation so as to meet the input requirements of the subsequent convolution and pooling.

S102, CnPo feature extraction: convolution and pooling operations are performed on the ciphertext mapping matrix obtained by preprocessing to obtain the final 64-dimensional CnPo_64, 75-dimensional CnPo_75 and 192-dimensional CnPo_192 ciphertext features;

The ciphertext mapping matrix is taken as input and convolved with the selected convolution kernel, which is also a matrix, to obtain a result matrix; this step extracts the features. A pooling operation is then applied to the result matrix to obtain the dimension-reduced 64-dimensional CnPo_64, 75-dimensional CnPo_75 and 192-dimensional CnPo_192 ciphertext features.

S103, constructing a random forest RF;

Training samples are drawn randomly with replacement from the ciphertext training set, a plurality of decision trees are created, and all the decision tree models are combined.

S104, constructing a LeNet5 neural network;

The input layer, convolutional layers, pooling layers and fully connected layers of the LeNet5 network are built, eight layers in total.

S105, data processing: after the ciphertext matrix is regularized, the data set is divided into a training set and a test set at a ratio of 8:2, and the training set is sent to a LeNet5-RF model for training.

The last fully connected layer of the LeNet5 network structure is replaced with a random forest model to form the LeNet5-RF model. After the ciphertext matrix is regularized, the data set is divided into a training set and a test set at a ratio of 8:2 and sent to the LeNet5-RF model for training.
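The idea of putting a random forest where the last fully connected layer used to be, together with the 8:2 split, can be sketched with scikit-learn as follows. This is an illustration only. The function `fit_rf_head` and its `embed` argument are hypothetical: `embed` stands for whatever maps a batch of 32 × 32 ciphertext matrices to the activations of the network's penultimate layer. The demo uses a plain flatten and random data as stand-ins, and the regularization step is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def fit_rf_head(matrices, labels, embed, seed=0):
    """Split 8:2, embed both parts, and fit a random forest on the embeddings.

    `embed` is a hypothetical callable standing in for the LeNet5 layers that
    remain after the last fully connected layer has been removed.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        matrices, labels, test_size=0.2, random_state=seed, stratify=labels)
    forest = RandomForestClassifier(criterion="gini", random_state=seed)
    forest.fit(embed(x_train), y_train)
    accuracy = forest.score(embed(x_test), y_test)
    return forest, accuracy


# Stand-in data: 100 random "ciphertext matrices" with two class labels.
rng = np.random.default_rng(0)
demo_x = rng.integers(0, 9, size=(100, 32, 32)).astype(float)
demo_y = np.arange(100) % 2


def flatten(batch):
    # Placeholder embedding; a trained network would be used in practice.
    return batch.reshape(len(batch), -1)


demo_forest, demo_accuracy = fit_rf_head(demo_x, demo_y, flatten)
```

Because the demo labels are unrelated to the random inputs, the reported accuracy is only a smoke test of the pipeline, not a result.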

As shown in FIG. 2, the LeNet5-RF-based commercial cryptographic algorithm identification system provided by the embodiment of the present invention includes:

the ciphertext preprocessing module 1, used for converting the ciphertext to be identified, submitted by a user, into a 0/1 bit string, and then quantizing and mapping the ciphertext string to form a ciphertext mapping matrix;

the CnPo feature extraction module 2, used for performing convolution and pooling operations on the ciphertext mapping matrix obtained by preprocessing to obtain the final 64-dimensional CnPo_64, 75-dimensional CnPo_75 and 192-dimensional CnPo_192 ciphertext features;

the random forest construction module 3, used for constructing a random forest (RF);

the neural network construction module 4, used for constructing a LeNet5 neural network;

and the data processing module 5, used for regularizing the ciphertext matrix, dividing the data set into a training set and a test set at a ratio of 8:2, and sending the training set to a LeNet5-RF model for training.

The technical solution of the present invention is further described below with reference to specific examples.

Example 1

Aiming at the problems of low efficiency, low accuracy, less effective identification information and the like of the existing commercial cipher algorithm identification and analysis, the invention provides a commercial cipher algorithm system identification method based on LeNet5-RF algorithm.

The invention provides a commercial cipher algorithm system identification method based on LeNet5-RF algorithm, which comprises the following steps:

step 1, ciphertext preprocessing: the ciphertext submitted by a user is first converted into a 0/1 bit string, and the ciphertext string is then quantized and mapped to form a ciphertext mapping matrix;

step 2, CnPo feature extraction: convolution and pooling operations are performed on the ciphertext mapping matrix obtained by preprocessing to obtain the final 64-dimensional (CnPo_64), 75-dimensional (CnPo_75) and 192-dimensional (CnPo_192) ciphertext features;

step 3, constructing a random forest (RF);

step 4, constructing a LeNet5 neural network;

step 5, data processing: the ciphertext matrix is first regularized, the data set is then divided into a training set and a test set at a ratio of 8:2, and the training set is sent to a LeNet5-RF model for training. The minimum size x of a single ciphertext is x = Block × 1024 / 8000, where Block takes the value 8, 16, 32, and so on.

Further, the commercial cryptographic algorithm system of the ciphertext file to be identified comprises at least one of SM2, SM3 and SM4.

Further, the step 1 comprises:

step 1.1: using a simple linear transformation, first partitioning the original ciphertext into blocks of 8 bits, 16 bits or 32 bits;

step 1.2: quantizing the resulting ciphertext blocks by cumulative summation to obtain 1024-dimensional converted ciphertext data;

step 1.3: dividing the ciphertext data sequentially to obtain a ciphertext mapping matrix of size 32 × 32.

Further, the step 2 comprises:

step 2.1: for each ciphertext file, performing convolution once on a 32 × 32 ciphertext matrix obtained by preprocessing, wherein an input channel is 1, and an output channel is 3;

step 2.2: obtaining three 8 × 8 matrices through two rounds of pooling downsampling;

step 2.3: flattening the matrices to obtain a 192-dimensional feature vector.

Further, the step 3 comprises:

step 3.1: training the model by using a training set, and testing the accuracy of model prediction by using a test set;

step 3.2: optimizing parameters of the model, selecting parameters which can enable the accuracy rate to be high and the time performance to be good, and selecting proper feature dimensions according to different encryption algorithms;

step 3.3: storing the trained model for later use.

Further, the step 4 comprises:

step 4.1: inputting a 32 × 32 × 1 gray image, and performing first convolution (C1) by using a 6 × 5 × 5 filter to obtain 28 × 28 × 6 output;

step 4.2: sending the output to a pooling layer (P1), performing 2 × 2 down-sampling, and performing a second convolution (C2) with a 16 × 5 × 5 filter to obtain a 10 × 10 × 16 output;

step 4.3: sending the output to a P2 pooling layer which is the same as the P1, and unfolding to obtain a 400-dimensional feature vector;

step 4.4: completing three fully connected operations to finally obtain a 10-dimensional output, and classifying with softmax to obtain the corresponding weights.

Compared with traditional block cipher ciphertext identification models, the invention combines the LeNet5 neural network model with the random forest model. A convolutional neural network is used to extract fine-grained ciphertext features and obtain a ciphertext embedding. Compared with traditional classification based on randomness tests and ciphertext entropy features, the method improves accuracy by about 15%.

Example 2

The commercial cipher algorithm system identification method based on LeNet5-RF algorithm provided by the embodiment of the invention comprises the following steps:

S101, ciphertext preprocessing: the ciphertext submitted by a user is first converted into a 0/1 bit string, and the ciphertext string is then quantized and mapped to form a ciphertext mapping matrix. The flow is shown in FIG. 3;

Specifically, ciphertext preprocessing mainly involves selecting the ciphertext block size. Ciphertext matrices are generated with the 8-bit, 16-bit and 32-bit blocking methods respectively for experiments, and the optimal block size is selected according to the loss function. A LeNet5 network model is used to perform the SM2-SM3 binary classification task and to test the accuracy of the three blocking methods; the experimental results are shown in FIG. 4;

As can be seen from the figure, 8-bit blocking performs best: both its training completion time and its accuracy are better than those of 16-bit and 32-bit blocking. In addition, 8-bit blocking requires the smallest ciphertext, which facilitates classification in complex ciphertext scenarios, so 8-bit blocking is used for ciphertext preprocessing in what follows.

S102, CnPo feature extraction: convolution and pooling operations are performed on the ciphertext mapping matrix obtained by preprocessing to obtain the final 64-dimensional (CnPo_64), 75-dimensional (CnPo_75) and 192-dimensional (CnPo_192) ciphertext features. The CnPo_192 extraction process is shown in FIG. 5;

Specifically, to illustrate that CnPo features discriminate well between different encryption algorithms, FIG. 6 shows the distribution of one dimension of the CnPo_192 feature for two sets of about 3600 ciphertexts encrypted with SM2 and SM3 respectively. The feature values of the two kinds of ciphertext are clearly separated: those of SM2 are concentrated in the range 0 to 0.75, and those of SM3 in the range -0.75 to 0.

S103, constructing a random forest (RF);

Specifically, the RandomForestClassifier and DecisionTreeClassifier in sklearn are used to construct the random forest. The Gini index is used as the criterion for each split. The previously divided training set is passed in for training, and the trained model is then tested with the test set to obtain its prediction accuracy;

FIG. 7 shows the adaptation process for the maximum depth of each decision tree, the number of decision trees and the minimum information gain. The three coordinate axes represent the three parameters ne, mid and md, and the color of each point represents the accuracy: the darker the color, the higher the accuracy, and the lighter the color, the lower the accuracy. The accuracy ranges from 55% to 99%.
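A parameter sweep of this kind can be sketched with scikit-learn as below. The mapping of ne, md and mid to the `RandomForestClassifier` arguments `n_estimators`, `max_depth` and `min_impurity_decrease` is an assumption based on the description, not something the source states. The function name, the candidate values and the synthetic data are all illustrative.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def sweep_forest(x, y, ne_values, md_values, mid_values, seed=0):
    """Try every (ne, md, mid) combination and record test accuracy."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        x, y, test_size=0.2, random_state=seed, stratify=y)
    results = []
    for ne, md, mid in product(ne_values, md_values, mid_values):
        forest = RandomForestClassifier(
            n_estimators=ne,            # assumed meaning of "ne"
            max_depth=md,               # assumed meaning of "md"
            min_impurity_decrease=mid,  # assumed meaning of "mid"
            criterion="gini",
            random_state=seed,
        )
        forest.fit(x_tr, y_tr)
        results.append(((ne, md, mid), forest.score(x_te, y_te)))
    best = max(results, key=lambda item: item[1])
    return best, results


# Synthetic stand-in for ciphertext features; not real ciphertext data.
demo_x, demo_y = make_classification(n_samples=200, n_features=20,
                                     random_state=0)
best, results = sweep_forest(demo_x, demo_y,
                             ne_values=(10, 20),
                             md_values=(3, 6),
                             mid_values=(0.0, 0.01))
```

Each entry of `results` corresponds to one point in a plot like FIG. 7: three parameter coordinates plus an accuracy value.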

S104, constructing a LeNet5 neural network, wherein the neural network training process is shown in FIG. 8;

Specifically, the invention uses the sigmoid function of the original model as the activation function, max pooling for the pooling layers, CrossEntropyLoss as the loss function and Adam as the optimizer. The LeNet5 neural network structure is shown in FIG. 9;
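These choices map naturally onto PyTorch, which is assumed here only because the loss and optimizer names match its API; the source does not name the framework. The sketch below is a minimal LeNet5 with sigmoid activations and max pooling. The hidden widths 120 and 84 follow the classic LeNet-5, and the learning rate is illustrative; neither is stated in the source.

```python
import torch
from torch import nn


class LeNet5(nn.Module):
    """LeNet5 for 1 x 32 x 32 inputs with sigmoid activations and max pooling."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Sigmoid(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                     # 16 x 5 x 5 -> 400
            nn.Linear(400, 120), nn.Sigmoid(),
            nn.Linear(120, 84), nn.Sigmoid(),
            nn.Linear(84, num_classes),       # logits; softmax is in the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = LeNet5()
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is assumed


def train_step(batch: torch.Tensor, labels: torch.Tensor) -> float:
    """Run one optimization step and return the batch loss."""
    optimizer.zero_grad()
    loss = criterion(model(batch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A call such as `train_step(torch.randn(4, 1, 32, 32), torch.tensor([0, 1, 2, 3]))` runs one update on a random batch, which is enough to confirm that the shapes line up.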

S105, data processing: the ciphertext matrix is first regularized, the data set is then divided into a training set and a test set at a ratio of 8:2, and the training set is sent to a LeNet5-RF model for training. The minimum size x of a single ciphertext is x = Block × 1024 / 8000, where Block is taken to be 8. The data processing flow chart is shown in FIG. 10.

Specifically, the cryptosystem of the ciphertext file to be identified comprises at least one of SM2, SM3 and SM4.

Compared with traditional block cipher ciphertext identification models, the invention combines the LeNet5 neural network model with the random forest model. A convolutional neural network is used to extract fine-grained ciphertext features and obtain a ciphertext embedding. Compared with traditional classification based on randomness tests and ciphertext entropy features, the method improves accuracy by about 15%.

On the basis of the above embodiment, the embodiment of the present invention further optimizes the step S101, and specifically includes:

S1011: using a simple linear transformation, first partitioning the original ciphertext into blocks of 8 bits, 16 bits or 32 bits;

S1012: quantizing the partitioned ciphertext blocks by cumulative summation to obtain 1024-dimensional converted ciphertext data;

S1013: dividing the ciphertext data sequentially to obtain a ciphertext mapping matrix of size 32 × 32.
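The three preprocessing steps above can be sketched as follows. The text does not spell out how the cumulative summation is grouped, so this sketch assumes that consecutive blocks are divided into 1024 equal runs and each run is summed. The big-endian block interpretation, the dropping of leftover trailing blocks and the function name `ciphertext_to_matrix` are likewise assumptions.

```python
def ciphertext_to_matrix(data, block_bits=8, dim=1024, side=32):
    """Map raw ciphertext bytes to a side x side matrix of block sums."""
    step = block_bits // 8
    # S1011: partition the ciphertext into fixed-width blocks (as integers).
    blocks = [int.from_bytes(data[i:i + step], "big")
              for i in range(0, len(data) - step + 1, step)]
    if len(blocks) < dim:
        raise ValueError("ciphertext too short for a 1024-dimensional mapping")
    # S1012: sum consecutive runs of blocks to get `dim` values (assumed grouping).
    per_group = len(blocks) // dim
    sums = [sum(blocks[g * per_group:(g + 1) * per_group]) for g in range(dim)]
    # S1013: cut the 1024 values row by row into a 32 x 32 matrix.
    return [sums[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical 2048-byte input; with 8-bit blocks each matrix cell sums 2 bytes.
sample = bytes(range(256)) * 8
matrix = ciphertext_to_matrix(sample, block_bits=8)
```

For 16-bit or 32-bit blocks the same function applies with `block_bits=16` or `block_bits=32`, provided the input holds at least 1024 blocks.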

On the basis of the above embodiment, the embodiment of the present invention further optimizes the step S102, and specifically includes:

S1021: for each ciphertext file, performing one convolution on the 32 × 32 ciphertext matrix obtained by preprocessing, with 1 input channel and 3 output channels;

S1022: obtaining three 8 × 8 matrices through two rounds of pooling down-sampling;

S1023: flattening the matrices to obtain a 192-dimensional feature vector.
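The shape flow of these steps (32 × 32 to three 8 × 8 maps to 192 values) can be checked with a small dependency-free sketch. The text gives neither the kernel size nor the pooling type here, so the 3 × 3 zero-padded kernels, the 2 × 2 max pooling and the random kernel weights are all assumptions. As in most CNN libraries, the "convolution" is computed as a cross-correlation.

```python
import random


def conv2d_same(img, kernel):
    """Zero-padded 2D convolution (cross-correlation) that keeps the input size."""
    n, k = len(img), len(kernel)
    pad = k // 2
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < n and 0 <= cc < n:
                        acc += img[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def max_pool2(img):
    """2 x 2 max pooling with stride 2 on a square matrix."""
    n = len(img) // 2
    return [[max(img[2 * r][2 * c], img[2 * r][2 * c + 1],
                 img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1])
             for c in range(n)] for r in range(n)]


def cnpo_features(matrix, kernels):
    """One convolution per output channel, two poolings, then flatten."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_same(matrix, kernel)   # 32 x 32
        fmap = max_pool2(max_pool2(fmap))    # 16 x 16, then 8 x 8
        feats.extend(v for row in fmap for v in row)
    return feats


rng = random.Random(0)
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(3)]                # 3 output channels (untrained weights)
matrix = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
features = cnpo_features(matrix, kernels)    # 3 * 8 * 8 = 192 values
```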

On the basis of the foregoing embodiment, the embodiment of the present invention further optimizes step S103, and specifically includes:

S1031: training the model with the training set, and testing the accuracy of the model's predictions with the test set;

S1032: optimizing the parameters of the model, selecting parameters that give both high accuracy and good time performance, and selecting suitable feature dimensions for the different encryption algorithms;

S1033: saving the trained model for later use. The generation process of a decision tree in the random forest is shown in FIG. 11;
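A minimal sketch of S1031 to S1033 with scikit-learn is given below. It assumes that the parameters ne, md and mid correspond to `n_estimators`, `max_depth` and `min_impurity_decrease`, which is a reading of the abbreviations rather than something the text states. The synthetic data stands in for 192-dimensional ciphertext features, and the small grid is hypothetical.

```python
import pickle
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 192-dimensional ciphertext features (not real data).
X, y = make_classification(n_samples=300, n_features=192, n_informative=20,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)          # 8:2 split

# Hypothetical search grid over the three tuned parameters.
grid = {"ne": [10, 30], "md": [4, 8], "mid": [0.0, 0.01]}
best_acc, best_params, best_model = -1.0, None, None
for ne, md, mid in product(grid["ne"], grid["md"], grid["mid"]):
    model = RandomForestClassifier(n_estimators=ne, max_depth=md,
                                   min_impurity_decrease=mid,
                                   criterion="gini", random_state=0)
    model.fit(X_train, y_train)                    # S1031: train
    acc = model.score(X_test, y_test)              # S1031: test accuracy
    if acc > best_acc:                             # S1032: keep the best setting
        best_acc, best_params, best_model = acc, (ne, md, mid), model

# S1033: serialize the trained model for later use (in memory here).
saved = pickle.dumps(best_model)
restored = pickle.loads(saved)
```

In a real experiment the parameters would more commonly be chosen on a separate validation split, so that the reported test accuracy is not biased by the search.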

On the basis of the above embodiment, the embodiment of the present invention further optimizes the step S104, and specifically includes:

S1041: inputting a 32 × 32 × 1 grayscale image and performing the first convolution (C1) with six 5 × 5 filters to obtain a 28 × 28 × 6 output;

S1042: sending the output to a pooling layer (P1) for 2 × 2 down-sampling, and then performing the second convolution (C2) with sixteen 5 × 5 filters to obtain a 10 × 10 × 16 output;

S1043: sending the output to a pooling layer P2 identical to P1, and flattening the result to obtain a 400-dimensional feature vector;

S1044: performing three fully connected operations to finally obtain a 10-dimensional output, and classifying with softmax to obtain the corresponding weights.
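The layer sizes quoted in S1041 to S1044 follow from the standard output-size formula for unpadded, stride-1 convolution and 2 × 2 pooling, which the sketch below checks. The hidden widths 120 and 84 are the values of the classic LeNet-5 and are an assumption here, since the text only gives the 400-dimensional input and the 10-dimensional output.

```python
import math


def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial output size of non-overlapping pooling."""
    return size // window


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


c1 = conv_out(32, 5)        # C1: 32 -> 28 (6 channels)
p1 = pool_out(c1)           # P1: 28 -> 14
c2 = conv_out(p1, 5)        # C2: 14 -> 10 (16 channels)
p2 = pool_out(c2)           # P2: 10 -> 5
flat = p2 * p2 * 16         # flatten: 5 * 5 * 16 = 400
fc_sizes = [flat, 120, 84, 10]   # 120 and 84 assumed from classic LeNet-5
probs = softmax([0.0] * fc_sizes[-1])   # uniform scores give uniform weights
```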

The invention uses the LeNet5-RF algorithm to construct a classifier, takes the extracted block ciphertext feature data as the input of the classifier, and completes the block cryptosystem recognition task through training and testing of the classification model.

In practical applications, ciphertext algorithm recognition is generally used in the fields of financial security, network security, information security and the like. The protection of privacy, the security of the digital signature system and the like are enhanced by the identification of commercial cryptographic algorithms.

The technical solution of the present invention is further described below with reference to simulation experiments.

In order to verify the effectiveness of the block cipher system identification method based on the improved random forest algorithm, the following verification experiments are provided.

(I) Data preparation

In view of the difficulty of obtaining large amounts of encrypted ciphertext data on the network, and of problems such as data security and algorithm security, the invention prepares a large amount of ciphertext data itself for model training. The "plaintext" (hereinafter referred to as raw data) data sets for these ciphertexts are selected with reference to the data sets used in some of the literature. Since most ciphertext data in real life is obtained by encrypting multimedia such as text, pictures and audio, the data set used for model training also includes raw data in these formats. Because picture and audio files carry more channel characteristics, the training data emphasizes a variety of audio and picture raw data so that the model does not learn the channel. All raw data are downloaded from the corresponding official websites, and the corresponding addresses are given. The raw data are encrypted with the three algorithms SM2, SM3 and SM4, finally yielding a ciphertext data set of about 49.5 GB, comprising about 18.76 thousand ciphertext files. The proportions of the raw data sets are shown in FIG. 12.

The invention integrates the encryption toolkits provided by the major open-source platforms. The symmetric encryption and hash interfaces are implemented on the basis of the Cipher module in the third-party Cryptodome library. Character strings are converted to bytes before encryption, and the object to be encrypted is processed differently depending on the encryption type. Different symmetric encryption objects use different block sizes and key lengths. The resulting ciphertext is a hexadecimal text file.
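Since the ciphertext is stored as hexadecimal (base-16) text, a reader of such a file would typically decode it back to bytes or to a string of 0s and 1s before feature extraction. The following sketch shows one way to do that with the Python standard library; the function name and the sample hex string are hypothetical.

```python
def hex_to_bits(hex_text):
    """Decode hexadecimal text to bytes and render them as a 0/1 string."""
    raw = bytes.fromhex(hex_text.strip())
    return "".join(f"{byte:08b}" for byte in raw)


bits = hex_to_bits("a1ff00")   # hypothetical 3-byte ciphertext fragment
```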

Testing used ciphertexts of pictures, audio and text mixed at a ratio of 10:10:1, the same mix as in the training data set. In addition, some pictures and texts were crawled from the network, encrypted, and then tested.

(II) evaluation criteria for classification results

In the cryptosystem identification task studied by the invention, the focus is on the accuracy of the classifier across all cryptosystems, and precision/recall and F1 are used as the standards for evaluating classifier performance.

For precision/recall and F1, the model is evaluated using a confusion matrix, whose purpose is to provide an effective metric for the classification model by showing, for each encryption algorithm, the counts of actual versus predicted classifications. The confusion matrix yields four results in total: TP (true positive), TN (true negative), FP (false positive) and FN (false negative), as shown in Table 1.

TABLE 1 Classification confusion matrix
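The metrics named in this subsection follow directly from the confusion-matrix counts: precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 is their harmonic mean. A minimal sketch is shown below; the counts in the example are made up for illustration and are not experimental results.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a single encryption algorithm.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```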

Cosine similarity is selected to compare how well specific features discriminate between algorithms. This index is used only for auxiliary explanation during the experiments; in practical application, PCA is used directly to reduce the dimension of the CnPo features.
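For reference, the cosine similarity of two feature vectors is their dot product divided by the product of their Euclidean norms. The sketch below is a plain-Python version; the short example vectors are hypothetical and much smaller than real CnPo feature vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


parallel = cosine_similarity([1, 2, 3], [2, 4, 6])   # same direction
orthogonal = cosine_similarity([1, 0], [0, 1])       # no shared direction
```

Values near 1 indicate that two feature vectors point in almost the same direction, so features whose vectors for different algorithms have lower similarity discriminate better.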

(III) results of the experiment

TABLE 2 ciphertext feature accuracy comparison

As can be seen from Table 2, the CnPo features proposed by the method are on average about 10-20% more accurate than the randomness detection and entropy features used in the existing literature, and give a better classification effect. In this comparison, the proposed CnPo feature extraction method outperforms the previous literature across the board.

TABLE 3 random forest two-class test results

TABLE 4 random forest three-class test results

Classification	LeNet5-RF
SM2, SM3, SM4	91.34%

As can be seen from Table 3, the LeNet5-RF classification method achieves a high accuracy of over 95%, exceeding the 88% average accuracy of other ciphertext classification algorithms. As can be seen from Table 4, the LeNet5-RF classification method achieves an accuracy of over 90%, exceeding the 82% average accuracy of other ciphertext classification algorithms.

The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented wholly or partly in software, they may take the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, wireless, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

The above description is only for the purpose of illustrating the embodiments of the present invention, and the scope of the present invention should not be limited thereto, and any modifications, equivalents and improvements made by those skilled in the art within the technical scope of the present invention as disclosed in the present invention should be covered by the scope of the present invention.

Claims

1. A commercial cipher algorithm identification method based on LeNet5-RF is characterized in that the commercial cipher algorithm identification method based on LeNet5-RF comprises the following steps: in the safety data detection of the bank electronic account system, a LeNet5 neural network model and a random forest model are used for training a commercial cryptographic algorithm, a ciphertext classification platform is built, and the safety of various data adopting a digital signature system is enhanced.

2. The LeNet5-RF based commercial cipher algorithm recognition method of claim 1, wherein the LeNet5-RF based commercial cipher algorithm recognition method comprises the steps of: step one, ciphertext preprocessing: after converting a ciphertext to be identified submitted by a user into a 01 string, quantizing and mapping the ciphertext string to form a ciphertext mapping matrix;

step three, constructing random forest RF;

step four, constructing a LeNet5 neural network;

3. The LeNet5-RF based commercial cryptographic algorithm recognition method of claim 2, wherein in the first step, the commercial cryptographic algorithm system of the ciphertext file to be recognized includes at least one of SM2, SM3 and SM4;

the ciphertext preprocessing comprises the following steps:

(3) sequentially dividing the ciphertext data to obtain a ciphertext mapping matrix with the size of 32 × 32;

in the second step, the CnPo feature extraction comprises the following steps:

(2) obtaining three 8 × 8 matrices through two rounds of pooling down-sampling;

(3) the matrix is flattened to obtain 192-dimensional feature vectors.

4. The LeNet5-RF based commercial cipher algorithm recognition method of claim 2, wherein in step three, the constructing random forest RF comprises:

(3) and storing the trained model for later use.

5. The LeNet5-RF based commercial cipher algorithm recognition method of claim 2, wherein in step four, the constructing LeNet5 neural network comprises:

(1) inputting a 32 × 32 × 1 gray image, and performing a first convolution on the gray image through a 6 × 5 × 5 filter to obtain 28 × 28 × 6 output;

(2) sending the output to a pooling layer P1 for 2 × 2 down-sampling, and performing a second convolution C2 by a 16 × 5 × 5 filter to obtain a 10 × 10 × 16 output;

(3) sending the output to a P2 pooling layer which is the same as the P1, and unfolding to obtain a 400-dimensional feature vector;

6. The LeNet5-RF based commercial cipher algorithm recognition method of claim 2, wherein in step five, the minimum size x of a single ciphertext is x = Block × 1024/8000, wherein Block takes 8, 16 or 32.

7. A commercial cipher algorithm recognition system based on LeNet5-RF, which applies the commercial cipher algorithm recognition method based on LeNet5-RF according to any one of claims 1 to 6, wherein the commercial cipher algorithm recognition system based on LeNet5-RF comprises:

and the data processing module is used for regularizing the ciphertext matrix, then dividing the data set into a training set and a test set at a ratio of 8:2, and sending the training set to the LeNet5-RF model for training.

8. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:

9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:

10. An information data processing terminal characterized by being used to implement the commercial cryptographic algorithm recognition system of claim 7.