CN117634501A - Computer file confidentiality checking method and system

Info

Publication number: CN117634501A
Application number: CN202410089364.5A
Authority: CN
Inventors: 路成刚; 胡现龙
Original assignee: Qingdao University of Technology
Current assignee: Qingdao University of Technology
Priority date: 2024-01-23
Filing date: 2024-01-23
Publication date: 2024-03-01

Abstract

The invention relates to the technical field of code security measures, and in particular to a computer file confidentiality checking method and system. The method comprises the following steps: based on deep learning, a transformer algorithm and a convolutional neural network are adopted to perform deep semantic analysis and key feature extraction on text and image content, and the feature data are integrated to generate text and image feature data. Through deep learning, the transformer algorithm and image processing technology, the invention achieves deep semantic analysis and key feature extraction for text and images. Combining natural language processing with an image recognition algorithm improves the accuracy of identifying secret information in files and reduces the false alarm rate and the miss rate. A simulation mechanism based on a generative adversarial network strengthens the early warning of potential security vulnerabilities and attack behaviors of the system. A data flow analysis method improves the monitoring and prevention of data leakage, and file integrity verification combined with blockchain technology increases file security and tamper resistance.

Description

Computer file confidentiality checking method and system

Technical Field

The present invention relates to the field of code security measures, and in particular, to a method and a system for checking confidentiality of a computer file.

Background

The technical field of code security measures is an important branch of information security that focuses on ensuring the security of software, applications and computer systems. It encompasses a series of measures and techniques that prevent potential threats (including hackers, malware and internal threats) from gaining unauthorized access to, information from, or control over code, applications or systems. The field covers many aspects, including authentication, authorization, encryption, vulnerability analysis, auditing, code analysis and document confidentiality.

Computer file confidentiality checking methods belong to the technical field of code security and are a group of techniques and procedures for inspecting and protecting computer files, including the source code, configuration files and data files of application programs. Their primary task is to ensure that these files are not accessed, viewed, copied or modified by unauthorized users or systems. Their purposes include protecting sensitive information and intellectual property, reducing legal and business risks, and ensuring compliance. These methods work through means such as access control, code auditing, encryption, secure development practices and alarm monitoring, thereby ensuring the confidentiality of computer files, reducing potential threats and risks, and protecting information security and compliance. This is especially important given today's growing network threats and regulatory requirements.

Most existing computer file confidentiality checking methods rely on traditional text and image analysis technologies, which are limited in extracting complex, deep semantic information, so the identification of secret information suffers from a high rate of false alarms or omissions. In addition, many methods do not use a generative adversarial network to simulate system vulnerabilities, so potential threats are not identified and responded to quickly enough. Data flow analysis in many existing methods also remains at a rudimentary stage and does not comprehensively monitor the full-chain flow of data, which increases the risk of data leakage. Finally, file integrity verification generally does not use blockchain technology for distributed storage and is therefore exposed to threats against a centralized server, which reduces the reliability of the verification.

Disclosure of Invention

The invention aims to solve the defects in the prior art and provides a computer file confidentiality checking method and system.

In order to achieve the above purpose, the present invention adopts the following technical scheme: a computer file security check method comprising the steps of:

S1: based on deep learning, adopting a transformer algorithm and a convolutional neural network to perform deep semantic analysis and key feature extraction on text and image content, and integrating the feature data to generate text and image feature data;

S2: based on the text and image feature data, adopting a natural language processing technology and an image recognition algorithm to identify the secret information of the text and images in the file, and classifying it to generate a secret information report;

S3: based on the secret information report, adopting a generative adversarial network to simulate potential security vulnerabilities and attack behaviors, enhancing the early warning mechanism of the system, and integrating vulnerability data to generate security vulnerability simulation data;

S4: based on the security vulnerability simulation data, adopting a data flow analysis algorithm to dynamically monitor and statically analyze file operations and system behavior, finding potential data leakage points, and integrating the data flows to generate a data flow analysis report;

S5: based on the original data of the file, performing an integrity check on the file with the SHA-256 hash algorithm, comparing the resulting hash value with the hash value stored on the blockchain, determining the integrity of the file, and generating an integrity check record;

S6: in combination with the integrity check record, adopting role-based access control and a dynamic encryption technology to perform access control and encryption on the file, setting access rights and generating an access control and encryption policy;

The transformer algorithm is specifically a BERT or GPT series model used for semantic understanding of text content; the convolutional neural network is used for extracting key features from images; the natural language processing technology is used for identifying text-type secret information, including private API keys and passwords; the image recognition algorithm is used for identifying secret content in images; the data flow analysis algorithm is specifically used for tracking the flow and storage paths of secret data in the system and identifying illegal access or tampering attempts; and the dynamic encryption technology comprises AES (Advanced Encryption Standard) for encrypting the file and RSA (Rivest-Shamir-Adleman) for managing the AES keys.

As a further scheme of the invention, the step of adopting, based on deep learning, a transformer algorithm and a convolutional neural network to perform deep semantic analysis and key feature extraction on text and image content, and integrating the feature data to generate text and image feature data, specifically comprises:

S101: based on a deep learning framework, performing preliminary processing on the text with a transformer algorithm, converting the text into an intermediate vector representation, and performing feature extraction to generate a text intermediate vector;

S102: based on the text intermediate vector, performing deep learning training with a BERT model, extracting deep semantic features of the text, and generating a text depth feature vector;

S103: performing primary feature extraction on the image content with a convolutional neural network, converting the image into a primary feature matrix, and generating an image primary feature matrix;

S104: based on the text depth feature vector and the image primary feature matrix, performing feature fusion with an autoencoder, integrating the key features of the text and the image, and generating text and image feature data;

the text intermediate vector is specifically a vectorized representation of the text content, the text depth feature vector is specifically a deep learning feature representation of the original text, the image primary feature matrix is specifically a feature representation of the image content, and the text and image feature data comprise a fused feature vector representation of the text and the image.

As a further scheme of the invention, the step of adopting, based on the text and image feature data, a natural language processing technology and an image recognition algorithm to identify the secret information of the text and images in the file, and classifying it to generate a secret information report, specifically comprises:

S201: based on the text and image feature data, marking potential risk information in the text content with a natural language processing technology, and generating a text confidentiality preliminary report;

S202: based on the text and image feature data, marking potential risk information in the image content with an image recognition algorithm, and generating an image confidentiality preliminary report;

S203: based on the text confidentiality preliminary report and the image confidentiality preliminary report, grouping similar secret information with a clustering algorithm, and generating secret information classification data;

S204: based on the secret information classification data, integrating the information with a statistical method to form an overview and details of the secret information, and generating a secret information report;

the text confidentiality preliminary report specifically comprises the position and content of potentially risky words, sentences or paragraphs, the image confidentiality preliminary report specifically refers to the potentially risky areas or objects marked in the image, and the secret information classification data specifically comprise secret information classified by type, source or importance.

As a further scheme of the invention, the step of adopting, based on the secret information report, a generative adversarial network to simulate potential security vulnerabilities and attack behaviors, enhancing the early warning mechanism of the system, and integrating vulnerability data to generate security vulnerability simulation data, specifically comprises:

S301: based on the secret information report, adopting a generative adversarial network to simulate potential security vulnerabilities, and generating a potential security vulnerability simulation scenario;

S302: based on the potential security vulnerability simulation scenario, adopting reinforcement learning to simulate the behavior of an attacker, and generating simulated attack behavior data;

S303: based on the simulated attack behavior data, identifying the weaknesses of the system with pattern recognition, and generating a system vulnerability and weakness identification result;

S304: based on the system vulnerability and weakness identification result, performing data integration, optimizing the early warning mechanism, and generating an optimized early warning mechanism and vulnerability assessment report;

the generative adversarial network specifically uses a generator and a discriminator to capture the data distribution and simulate attack scenarios, the pattern recognition specifically uses a support vector machine and a decision tree algorithm to automatically identify and classify attack patterns, and the optimized early warning mechanism and vulnerability assessment report comprises descriptions of vulnerability identification, impact assessment and suggested protective measures.

As a further scheme of the invention, the step of adopting, based on the security vulnerability simulation data, a data flow analysis algorithm to dynamically monitor and statically analyze file operations and system behavior, finding potential data leakage points, and integrating the data flows to generate a data flow analysis report, specifically comprises:

S401: based on the optimized early warning mechanism and vulnerability assessment report, adopting a data flow analysis algorithm to dynamically monitor file operations in the system, and generating dynamic file operation monitoring data;

S402: based on the dynamic file operation monitoring data, further adopting the data flow analysis algorithm to perform static analysis on system behavior, and generating system behavior static analysis data;

S403: based on the system behavior static analysis data, marking potential data leakage points, and generating potential data leakage point marks;

S404: based on the potential data leakage point marks, integrating the data flows, evaluating the data security risk of the whole system, and generating a data flow analysis report;

the data flow analysis algorithm is specifically used for analyzing the behavior patterns of data flows in the system, the system behavior static analysis data specifically result from analyzing, by a static method, the behavior of the system without external input, the potential data leakage point marks specifically identify areas of system behavior at risk of information leakage, and the data flow analysis report comprises the overall distribution of the data flows, the marked leakage points and suggested remediation measures.

As a further scheme of the invention, the step of performing, based on the original data of the file, an integrity check on the file with the SHA-256 hash algorithm, comparing the resulting hash value with the hash value stored on the blockchain, determining the integrity of the file, and generating an integrity check record, specifically comprises:

S501: based on the file data, extracting the file content with a binary reading method, formatting the file content, and generating an original data extraction report;

S502: based on the original data extraction report, computing the file hash with the SHA-256 hash algorithm, formatting the hash value, and generating a file hash value;

S503: based on a blockchain network interface, adopting a hash lookup method to retrieve the record associated with the file hash value in preparation for hash comparison, and generating a blockchain hash value;

S504: based on the file hash value and the blockchain hash value, adopting a hash comparison method to confirm the integrity of the file, and generating an integrity check record;

the original data extraction report is specifically the byte stream form of the file, the file hash value is specifically a string of 64 hexadecimal characters, the blockchain hash value is specifically the hash record of the file stored on the blockchain, and the integrity check record specifically indicates whether the file has been tampered with or damaged.

As a further scheme of the invention, the step of adopting, in combination with the integrity check record, role-based access control and a dynamic encryption technology to perform access control and encryption on the file, setting access rights and generating an access control and encryption policy, specifically comprises:

S601: based on the integrity check record, defining file roles with a role analysis method, setting permissions, and generating a role definition and permission allocation table;

S602: based on the role definition and permission allocation table, adopting a role-based access control algorithm to allocate access permissions, setting access policies, and generating a file access control policy;

S603: based on the file access control policy, encrypting the file with an AES dynamic encryption method, integrating the encrypted data, and generating encrypted file data;

S604: based on the encrypted file data and the file access control policy, adopting a policy integration method to formulate a secure storage policy for the file, confirming the policy, and generating an access control and encryption policy;

the role definition and permission allocation table comprises the role name, the role description and the access permissions, the file access control policy specifically specifies the roles and the levels of their access permissions, the encrypted file data are specifically the byte stream obtained by encrypting the original file data, and the access control and encryption policy is specifically the complete policy and method for accessing and decrypting the file.

A computer file confidentiality checking system comprises a feature extraction and fusion module, a risk information labeling module, a risk vulnerability simulation module, a system behavior analysis module, a file content extraction module, a file integrity verification module and a security policy making module.

As a further scheme of the invention, the feature extraction and fusion module, based on a deep learning framework, adopts a transformer algorithm and a BERT model to process text and generate text depth feature vectors, uses a convolutional neural network to extract image features, and fuses them with the text features through an autoencoder to generate text and image feature data;

the risk information labeling module, based on the generated text and image feature data, labels risk information through natural language processing and image recognition technology, and integrates the labeled information with a statistical method to generate a secret information report;

the risk vulnerability simulation module, based on the generated secret information report, simulates potential risks with a generative adversarial network, confirms system vulnerabilities through pattern recognition, optimizes the early warning mechanism, and generates an optimized early warning mechanism and vulnerability assessment report;

the system behavior analysis module, based on the optimized early warning mechanism and vulnerability assessment report, performs static analysis on system behavior with a data flow analysis algorithm, marks potential data leakage points, assesses security risks, and generates a data flow analysis report;

the file content extraction module formats the file data with a binary reading method to generate an original data extraction report;

the file integrity verification module, based on the generated original data extraction report, calculates a file hash value with the SHA-256 hash algorithm, compares the file hash value with the blockchain hash value to ensure file integrity, and generates an integrity check record;

the security policy making module, based on the integrity check record, defines file roles and sets their permissions, encrypts the file with an AES dynamic encryption method, formulates a secure storage policy for the file, and generates an access control and encryption policy.

As a further scheme of the invention, the feature extraction and fusion module comprises a text processing sub-module, an image feature extraction sub-module and a feature fusion sub-module;

the risk information labeling module comprises a risk information labeling sub-module, a secret information classifying sub-module and an information integrating sub-module;

The risk vulnerability simulation module comprises a security vulnerability simulation sub-module, an attacker behavior simulation sub-module, a system vulnerability recognition sub-module and an early warning mechanism optimization sub-module;

the system behavior analysis module comprises a file operation monitoring sub-module, a system behavior analysis sub-module, a data leakage point marking sub-module and a data stream integration sub-module;

the file content extraction module comprises a data reading sub-module and a data formatting sub-module;

the file integrity verification module comprises a hash calculation sub-module, a hash retrieval sub-module and an integrity confirmation sub-module;

the security policy making module comprises a file role defining sub-module, an access right setting sub-module, a file encrypting sub-module and a policy confirming sub-module.

Compared with the prior art, the invention has the advantages and positive effects that:

Through deep learning, the transformer algorithm and image processing technology, the invention makes deep semantic analysis and key feature extraction for text and images more accurate. Combining natural language processing with an image recognition algorithm makes the identification of secret information in files more accurate and reduces the false alarm rate and the miss rate. The simulation mechanism based on a generative adversarial network provides stronger early warning of potential security vulnerabilities and attack behaviors of the system and enhances the ability to identify and respond to external threats. The data flow analysis method makes the monitoring and prevention of data leakage more comprehensive. In addition, checking file integrity in combination with blockchain technology improves the security and tamper resistance of the file.

Drawings

FIG. 1 is a schematic workflow diagram of the present invention;

FIG. 2 is a S1 refinement flowchart of the present invention;

FIG. 3 is a S2 refinement flowchart of the present invention;

FIG. 4 is a S3 refinement flowchart of the present invention;

FIG. 5 is a S4 refinement flowchart of the present invention;

FIG. 6 is a S5 refinement flowchart of the present invention;

FIG. 7 is a S6 refinement flowchart of the present invention;

FIG. 8 is a system flow diagram of the present invention;

FIG. 9 is a schematic diagram of a system framework of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

In the description of the present invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Furthermore, in the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

Examples

Referring to fig. 1, the present invention provides a technical solution: a computer file security check method comprising the steps of:

S3: based on the secret information report, adopting a generative adversarial network to simulate potential security vulnerabilities and attack behaviors, enhancing the early warning mechanism of the system, and integrating vulnerability data to generate security vulnerability simulation data;

the transformer algorithm is specifically a BERT or GPT series model used for semantic understanding of text content; the convolutional neural network is used for extracting key features from images; the natural language processing technology is used for identifying text-type secret information, including private API keys and passwords; the image recognition algorithm is used for identifying secret content in images; the data flow analysis algorithm is specifically used for tracking the flow and storage paths of secret data in the system and identifying illegal access or tampering attempts; and the dynamic encryption technology comprises AES (Advanced Encryption Standard) for encrypting files and RSA for managing the AES keys.

The text and image analysis technology based on deep learning adopts a transformer algorithm and a convolutional neural network to carry out deep semantic analysis and key feature extraction, so that secret information in a file can be effectively identified. The text and the image in the file are subjected to secret information judgment and classification processing through a natural language processing technology and an image recognition algorithm, so that a secret information report is generated, and the timely discovery and protection of sensitive information are facilitated.

Applying a generative adversarial network makes it possible to simulate potential security vulnerabilities and attack behaviors and to strengthen the early warning mechanism of the system. Through dynamic monitoring and static analysis of file operations and system behavior, potential data leakage points are found, the data flows are integrated, and a data flow analysis report is generated, which helps to discover and prevent security risks in advance.
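As an illustrative sketch only, the tracking of secret data along file operations could look like the following. The event format, the function name find_leak_points and the notion of a set of trusted locations are assumptions introduced here for illustration; the method itself does not specify them.

```python
def find_leak_points(events, secret_sources, trusted_locations):
    """Propagate a 'secret' mark along file operations and flag flows
    whose destination lies outside the trusted locations.

    events: chronologically ordered (operation, source, destination) tuples.
    """
    tainted = set(secret_sources)
    leaks = []
    for step, (operation, source, destination) in enumerate(events):
        if source in tainted:
            # data derived from a secret source is itself treated as secret
            tainted.add(destination)
            if destination not in trusted_locations:
                leaks.append({
                    "step": step,
                    "operation": operation,
                    "source": source,
                    "destination": destination,
                })
    return leaks


# hypothetical monitoring data
events = [
    ("copy", "/vault/keys.txt", "/tmp/keys.bak"),
    ("read", "/home/user/notes.txt", "/tmp/notes.cache"),
    ("upload", "/tmp/keys.bak", "https://example.com/upload"),
]
leaks = find_leak_points(
    events,
    secret_sources={"/vault/keys.txt"},
    trusted_locations={"/vault/keys.txt", "/tmp/keys.bak"},
)
print(leaks)
```

In this sketch the upload is flagged because the backup copy inherited the secret mark from the original file, while the unrelated read of the notes file is ignored.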

The SHA-256 hash algorithm is used for checking the integrity of the file, comparing the hash value with the hash value on the blockchain, determining the integrity of the file, and generating an integrity check record. This helps ensure that the file is not tampered with or damaged during transport and storage.
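A minimal sketch of this integrity check, under stated assumptions, is given below. The ledger dictionary is a hypothetical stand-in for the hash records retrieved through the blockchain network interface, and the function names sha256_of_file and check_integrity are introduced here for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Read the file in binary mode and return its SHA-256 digest
    as a string of 64 hexadecimal characters."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(path, ledger):
    """Compare the current hash with the recorded hash and return
    an integrity check record.

    ledger: hypothetical mapping of file name to recorded hash value,
    standing in for the records stored on the blockchain.
    """
    name = Path(path).name
    current = sha256_of_file(path)
    recorded = ledger.get(name)
    return {
        "file": name,
        "current_hash": current,
        "recorded_hash": recorded,
        "intact": recorded is not None and recorded == current,
    }
```

Reading the file in chunks keeps memory use constant for large files. A file without a recorded hash is reported as not intact, which is a conservative choice made for this sketch.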

Based on the combination of role-based access control and dynamic encryption technology, access control and encryption processing on files can be realized, access rights are set, and access control and encryption strategies are generated. This helps to protect the security and privacy of the file from unauthorized access and disclosure.
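The role-based part of this step can be sketched as follows. The role names, users and permissions are hypothetical, and the rule that access is refused when the integrity check failed is an assumption made for this sketch; the method states only that access control is combined with the integrity check record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    description: str
    permissions: frozenset


# hypothetical role definition and permission allocation table
ROLES = {
    "auditor": Role("auditor", "Reviews confidentiality reports", frozenset({"read"})),
    "owner": Role("owner", "Maintains the file", frozenset({"read", "write", "decrypt"})),
}

# hypothetical assignment of users to roles
USER_ROLES = {"alice": {"owner"}, "bob": {"auditor"}}


def is_allowed(user, action, file_intact=True):
    """Grant an action only if the file passed its integrity check (assumed rule)
    and at least one role assigned to the user holds the permission."""
    if not file_intact:
        return False
    return any(action in ROLES[name].permissions for name in USER_ROLES.get(user, ()))


print(is_allowed("alice", "decrypt"))
print(is_allowed("bob", "write"))
```

Encryption with AES and key management with RSA are deliberately left out of this sketch, since they require a cryptographic library that the description does not name.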

Referring to fig. 2, based on deep learning, a transformer algorithm and a convolutional neural network are adopted to perform deep semantic analysis and key feature extraction on text and image content, and perform feature data integration, so that the steps of generating text and image feature data are specifically as follows:

the text intermediate vector is specifically a vectorized representation of text content, the text depth feature vector is specifically a deep learning feature representation of the original text, the image primary feature matrix is specifically a feature representation of image content, and the text and image feature data comprises a fused feature vector representation of the text and the image.

In S101, a deep learning framework such as TensorFlow or PyTorch is used, and the original text is converted to a vectorized representation with a word-embedding algorithm (e.g., Word2Vec or FastText).

Code example:

# import the required libraries and models
from gensim.models import Word2Vec
import nltk

# tokenize and pre-process the text (requires the NLTK "punkt" tokenizer data)
text = "Your input text here."
tokens = nltk.word_tokenize(text)

# train the Word2Vec model; it expects a list of tokenized sentences
model = Word2Vec([tokens], vector_size=100, window=5, min_count=1, sg=0)

# obtain the vector of a word that occurs in the training text
text_vector = model.wv['text']

In S102, a pre-trained BERT model, for example from the Hugging Face Transformers library, is used to extract deep semantic features from the text.

Code example:

from transformers import BertTokenizer, BertModel
import torch

# load the pre-trained tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

text = "Your input text here."
inputs = tokenizer(text, return_tensors='pt')

# run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# obtain deep semantic features by averaging the token embeddings
text_embeddings = outputs.last_hidden_state.mean(dim=1)

In S103, a convolutional neural network (CNN) model is created with a deep learning framework such as PyTorch or TensorFlow to extract the primary features of the image.

Code example:

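import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# load a pre-trained CNN model
cnn_model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
cnn_model = nn.Sequential(*list(cnn_model.children())[:-1])  # remove the last fully connected layer
cnn_model.eval()

# image preprocessing and feature extraction
preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image = Image.open("your_image.jpg").convert("RGB")  # load image
image = preprocess(image).unsqueeze(0)  # preprocess image and add a batch dimension
with torch.no_grad():
    image_features = cnn_model(image).flatten(1)  # extract primary features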

In S104, an autoencoder model is created, and the deep semantic features of the text and the primary features of the image are fed into it to generate the fused text and image feature data.

Code example:

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, text_feature_size, image_feature_size, latent_size):
        super().__init__()
        self.text_encoder = nn.Linear(text_feature_size, latent_size)
        self.image_encoder = nn.Linear(image_feature_size, latent_size)
        # the decoder input is the concatenation of both latent vectors
        self.decoder = nn.Linear(2 * latent_size, text_feature_size + image_feature_size)

    def forward(self, text_features, image_features):
        text_latent = self.text_encoder(text_features)
        image_latent = self.image_encoder(image_features)
        combined_latent = torch.cat((text_latent, image_latent), dim=1)
        reconstructed_features = self.decoder(combined_latent)
        return reconstructed_features

# illustrative sizes and hyperparameters (assumed values, not specified by the method)
text_feature_size, image_feature_size, latent_size = 768, 2048, 128
learning_rate, num_epochs = 1e-3, 10
# placeholder inputs standing in for the features produced in S102 and S103
text_features = torch.randn(8, text_feature_size)
image_features = torch.randn(8, image_feature_size)

# initialize the autoencoder, the loss function and the optimizer
autoencoder = AutoEncoder(text_feature_size, image_feature_size, latent_size)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=learning_rate)

# train the autoencoder to reconstruct the concatenated input features
for epoch in range(num_epochs):
    optimizer.zero_grad()
    outputs = autoencoder(text_features, image_features)
    loss = loss_fn(outputs, torch.cat((text_features, image_features), dim=1))
    loss.backward()
    optimizer.step()

Referring to fig. 3, based on text and image feature data, a natural language processing technology and an image recognition algorithm are adopted to determine secret information of text and images in a file, and classification processing is performed, so that the secret information report is generated specifically by the steps of:

the text security preliminary report specifically includes the location and content of potential risk words, sentences or paragraphs, the image security preliminary report specifically refers to potential risk areas or objects marked in the image, and the security information classification data specifically includes security information classified by type, source or importance.

First, based on the text and image feature data, potential risk information in the text content is marked using natural language processing technology. Text analysis algorithms identify potentially sensitive words, phrases or sentences and tag them as potential risk information, and the position and content of each item in the text are recorded.
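The recording of position and content can be illustrated with a deliberately simple rule-based sketch. Regular expressions are used here only as a stand-in for the natural language processing model, and the pattern set and the function name mark_text_risks are hypothetical.

```python
import re

# hypothetical patterns; a trained NLP model would replace these rules
RISK_PATTERNS = {
    "password": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
    "api_key": re.compile(r"(?i)\bapi[_-]?key\s*[:=]\s*\S+"),
}


def mark_text_risks(text):
    """Return the type, position and content of each potentially sensitive fragment."""
    findings = []
    for risk_type, pattern in RISK_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({
                "type": risk_type,
                "start": match.start(),
                "end": match.end(),
                "content": match.group(0),
            })
    # order the findings by their position in the text
    return sorted(findings, key=lambda item: item["start"])


sample = "db password = hunter2\napi_key: ABC123"
report = mark_text_risks(sample)
for finding in report:
    print(finding)
```

Each finding carries the character offsets of the fragment, so a later step can point back to the exact location in the file.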

Next, based on the text and image feature data, potential risk information in the image content is marked using an image recognition algorithm. Image recognition techniques detect specific areas or objects in the image and mark them as potential risk information. For example, it can be detected whether the image shows a device or document that must not be photographed.

After the text and image security preliminary reports are generated, the potential risk information in these reports needs to be generalized and categorized. For this purpose, similar secret information may be grouped using a clustering algorithm to form secret information classification data. These classifications may be divided according to criteria such as type, origin, or importance.
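The grouping of similar findings can be sketched with a small greedy clustering routine. String similarity from the standard library is used here as a simple stand-in for clustering over learned feature vectors; the threshold value and the function name cluster_findings are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def cluster_findings(contents, threshold=0.7):
    """Greedy single-pass clustering: each item joins the first cluster
    whose representative is similar enough, otherwise it starts a new cluster."""
    clusters = []  # the first element of each cluster acts as its representative
    for content in contents:
        for cluster in clusters:
            similarity = SequenceMatcher(None, content.lower(), cluster[0].lower()).ratio()
            if similarity >= threshold:
                cluster.append(content)
                break
        else:
            # no existing cluster was similar enough
            clusters.append([content])
    return clusters


groups = cluster_findings(["password=abc123", "api_key: ABC123", "password=abc124"])
print(groups)
```

The result depends on the order of the input and on the chosen threshold, which is acceptable for a sketch but would need tuning on real data.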

Finally, based on the secret information classification data, the information is integrated using a statistical method to form an overview and details of the secret information, and the secret information report is generated. The report includes a statistical analysis of each type of secret information, together with a description and interpretation of the details of each category.
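As a rough illustration of the classification and statistical integration just described, the sketch below groups findings and counts them per category. It assumes each finding already carries a type label, so simple grouping stands in for the clustering algorithm; the `findings` records and the `build_report` function are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical labelled findings, as the two preliminary reports might supply them.
findings = [
    {"type": "credential", "source": "text", "importance": "high"},
    {"type": "credential", "source": "image", "importance": "high"},
    {"type": "contract", "source": "text", "importance": "medium"},
]


def build_report(items):
    """Group findings by type and summarize counts per type and importance."""
    groups = defaultdict(list)
    for item in items:
        groups[item["type"]].append(item)
    overview = {
        "total": len(items),
        "by_type": dict(Counter(i["type"] for i in items)),
        "by_importance": dict(Counter(i["importance"] for i in items)),
    }
    # "overview" holds the statistics; "details" keeps every finding per category.
    return {"overview": overview, "details": dict(groups)}


report = build_report(findings)
print(report["overview"])
```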

Referring to fig. 4, based on a secret information report, the steps of generating security vulnerability simulation data are specifically as follows:

S302: based on the potential security vulnerability simulation scenario, reinforcement learning is adopted to simulate attacker behavior and generate simulated attack behavior data;

S303: based on the simulated attack behavior data, pattern recognition is adopted to identify weaknesses of the system and generate system vulnerability and weakness identification results;

S304: based on the system vulnerability and weakness identification results, data integration is carried out and the early warning mechanism is optimized, generating an optimized early warning mechanism and a vulnerability assessment report;

the generative adversarial network specifically uses a generator and a discriminator to capture the data distribution in order to simulate attack scenarios; the pattern recognition specifically uses support vector machine and decision tree algorithms to automatically identify and classify attack patterns; and the optimized early warning mechanism and vulnerability assessment report include descriptions of vulnerability identification, impact assessment, and recommended protective measures.

In S301, potential security vulnerability scenarios are simulated using a generative adversarial network (GAN) model, which comprises a generator and a discriminator. A data set is prepared that includes descriptions of the potential security vulnerabilities identified in the secret information report, together with contextual information. Samples of potential security vulnerability scenarios are produced by the generator model, which receives random noise as input and generates scenarios that resemble actual vulnerabilities.

# Example code: generator model

generator = create_generator_model()

generator.compile(loss='binary_crossentropy', optimizer='adam')

generator.fit(random_noise, simulated_vulnerability_scenarios, epochs=100)

The discriminator is trained to distinguish generated vulnerability scenarios from actual ones. Its feedback helps improve the performance of the generator.

# Example code: discriminator model

discriminator = create_discriminator_model()

discriminator.compile(loss='binary_crossentropy', optimizer='adam')

discriminator.fit(scenarios, real_vs_generated_labels, epochs=100)

The generator and the discriminator are combined into a GAN and trained alternately to improve the quality of the generated vulnerability scenarios.

# Example code: GAN model

gan = create_gan(generator, discriminator)

gan.compile(loss='binary_crossentropy', optimizer='adam')

gan.fit(random_noise, real_labels, epochs=100)

In S302, the behavior of the attacker is simulated using a reinforcement learning model. The potential security vulnerability scenario is taken as the environment, and a mapping of states, actions, and rewards is established. A reinforcement learning model, such as a deep reinforcement learning (DRL) model, is then trained to simulate attacker behavior.

# Example code: reinforcement learning training

rl_agent = create_reinforcement_learning_agent()

rl_agent.train(environment, num_episodes=1000)

After training, the reinforcement learning model is used to simulate the behavior of the attacker and to generate simulated attack behavior data.

# Example code: generating simulated attack behavior data

simulated_attack_data = rl_agent.simulate_attacks(num_samples=1000)
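To make the state, action, and reward mapping of S302 concrete, here is a small self-contained sketch. It swaps in tabular Q-learning for the deep reinforcement learning model named in the text, and the environment is a toy invented for illustration: the stage count, the two actions, the reward values, and the functions `step`, `train`, and `simulate_attack` are all assumptions, not part of the source.

```python
import random

N_STATES = 4                   # hypothetical attack stages; the last one is the goal
ACTIONS = ("wait", "advance")  # hypothetical attacker actions


def step(state, action):
    """Toy environment: 'advance' moves one stage forward, 'wait' stays put."""
    nxt = min(state + 1, N_STATES - 1) if action == "advance" else state
    done = nxt == N_STATES - 1
    reward = 1.0 if done else -0.05  # small cost per step, payoff at the goal
    return nxt, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = ACTIONS[rng.randrange(len(ACTIONS))]
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Standard Q-learning update toward reward + discounted best next value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def simulate_attack(q, max_steps=20):
    """Replay the greedy policy and record the (state, action) sequence."""
    state, trace = 0, []
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        trace.append((state, action))
        state, _, done = step(state, action)
        if done:
            break
    return trace


q = train()
trace = simulate_attack(q)
print(trace)
```

The recorded trace plays the role of the simulated attack behavior data that the pattern recognition step consumes next.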

In S303, system vulnerabilities and weaknesses are automatically identified using a pattern recognition algorithm, such as a support vector machine or a decision tree. The simulated attack behavior data and related system data are collated. Features are then extracted to convert the data into a form usable for pattern recognition. A support vector machine, decision tree, or other model is trained to identify attack patterns and system vulnerabilities.

# Example code: pattern recognition model training

pattern_recognition_model = create_pattern_recognition_model()

pattern_recognition_model.fit(features, labels)

Identifying vulnerabilities: the trained pattern recognition model is used to identify vulnerabilities in the system.

# Example code: identifying vulnerabilities in a system

vulnerability_identification = pattern_recognition_model.predict(system_data)

In S304, the system vulnerability and weakness identification results are integrated. The early warning mechanism of the system is optimized according to these identification results and the earlier secret information report, so as to improve alertness to potential threats. A vulnerability assessment report is generated that includes a vulnerability description, an impact assessment, and suggested safeguards.

# Example code: generating vulnerability assessment reports

vulnerability_report = generate_vulnerability_report(vulnerability_identification)

Referring to fig. 5, based on the security vulnerability simulation data, a data flow analysis algorithm is adopted to dynamically monitor and statically analyze file operations and system behavior, find potential data leakage points, and integrate the data flows; the steps of generating the data flow analysis report are specifically as follows:

S402: based on the dynamic file operation monitoring data, the data flow analysis algorithm is further adopted to perform static analysis of system behavior and generate system behavior static analysis data;

S404: based on the potential data leakage point marks, data flow integration is carried out, the data security risk of the whole system is evaluated, and a data flow analysis report is generated;

the data flow analysis algorithm specifically analyzes the behavior patterns of data flows in the system; the system behavior static analysis data specifically refers to the analysis, by static methods, of system behavior in the absence of external input; the potential data leakage point marks specifically refer to risk areas in system behavior that may lead to information leakage; and the data flow analysis report includes the overall distribution of data flows, the marked leakage points, and suggested remediation measures.

First, based on the optimized early warning mechanism and the vulnerability assessment report, file operations in the system are dynamically monitored using a data flow analysis algorithm. This includes collecting and analyzing the log of file operations in the system, identifying security-related operations, and generating dynamic file operation monitoring data.

Then, based on the dynamic file operation monitoring data, the data flow analysis algorithm is further used to perform static analysis of system behavior. This can be done by analyzing the behavior of the system in the absence of external input, identifying risk areas that may lead to information leakage, and generating system behavior static analysis data.

After the system behavior static analysis data is generated, potential data leakage points in the data need to be marked. This may be accomplished by analyzing abnormal patterns in the system behavior or operations related to known vulnerabilities. Potential data leaks are marked to distinguish them from other normal behavior for subsequent analysis and processing.

Finally, data flow integration is carried out based on the potential data leakage point marks, the data security risk of the whole system is evaluated, and a data flow analysis report is generated. This includes analyzing the overall distribution of the individual data flows in the system and providing corresponding repair suggestions and measures for the marked leakage points.
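The monitoring, marking, and integration steps above might look roughly like the sketch below. It is a simple rule-based filter over a file operation log, used here as a stand-in for a full data flow analysis; the `FileOp` record, the path and destination rules, and the functions `mark_leak_points` and `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileOp:
    user: str
    action: str       # e.g. "read", "write", "copy"
    path: str
    destination: str  # where the data went; "" for purely local operations


# Hypothetical rules: which paths hold secrets and which destinations are external.
SENSITIVE_PREFIXES = ("/secure/",)
EXTERNAL_DESTINATIONS = ("usb:", "smtp:", "http:", "https:")


def mark_leak_points(ops):
    """Mark operations that move data from a sensitive path to an external sink."""
    return [
        op for op in ops
        if op.path.startswith(SENSITIVE_PREFIXES)
        and op.destination.startswith(EXTERNAL_DESTINATIONS)
    ]


def summarize(ops):
    """Integrate the marks into a small report with an overall risk ratio."""
    leaks = mark_leak_points(ops)
    return {
        "operations": len(ops),
        "leak_points": [(op.user, op.path, op.destination) for op in leaks],
        "risk_ratio": len(leaks) / len(ops) if ops else 0.0,
    }


log = [
    FileOp("alice", "read", "/secure/plan.docx", ""),
    FileOp("bob", "copy", "/secure/plan.docx", "usb:E"),
    FileOp("carol", "copy", "/public/menu.txt", "smtp:out"),
    FileOp("dave", "write", "/secure/keys.txt", ""),
]
report = summarize(log)
print(report)
```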

Referring to fig. 6, based on the original data of the file, the SHA-256 hash algorithm is adopted to perform an integrity check on the file, and the resulting hash value is compared with the hash value stored on the blockchain to determine the integrity of the file; the steps of generating the integrity check record are specifically as follows:

the original data extraction report is specifically the byte stream form of the file; the file hash value is specifically a character string of 64 hexadecimal characters; the blockchain hash value is specifically the hash record of the file stored on the blockchain; and the integrity check record specifically indicates whether the file has been tampered with or damaged.

First, the original data of the file is read. The data can be read using a binary reading method and formatted, ensuring that the file content is extracted as a byte stream. Next, the file content is hashed using the SHA-256 hash algorithm. SHA-256 is a commonly used hash algorithm that maps data of arbitrary length to a fixed-length hash value (256 bits, usually written as 64 hexadecimal characters). Feeding the byte stream of the file into SHA-256 yields the file hash value as a string, which is unique to the file content for practical purposes.

After the file hash value is generated, a blockchain network interface is used to look up the corresponding record. This may be accomplished by querying the file hash records stored on the blockchain and locating the matching record by the file name or other identifying information. The generated file hash value and the hash record on the blockchain are then prepared for comparison by confirming that the two values have the same length and format, so that the subsequent comparison can be performed.

Finally, the integrity of the file is confirmed by hash comparison. If the two hash values are identical, the file is intact; if they differ, the file may have been altered or corrupted. An integrity check record is generated from the comparison result, indicating whether the file has been tampered with or damaged.
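The hashing and comparison steps can be sketched with Python's standard `hashlib` module. The blockchain lookup is not modeled here: the plain dictionary `ledger` is an assumed stand-in for a query against the blockchain network interface, and the function names are introduced for illustration only.

```python
import hashlib
import hmac


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path, chunk_size=65536) -> str:
    """Hash a file read in binary mode, chunk by chunk, to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(actual_hex, ledger, file_id):
    """Compare a computed hash with the recorded one and build a check record."""
    recorded = ledger.get(file_id)  # assumption: stands in for the blockchain query
    if recorded is None:
        status = "no_record"
    elif hmac.compare_digest(actual_hex.lower(), recorded.lower()):
        status = "intact"
    else:
        status = "mismatch"
    return {"file_id": file_id, "status": status, "sha256": actual_hex}


# Hypothetical ledger entry recorded when the file was first registered.
ledger = {"report.docx": sha256_of_bytes(b"original contents")}
record_ok = check_integrity(sha256_of_bytes(b"original contents"), ledger, "report.docx")
record_bad = check_integrity(sha256_of_bytes(b"tampered contents"), ledger, "report.docx")
print(record_ok["status"], record_bad["status"])
```

Normalizing both values to lower case before comparing corresponds to the "same length and format" preparation described above.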

Referring to fig. 7, in combination with the integrity check record, role-based access control and dynamic encryption technology are adopted to perform access control and encryption processing on the file and to set access rights; the steps of generating the access control and encryption policy are specifically as follows:

S603: based on the file access control policy, the file is encrypted using the AES dynamic encryption method, and the encrypted data is integrated to generate encrypted file data;

In S601, different roles of the file are determined based on the integrity check record. Each role is then assigned the appropriate access rights, e.g. read, write or execute rights. Finally, a role definition and rights allocation table is generated in which role names, role descriptions, and specific access rights associated with each role are listed.

In S602, access rights of the file are assigned by using a role-based access control algorithm based on the previously generated role definition and rights assignment table. At the same time, a file access policy is set to determine which roles can access the file with which rights, and which roles should be denied access.
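A role definition and permission allocation table, together with the access decision of S602, could be sketched as follows. The role names, the permission sets, and the functions `is_allowed` and `access_policy` are hypothetical, and the sketch assumes a deny-by-default rule that the source does not spell out.

```python
# Hypothetical role definition and permission allocation table (output of S601).
ROLE_TABLE = {
    "owner": {"read", "write", "execute"},
    "auditor": {"read"},
    "guest": set(),
}


def is_allowed(role, action, table=ROLE_TABLE):
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in table.get(role, set())


def access_policy(file_id, table=ROLE_TABLE):
    """Derive a per-file policy listing which roles may do what, and which are denied."""
    return {
        "file": file_id,
        "allow": {role: sorted(perms) for role, perms in table.items() if perms},
        "deny": sorted(role for role, perms in table.items() if not perms),
    }


policy = access_policy("report.docx")
print(policy)
print(is_allowed("auditor", "write"))
```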

In S603, the file is encrypted using the AES dynamic encryption method; another suitable encryption algorithm may be used instead. The encrypted file data may be integrated into a byte stream or other binary data format to ensure secure storage of the data.

In S604, a policy integration method is used to create a secure storage policy for the file based on the encrypted file data and the file access control policy. This includes determining the physical storage location of the file, formulating backup policies, planning access audits, and the like. Finally, an access control and encryption policy is generated to ensure the confidentiality and integrity of the file.

Referring to fig. 8, a computer file security inspection system is used for executing the computer file security inspection method, and the system includes a feature extraction and fusion module, a risk information labeling module, a risk vulnerability simulation module, a system behavior analysis module, a file content extraction module, a file integrity verification module, and a security policy formulation module.

The feature extraction and fusion module is based on a deep learning framework; it adopts a transformer algorithm and a BERT model to process text and generate text depth feature vectors, uses a convolutional neural network to extract image features, and fuses these with the text features using a self-encoder to generate text and image feature data;

the risk information labeling module, based on the generated text and image feature data, labels risk information through natural language processing and image recognition technology, and integrates the labeled risk information using a statistical method to generate a secret information report;

the risk vulnerability simulation module, based on the generated secret information report, simulates potential risks using a generative adversarial network, identifies system vulnerabilities through pattern recognition, optimizes the early warning mechanism, and generates an optimized early warning mechanism and vulnerability assessment report;

the security policy making module, based on the integrity check record, defines the roles of the file and sets their permissions, encrypts the file using the AES dynamic encryption method, formulates a secure storage policy for the file, and generates the access control and encryption policy.

Because it adopts deep learning and several advanced algorithms, the system can accurately extract and fuse text and image features. This deep feature extraction helps safeguard information during transmission, storage, and processing, thereby reducing the security risk caused by information leakage, tampering, or loss.

The risk information labeling and vulnerability simulation modules of the system allow potential security risks to be identified and flagged at an early stage. Such early identification makes it possible to take corresponding measures before a problem worsens.

Through the combination of SHA-256 hashing algorithm and blockchain technique, the system has high reliability in ensuring file integrity. The introduction of blockchain technology increases the resistance to file tampering, thereby increasing the security of the file.

The system focuses not only on the security of data but also on the efficient management of data. From feature extraction through risk assessment to file integrity verification, each step is designed to ensure the accuracy and integrity of the data, thereby improving the efficiency of the whole data management flow.

The introduction of the security policy making module makes the authority management of the file more definite and strict. Based on the role and importance of the file, the system can dynamically assign rights to the file, ensuring that only authorized users can access the relevant information.

From text processing, image feature extraction, to file integrity and access control, the system provides an all-round data protection strategy. Each module provides an additional barrier to the security of the data, ensuring that the data is best protected under any environment and conditions.

Referring to fig. 9, the feature extraction and fusion module includes a text processing sub-module, an image feature extraction sub-module, and a feature fusion sub-module;

In the feature extraction and fusion module, the text processing sub-module adopts a deep learning framework and uses a transformer algorithm and a BERT model to process the text and generate a text depth feature vector. The image feature extraction sub-module extracts image features using a convolutional neural network. The feature fusion sub-module fuses the text features and the image features using a self-encoder to generate text and image feature data.

In the risk information labeling module, the risk information labeling sub-module labels risk information through natural language processing and image recognition technology based on the generated text and image feature data. The confidential information classification sub-module uses a statistical method to summarize and integrate the labeled risk information and generates a secret information report. The information integration sub-module integrates the secret information report to generate the final secret information report.

In the risk vulnerability simulation module, the security vulnerability simulation sub-module simulates potential risks using a generative adversarial network, based on the generated secret information report. The attacker behavior simulation sub-module simulates the behavior patterns of an attacker. The system vulnerability identification sub-module identifies vulnerabilities of the system through pattern recognition. The early warning mechanism optimization sub-module optimizes the early warning mechanism according to the simulation results and generates an optimized early warning mechanism and a vulnerability assessment report.

In the system behavior analysis module, the file operation monitoring sub-module monitors the file operations of the system. The system behavior analysis sub-module analyzes the behavior of the system in the absence of external input. The data leakage marking sub-module marks potential data leakage points. The data flow integration sub-module integrates the analysis results to generate a data flow analysis report.

In the file content extraction module, a data reading sub-module adopts a binary reading method to read file data. The data formatting sub-module performs formatting processing on the read data to generate an original data extraction report.

In the file integrity verification module, the hash calculation sub-module calculates the hash value of the file using the SHA-256 hash algorithm, based on the generated original data extraction report. The hash check sub-module compares the hash value of the file with the hash value on the blockchain. The integrity verification sub-module verifies the integrity of the file according to the comparison result and generates an integrity check record.

In the security policy making module, a file role definition sub-module defines the roles of the files. The access right setting submodule sets the access right of the file. The file encryption sub-module encrypts the file by adopting an AES dynamic encryption method. The policy validation submodule validates the formulated policy and generates access control and encryption policies.

The present invention is not limited to the above embodiments. A person skilled in the art may use the technical content disclosed above to change or modify it into equivalent embodiments and apply them to other fields; however, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the present invention still falls within the scope of the technical solution of the present invention.

Claims

1. A method for checking confidentiality of a computer file, comprising the steps of:

based on deep learning, adopting a transformer algorithm and a convolutional neural network to perform deep semantic analysis and key feature extraction on text and image content, and performing feature data integration to generate text and image feature data;

Based on the text and image characteristic data, adopting a natural language processing technology and an image recognition algorithm to judge the secret information of the text and the image in the file, and carrying out classification processing to generate a secret information report;

based on the secret information report, adopting a generative adversarial network to simulate potential security vulnerabilities and attack behaviors, enhancing the early warning mechanism of the system, and integrating vulnerability data to generate security vulnerability simulation data;

based on the security hole simulation data, adopting a data flow analysis algorithm to dynamically monitor and statically analyze file operation and system behavior, searching potential data leakage points, and integrating data flows to generate a data flow analysis report;

based on the original data of the file, carrying out integrity check on the file by adopting an SHA-256 hash algorithm, comparing the hash value with the hash value stored on the block chain, determining the integrity of the file, and generating an integrity check record;

combining the integrity check record, adopting a role-based access control and dynamic encryption technology to carry out access control and encryption processing on the file, setting access rights and generating an access control and encryption strategy;

2. The method for checking confidentiality of computer files according to claim 1, wherein based on deep learning, using a transformer algorithm and a convolutional neural network, performing deep semantic analysis and key feature extraction on text and image contents, and performing feature data integration, the step of generating text and image feature data is specifically as follows:

based on a deep learning framework, performing primary processing on a text by adopting a transformer algorithm, converting the text into an intermediate vector representation, and performing feature extraction to generate a text intermediate vector;

based on the text intermediate vector, performing deep learning training by adopting a BERT model, extracting deep semantic features of the text, and generating a text deep feature vector;

Performing primary feature extraction on image content by adopting a convolutional neural network, converting an image into a primary feature matrix, and generating an image primary feature matrix;

based on the text depth feature vector and the image primary feature matrix, performing feature fusion by adopting a self-encoder, integrating key features of the text and the image, and generating text and image feature data;

3. The method for checking confidentiality of a computer file according to claim 1, wherein based on said text and image feature data, using a natural language processing technique and an image recognition algorithm, performing a confidential information judgment on the text and image in the file, and performing a classification process, the step of generating a confidential information report is specifically:

based on the text and image characteristic data, marking potential risk information in the text content by adopting a natural language processing technology, and generating a text confidentiality preliminary report;

Based on the text and the image characteristic data, marking potential risk information in the image content by adopting an image recognition algorithm, and generating an image confidentiality preliminary report;

based on the text secret preliminary report and the image secret preliminary report, adopting a clustering algorithm to summarize similar secret information, and generating secret information classification data;

based on the classified data of the secret information, carrying out information integration by adopting a statistical method to form a secret information overview and details and generate a secret information report;

4. The method for checking confidentiality of a computer file according to claim 1, wherein the step of adopting a generative adversarial network based on the secret information report, simulating potential security vulnerabilities and attack behaviors, enhancing the early warning mechanism of the system, integrating vulnerability data, and generating security vulnerability simulation data is specifically as follows:

based on the secret information report, adopting a generative adversarial network to simulate potential security vulnerabilities and generate a potential security vulnerability simulation scenario;

based on the potential security vulnerability simulation scene, adopting reinforcement learning to simulate the behavior of an attacker and generating simulated attack behavior data;

based on the simulated attack behavior data, identifying the weaknesses of the system by adopting pattern recognition, and generating system vulnerability and weakness identification results;

based on the system vulnerability and the vulnerability identification result, data integration is carried out, an early warning mechanism is optimized, and an optimized early warning mechanism and vulnerability assessment report are generated;

5. The method for checking confidentiality of a computer file according to claim 4, wherein based on said security hole simulation data, a data stream analysis algorithm is adopted to dynamically monitor and statically analyze file operation and system behavior, search for potential data leakage points, and perform data stream integration, and the step of generating a data stream analysis report is specifically as follows:

Based on the optimized early warning mechanism and the vulnerability assessment report, adopting a data flow analysis algorithm to dynamically monitor file operation in the system and generate dynamic file operation monitoring data;

based on the dynamic file operation monitoring data, continuously adopting a data flow analysis algorithm to perform static analysis on system behaviors and generating system behavior static analysis data;

based on the system behavior static analysis data, marking potential data leakage points and generating potential data leakage point marks;

based on the potential data leakage point marks, carrying out data flow integration, evaluating the data security risk of the whole system, and generating a data flow analysis report;

6. The method for checking confidentiality of a computer file according to claim 1, wherein the steps of checking the integrity of the file based on the original data of the file by using SHA-256 hash algorithm, comparing the hash value with the hash value stored in the blockchain, determining the integrity of the file, and generating the integrity check record are as follows:

Based on file data, extracting file contents by adopting a binary reading method, formatting the file contents, and generating an original data extraction report;

based on the original data extraction report, performing file hash calculation by adopting an SHA-256 hash algorithm, formatting a hash value, and generating a file hash value;

based on a blockchain network interface, a hash lookup method is adopted to retrieve the record associated with the file hash value, preparation for hash value comparison is carried out, and a blockchain hash value is generated;

based on the file hash value and the blockchain hash value, a hash comparison method is adopted to confirm the integrity of the file, and an integrity check record is generated;

7. The method for checking confidentiality of a computer file according to claim 1, wherein said step of performing access control and encryption processing on the file and setting access rights by using a role-based access control and dynamic encryption technique in combination with said integrity check record, and generating access control and encryption policies comprises the steps of:

Based on the integrity check record, defining file roles by adopting a role analysis method, setting authorities and generating a role definition and authority allocation table;

based on the role definition and the permission allocation table, adopting a role-based access control algorithm to allocate access permissions, setting access policies and generating file access control policies;

based on the file access control strategy, adopting an AES dynamic encryption method to encrypt the file, integrating the encrypted data, and generating encrypted file data;

based on the encrypted file data and the file access control strategy, adopting a strategy integration method to make a safe storage strategy of the file, and confirming the strategy to generate an access control and encryption strategy;

8. A computer file security check system, characterized in that the system comprises a feature extraction and fusion module, a risk information labeling module, a risk vulnerability simulation module, a system behavior analysis module, a file content extraction module, a file integrity verification module and a security policy formulation module according to the computer file security check method of any one of claims 1-7.

9. The computer file confidentiality checking system of claim 8, characterized in that said feature extraction and fusion module uses a transformer algorithm and BERT model to process text and generate text depth feature vectors based on a deep learning framework, and simultaneously uses convolutional neural network to extract image features and self-encoder fusion with the text features to generate text and image feature data;

10. The computer file security inspection system of claim 8, wherein the feature extraction and fusion module comprises a text processing sub-module, an image feature extraction sub-module, a feature fusion sub-module;