CN116167057A

CN116167057A - Code dynamic safe loading method and device based on key code semantic detection

Info

Publication number: CN116167057A
Application number: CN202310416949.9A
Authority: CN
Inventors: 赵新建; 张颂; 陈石; 陈璐; 陈牧; 夏飞; 袁国泉; 庄岭; 冒佳明; 徐晨维; 宋浒; 赵然; 程昕云; 奚梦婷
Original assignee: State Grid Smart Grid Research Institute Co ltd; State Grid Corp of China SGCC; State Grid Jiangsu Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Current assignee: State Grid Smart Grid Research Institute Co ltd; State Grid Corp of China SGCC; State Grid Jiangsu Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2023-04-19
Filing date: 2023-04-19
Publication date: 2023-05-26
Anticipated expiration: 2043-04-19
Also published as: CN116167057B

Abstract

The invention discloses a method and device for dynamic secure code loading based on semantic detection of key code, relating to the field of network security. The device comprises a key code detection module, a code configuration management module and a remote browser module. A semantics-based neural network model in the key code detection module detects key code in the code under test; the code configuration management module then configures the detected key code; finally, the remote browser module dispatches a remote browser to load the key code in isolation.

Description

Code dynamic safe loading method and device based on key code semantic detection

Technical Field

The invention belongs to the field of network security, and particularly relates to a code dynamic security loading method and device based on key code semantic detection.

Background

Micro-applications are small and have relatively simple business functions, and developers can embed them in a variety of program pages, which helps enterprises accelerate application development and improve management efficiency. Power mobile micro-applications provide business services covering power equipment, marketing, development, safety supervision, infrastructure, materials and the like, and involve a large amount of sensitive information. However, a power mobile security protection framework based on the micro-application architecture faces many security threats: because the loaded source code lacks shell (packing) protection, it can easily be obtained directly by malicious applications, and attackers can exploit code vulnerabilities to attack it, leaking key information. For power mobile micro-applications that involve so much sensitive information, ensuring their security is particularly important.

Micro-application development relies mainly on HTML5 technology. JavaScript, a basic component of HTML5, features dynamic execution and platform independence. While JavaScript makes micro-application development convenient, it also introduces numerous security risks: attackers use scripts as an attack vector, injecting malicious JavaScript code and running intrusive or destructive programs to undermine the information security of micro-applications and steal sensitive information.

Traditional micro-application security protection relies mainly on passive defense, supplemented by active defense, and does not protect source-code confidentiality sufficiently. Common defenses include active code obfuscation, malicious code scanning, and virtual-machine-based code isolation. Source code obfuscation increases the complexity of the code and reduces its readability without changing the code logic; it makes analysis harder for a cracker but cannot truly prevent reverse analysis, since the cracker can still recover the whole program logic after some time, so its security needs improvement. Malicious code scanning mainly uses feature-value detection, which extracts features of malicious code manually or by machine learning and uses those features to scan for and identify malicious code; it is fast, but it can only detect known vulnerabilities and cannot detect unknown ones. Virtual-machine-based code isolation generally isolates the application in a general-purpose virtual machine, but the virtual machine manager has a large trusted computing base, and both the virtualization cost and the performance overhead are high. The patent application with publication No. CN111931170A (a website application isolation protection system) and the patent application with publication No. CN113641934A (a website application isolation protection system for secure website access) both use a remote browser to run website applications securely and return the pages to the client browser for display in real time. However, both focus on isolating the entire application in the remote browser; this isolation mode is not flexible enough and places a heavy load on the platform.

Disclosure of Invention

The invention aims to solve the problems described in the background and provides a code dynamic secure loading method and device based on key code semantic detection. The invention uses a remote browser to isolate key code, thereby improving the runtime security of the key code of power mobile micro-applications.

In order to achieve the technical purpose, the invention adopts the following technical scheme:

A code dynamic secure loading method based on key code semantic detection comprises the following steps:

S1, inputting the code under test into a pre-constructed key code detection model for key code detection; if the code under test contains key code, proceeding to step S2; if it does not, running the code under test locally; wherein the key code detection model is used to determine whether the code under test contains key code;

S2, configuring the key code: adding to a configuration file the file names of the code under test that is to be migrated to a remote browser, and isolating the key code so that the key code runs securely in the remote browser while non-key code runs locally.
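Steps S1 and S2 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the function `plan_loading`, the configuration keys `remote` and `local`, and the stand-in `toy_detector`, which only checks for one API name and is not the trained detection model the method actually calls for.

```python
import json
from typing import Callable


def plan_loading(files: dict[str, str],
                 contains_key_code: Callable[[str], bool]) -> dict[str, list[str]]:
    """S1 + S2: run the detector on each file and record where it should run."""
    # Hypothetical configuration layout: one list of file names per target.
    config: dict[str, list[str]] = {"remote": [], "local": []}
    for name, source in sorted(files.items()):
        target = "remote" if contains_key_code(source) else "local"
        config[target].append(name)
    return config


def toy_detector(source: str) -> bool:
    """Stand-in for the key code detection model (assumption, not the real model)."""
    return "getCookie" in source


files = {
    "login.js": "var c = getCookie('sid');",
    "banner.js": "var x = 1 + 2;",
}
config = plan_loading(files, toy_detector)
print(json.dumps(config))
```

Keeping the detector as a plain callable means the routing logic does not depend on how detection is implemented, so the toy check can later be swapped for a trained model.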

To further refine the above technical scheme, the following specific measures are also adopted:

Before the code under test is input into the pre-constructed key code detection model, the method further comprises the following steps:

Step 1, acquiring a code data set;

Step 2, predefining a sensitive function table, taking as keywords the names of key API functions related to permission calls and sensitive-information calls, and, using a binning method based on a sensitive-word matching rule, grouping the code data set into code that contains key API functions and code that does not, thereby completing the screening of the code data set;

Step 3, preprocessing and characterizing the screened code data set so that the code is finally expressed in vector form;

Step 4, inputting the vectors into an LSTM neural network to train the network; the trained neural network model is the semantics-based key code detection model.
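The screening in step 2 can be sketched as a two-bin partition driven by keyword matching. This is a minimal illustration, not the patented procedure: the API names are the examples listed in the embodiment, while the whole-word, case-insensitive regular-expression rule and the names `bin_samples`, `with_key_api` and `without_key_api` are assumptions.

```python
import re

# Example key API names taken from the embodiment; the real table may differ.
SENSITIVE_FUNCTIONS = ["getCookie", "setCookie", "websocket",
                       "innerHTML", "outerHTML", "history"]

# Assumed matching rule: whole-word, case-insensitive match on any table entry.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SENSITIVE_FUNCTIONS)) + r")\b",
    re.IGNORECASE,
)


def bin_samples(samples: list[str]) -> dict[str, list[str]]:
    """Split code samples into those that mention a key API and those that do not."""
    bins: dict[str, list[str]] = {"with_key_api": [], "without_key_api": []}
    for sample in samples:
        key = "with_key_api" if _PATTERN.search(sample) else "without_key_api"
        bins[key].append(sample)
    return bins


samples = [
    "document.body.innerHTML = userInput;",
    "var total = price * count;",
    "var ws = new WebSocket(url);",
]
bins = bin_samples(samples)
print(len(bins["with_key_api"]), len(bins["without_key_api"]))
```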

The specific steps of preprocessing and characterizing the screened code data set in step 3 are as follows:

Step 3.1, performing lexical and syntactic analysis on the code in the code data set with a code analysis tool, and generating the corresponding abstract syntax tree with the operators or statements in the code as nodes, whereby the code in the code data set is converted into token sequences;

Step 3.2, adding the data dependencies and control dependencies between nodes to the abstract syntax tree to form a program dependency graph;

Step 3.3, traversing the abstract syntax tree to find nodes whose node type is Identifier, and matching the identifier corresponding to each such node against a predefined key API table, the key API table consisting of the names of APIs that are prone to malicious attack and involve sensitive permissions, together with explanations of their functions; when the identifier corresponding to a node is successfully matched against the key API table, slicing from that node according to a defined slicing criterion to obtain key code slices and non-key code slices, and labeling them respectively, with key code slices labeled 1 and non-key code slices labeled 0;

Step 3.4, obtaining a vectorized representation of each code slice using word2vec: each code slice comprises several lines of code, each line of code comprises several tokens, and each token corresponds to one vector, so each code slice corresponds to a vector made up of the vectors of its token sequence; the semantic vector corresponding to each code slice is denoted $S$;

Step 3.5, using the program dependency graph generated in step 3.2, performing centrality analysis on each node in the program dependency graph and calculating the eigenvector centrality index of each node.

Given a program dependency graph with $n$ nodes, let $x_i$ denote the eigenvector centrality measure of node $i$. The initial value of $x_i$ is the degree of node $i$, and the vector $x = (x_1, \ldots, x_n)$ is obtained through repeated iterations $x^{(t+1)} = A\,x^{(t)}$, where $A$ is the adjacency matrix formed by the $n$ nodes. When the vector becomes stable, the iteration is complete and $x$ is the eigenvector. The eigenvector centrality index is expressed as:

$$EC(i) = x_i = c \sum_{j=1}^{n} a_{ij}\, x_j$$

where $EC(i)$ is the eigenvector centrality of node $i$; $x_i$ is the eigenvector centrality measure of node $i$, initialized to the degree of node $i$; $c$ is a proportionality constant; $a_{ij}$ is the entry of the adjacency matrix $A$ formed by the $n$ nodes that describes the connection from node $i$ to node $j$, where $a_{ij} = 0$ means there is no connection between the two nodes and $a_{ij} = 1$ means the two nodes are connected; and $x_j$ is the eigenvector centrality measure of node $j$.

Each code slice corresponds to a program dependency graph, and the structural vector of a code slice is defined as $G$, expressed as:

$$G = \big(EC(1), EC(2), \ldots, EC(n)\big)$$

Step 3.6, concatenating the semantic vector $S$ obtained in step 3.4 and the structural vector $G$ obtained in step 3.5 to form the combined vector $V$, expressed as:

$$V = S \oplus G$$
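The centrality calculation in step 3.5 can be sketched with plain power iteration. The sketch makes several assumptions that are not in the text: the graph is treated as undirected (a symmetric 0/1 adjacency matrix), the vector is rescaled by its largest entry after each step, and the iteration uses $A + I$ instead of $A$. The shift leaves the eigenvectors unchanged but prevents the iteration from oscillating on bipartite graphs. The function name and the four-node example graph are hypothetical.

```python
def eigenvector_centrality(adj: list[list[int]],
                           max_iter: int = 1000,
                           tol: float = 1e-10) -> list[float]:
    """Power iteration for eigenvector centrality on a symmetric 0/1 matrix."""
    n = len(adj)
    # Initial value: the degree of each node, as described in step 3.5.
    x = [float(sum(row)) for row in adj]
    for _ in range(max_iter):
        # One step of x <- (A + I) x; the identity shift is an added assumption.
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        scale = max(nxt)
        if scale == 0.0:
            return nxt  # graph without edges: every centrality is zero
        nxt = [v / scale for v in nxt]  # rescale so the largest entry is 1
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical graph: nodes 0, 1, 2 form a triangle; node 3 hangs off node 0.
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(adjacency)
print([round(s, 3) for s in scores])
```

In this example node 0 receives the highest score, since it has the most neighbours and is adjacent to the two other well-connected nodes, while the pendant node 3 receives the lowest.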

In step S1, the key code detection comprises the following steps:

S11, performing lexical and syntactic analysis on the code under test with a code analysis tool to generate the corresponding abstract syntax tree;

S12, adding the data dependencies and control dependencies between the nodes of the abstract syntax tree to the abstract syntax tree of the code under test to form a program dependency graph;

S13, slicing the code under test with function calls as the slicing points of interest: traversing the abstract syntax tree to find nodes whose node type is Identifier, and slicing from each such node according to a defined slicing criterion, finally obtaining the function call slices of the code under test;

S14, vectorizing the program slices: obtaining a vectorized representation of each code slice under test using word2vec, the vector corresponding to each code slice under test being a two-dimensional matrix, thereby obtaining the semantic vector of each code slice under test;

S15, taking the structural information of the code into account: using the program dependency graph generated in step S12, performing centrality analysis on each node in the code slice under test and calculating the eigenvector centrality of each node, thereby obtaining the structural vector of each code slice under test;

S16, concatenating the semantic vector obtained in step S14 and the structural vector obtained in step S15 to form a combined vector;

S17, inputting the combined vector obtained in step S16 into the key code detection model to detect key code.
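The identifier search in step S13 can be sketched as a traversal over a syntax tree held as nested dictionaries, where each node has a `type` field and Identifier nodes carry a `name` field (the ESTree layout that JavaScript parsers emit). The tree below is written by hand for the call `getCookie('sid')`. The table contents and the helper names `iter_nodes` and `find_key_identifiers` are hypothetical, and the slicing step itself is not shown.

```python
from typing import Any, Iterator

# Hypothetical key API table: API name -> short functional explanation.
KEY_API_TABLE = {
    "getCookie": "reads a cookie value",
    "innerHTML": "writes markup into the page",
}


def iter_nodes(node: Any) -> Iterator[dict]:
    """Depth-first walk that yields every dictionary carrying a 'type' field."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


def find_key_identifiers(ast: dict) -> list[str]:
    """Return the names of Identifier nodes that match the key API table."""
    return [
        n["name"]
        for n in iter_nodes(ast)
        if n.get("type") == "Identifier" and n.get("name") in KEY_API_TABLE
    ]


# Hand-written tree for the JavaScript statement: getCookie('sid');
ast = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "CallExpression",
            "callee": {"type": "Identifier", "name": "getCookie"},
            "arguments": [{"type": "Literal", "value": "sid", "raw": "'sid'"}],
        },
    }],
}
matches = find_key_identifiers(ast)
print(matches)
```

Each matched node would then serve as a starting point for slicing.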

In step S2, when the remote browser runs the key code, it renders the running result into a static H5 page, which is transmitted back to the local application and displayed to the user.
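The idea of returning only a static page can be sketched as follows. The text does not specify how the remote browser produces the page, so this is one possible illustration: the result of running the key code is written into script-free HTML with all dynamic text escaped, so that only the result reaches the local application. The function name and the sample data are hypothetical.

```python
import html


def render_static_page(title: str, result_lines: list[str]) -> str:
    """Build a script-free HTML page that shows only the run result."""
    # Escape every dynamic value so it is displayed as text, never executed.
    items = "\n".join(f"    <li>{html.escape(line)}</li>" for line in result_lines)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


page = render_static_page("Account", ["balance: 42", "<script>alert(1)</script>"])
print(page)
```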

A code dynamic secure loading device based on key code semantic detection comprises a key code detection module, a code configuration management module and a remote browser module, wherein:

the key code detection module contains a pre-constructed key code detection model, uses that model to detect the input code under test, and identifies the key code and non-key code in it;

the code configuration management module is communicatively connected to the key code detection module and is used for configuration management of the key code: it adds to a configuration file the file names of the code under test that is to be migrated to the remote browser module, and marks the corresponding files to run in the remote browser module;

the remote browser module is communicatively connected to the code configuration management module and is used to read the configuration file of the code configuration management module and to load the key code in isolation in a remote browser according to the policy instructions issued by the code configuration management module.

The key code detection model in the key code detection module is constructed as follows: first, a code data set is acquired; then a sensitive function table is defined, the names of key API functions related to permission calls and sensitive-information calls are taken as keywords, and a binning method based on a sensitive-word matching rule is used to group the code data set into code that contains key API functions and code that does not, completing the screening of the code data set; the screened code data set is then preprocessed and characterized so that the code is expressed in vector form; finally, the vectors are input into an LSTM neural network for training, and the trained neural network model is the semantics-based key code detection model.

When running the key code, the remote browser module renders the running result into a static H5 page, which is transmitted back to the local application and displayed.

A computer readable storage medium storing a computer program which, when executed by a processor, performs the method steps described above.

An electronic device comprising a processor and a memory, said memory storing a computer program which, when executed by said processor, implements the method steps described above.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention uses a remote browser to isolate key code: key code runs in the remote browser and non-key code runs locally, which effectively ensures the security of the key code and helps reduce the load on the platform.

2. The invention uses a binning method to preprocess and group the code data set, which screens the data set effectively and speeds up subsequent preprocessing of the code data. It also introduces the concept of centrality from social network analysis, performs eigenvector centrality analysis on the nodes of the sliced code to obtain a structural vector of the code, and fuses this structural vector with the semantic vector of the code, so that the features of the code are fully represented.

Drawings

FIG. 1 is a schematic diagram of the code dynamic secure loading method based on key code semantic detection in the present invention;

FIG. 2 is a flow chart of a code dynamic secure loading method based on key code semantic detection in the present invention.

Detailed Description

Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

It should be noted that terms such as "upper", "lower", "left", "right", "front" and "rear" are used for descriptive purposes only and are not intended to limit the practicable scope of the invention; changes or adjustments to their relative relationships, without substantive change to the technical content, are also regarded as falling within the practicable scope of the invention.

Example 1

As shown in fig. 1, the device for dynamically and safely loading codes based on key code semantic detection provided by the invention comprises the following three modules: a key code detection module, a code configuration management module and a remote browser module, wherein,

When running the key code, the remote browser module renders the running result into a static H5 page, which is transmitted back to the local application and displayed. It can also receive keyboard and mouse events sent by the local browser and send the result to the local side for rendering over the WebSocket protocol.

On this basis, during code loading the method first uses the key code detection module to examine the code under test. If the key code detection module identifies key code, the key code is passed to the code configuration management module and, once configured, passed on to the remote browser module, which protects it. If the key code detection module identifies non-key code, the non-key code runs in the local browser.

Example two

As shown in FIG. 2, the method is implemented through the following steps:

S1, inputting the code under test into a pre-constructed key code detection model for key code detection; if the code under test contains key code, proceeding to step S2; if it does not, running the code under test locally;

S2, configuring the key code: adding to a configuration file the file names of the code under test that is to be migrated to the remote browser, and isolating the key code in the remote browser module so that key code runs securely and non-key code runs locally.

Before S1, the method further includes a step of constructing the semantics-based key code detection model, which is used to determine whether the code under test contains key code.

The semantics-based key code detection model is constructed through the following steps:

Step 1, scanning a website with security scanning software and, after confirming that it is free of viruses, downloading its code to obtain a code data set;

Step 2, processing the obtained code data set with a binning method. Because API functions can represent the semantic information of code, a sensitive function table is predefined, and key API functions related to permission calls and sensitive-information calls are used as keywords; the key API functions include getCookie, setCookie, websocket, innerHTML, outerHTML, history and the like. Using the binning method based on a sensitive-word matching rule, the code data set is grouped into code that contains key API functions and code that does not, and the code containing key API functions is used as the data set input to the key code detection module, so that the data set is effectively screened;

Step 3, preprocessing and characterizing the screened code data set so that the code is finally expressed in vector form, with the following specific steps:

Step 3.1, performing lexical and syntactic analysis on the code in the code data set with the code analysis tool Esprima to generate the corresponding abstract syntax tree, so that the code in the data set can be expressed as tokens. The generated tokens are of five types: Keyword, Identifier, Punctuator, String and Numeric. Because tokens of the Numeric and Punctuator types have little influence on code semantics, tokens of the Keyword, Identifier and String types are extracted and stored in a text file as the experimental data set, with the tokens separated by spaces;

Step 3.2, to extract more complete semantic information, the structural relationships within the code, namely data dependencies and control dependencies, are taken into account. Because each node of the abstract syntax tree represents a corresponding operator or statement in the code, the data dependencies and control dependencies between the nodes of the abstract syntax tree are added to the tree to form a program dependency graph;

Step 3.3, slicing the code with function calls as the slicing points of interest: first traversing the abstract syntax tree to find nodes whose node type is Identifier; when the identifier corresponding to a node is successfully matched against the predefined key API table, slicing starts from that node according to a defined slicing criterion. Slicing means extracting from the source code the statements related to the point of interest and filtering out the unrelated statements, which reduces noise and improves the detection effect. The result is key code slices and non-key code slices, which are labeled 1 and 0 respectively;

Step 3.4, vectorizing the program slices: a vectorized representation of each code slice is obtained using word2vec. Each code slice comprises several lines of code, each line of code comprises several tokens, and each token corresponds to one vector, so each code slice corresponds to a vector made up of the vectors of its token sequence; the semantic vector corresponding to each code slice is denoted $S$;

Step 3.5, to make the feature description more accurate, the structural information of the code is taken into account and the concept of centrality from social network analysis is introduced: using the program dependency graph generated in step 3.2, centrality analysis is performed on each node in the graph and the eigenvector centrality index of each node is calculated. This index represents the structural features of each node; eigenvector centrality holds that the importance of a node depends both on the number of its neighbor nodes and on the importance of each of those neighbors.

Given a program dependency graph with $n$ nodes, let $x_i$ denote the eigenvector centrality measure of node $i$. The initial value of $x_i$ is the degree of node $i$, and the vector $x = (x_1, \ldots, x_n)$ can be obtained through repeated iterations $x^{(t+1)} = A\,x^{(t)}$, where $A$ is the adjacency matrix formed by the $n$ nodes; once the vector is stable, $x$ is the eigenvector. The eigenvector centrality index can finally be expressed as:

$$EC(i) = x_i = c \sum_{j=1}^{n} a_{ij}\, x_j$$

where $EC(i)$ is the eigenvector centrality of node $i$; $x_i$ is the eigenvector centrality measure of node $i$, initialized to the degree of node $i$; $c$ is a proportionality constant; $a_{ij}$ is the entry of the adjacency matrix $A$ formed by the $n$ nodes that describes the connection from node $i$ to node $j$; and $x_j$ is the eigenvector centrality measure of node $j$.

Each code slice corresponds to a program dependency graph, and the structural vector of a code slice is defined as $G$, expressed as:

$$G = \big(EC(1), EC(2), \ldots, EC(n)\big)$$

Step 3.6, concatenating the semantic vector $S$ obtained in step 3.4 and the structural vector $G$ obtained in step 3.5 to form the combined vector $V$, expressed as:

$$V = S \oplus G$$

where $V$ can be regarded as a vector space, $S$ and $G$ can be regarded as linear subspaces of $V$, and $\oplus$ denotes their direct sum;

Step 4, inputting the combined vector into an LSTM (long short-term memory) neural network for training to obtain the semantics-based key code detection model;

Step 5, evaluating the model to confirm that the semantics-based key code detection model runs reliably.
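The token filtering described in step 3.1 can be sketched as follows. The sketch assumes the tokens have already been produced by a JavaScript tokenizer as records with `type` and `value` fields (the shape Esprima's tokenizer emits). The sample token list is written by hand for the statement `var c = getCookie('sid');`, and the helper name `tokens_to_line` is hypothetical.

```python
# Token types kept for the experimental data set, as stated in step 3.1.
KEPT_TYPES = {"Keyword", "Identifier", "String"}


def tokens_to_line(tokens: list[dict]) -> str:
    """Keep Keyword, Identifier and String tokens and join them with spaces."""
    return " ".join(t["value"] for t in tokens if t["type"] in KEPT_TYPES)


# Hand-written token stream for: var c = getCookie('sid');
tokens = [
    {"type": "Keyword", "value": "var"},
    {"type": "Identifier", "value": "c"},
    {"type": "Punctuator", "value": "="},
    {"type": "Identifier", "value": "getCookie"},
    {"type": "Punctuator", "value": "("},
    {"type": "String", "value": "'sid'"},
    {"type": "Punctuator", "value": ")"},
    {"type": "Punctuator", "value": ";"},
]
line = tokens_to_line(tokens)
print(line)
```

Lines produced this way, one per statement or slice, would form the space-separated text corpus on which word2vec is later trained.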

The detection of key code comprises the following steps:

S12, to extract more complete semantic information, the structural relationships within the code, namely data dependencies and control dependencies, are taken into account: the data dependencies and control dependencies between the nodes of the abstract syntax tree are added to the abstract syntax tree of the code under test to form a program dependency graph, in which a directed edge between any two nodes represents a data dependency or control dependency between them;

S17, inputting the combined vector obtained in step S16 into the LSTM neural network for key code detection;

S2, configuring the key code: the file names of the code under test that is to be migrated to the remote browser are added to the configuration file, and the corresponding files are marked to run in the remote browser; the remote browser receives the configuration instruction and isolates the key code, so that key code runs securely in the remote browser and non-key code runs locally. When a user requests a service, the configuration file is read and the access request is evaluated: if the requested code files include a key code file, that file is run by the remote browser, which can also receive keyboard and mouse events sent by the local browser and, over the WebSocket protocol, send the result to the local side for rendering and display to the user; if no key code file is included, the code runs directly locally. Dynamic secure loading of key code is thereby achieved.
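The request-time decision in step S2 can be sketched as a lookup against the configuration file. The JSON layout with a `remote` list and the function name `route_request` are assumptions chosen for illustration; the text does not specify the format of the configuration file.

```python
import json


def route_request(requested_files: list[str], config_text: str) -> dict[str, str]:
    """Decide, per requested file, whether it runs remotely or locally."""
    # Hypothetical layout: {"remote": [<file names marked for the remote browser>]}
    remote_files = set(json.loads(config_text).get("remote", []))
    return {
        name: ("remote" if name in remote_files else "local")
        for name in requested_files
    }


config_text = '{"remote": ["login.js"]}'
routes = route_request(["login.js", "banner.js"], config_text)
print(routes)
```

Because the decision is made per file at request time, changing the configuration file changes where code runs without touching the application itself, which is what makes the loading dynamic.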

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The solutions in the embodiments of the present application may be implemented in various computer languages, for example the object-oriented programming language Java and the interpreted scripting language JavaScript.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims

1. The code dynamic safe loading method based on the key code semantic detection is characterized by comprising the following steps of: the method comprises the following steps:

2. The code dynamic security loading method based on key code semantic detection according to claim 1, wherein the method is characterized in that: before inputting the code to be tested into the pre-constructed key code detection model, the method further comprises the following steps:

step 1, acquiring a code data set;

3. The code dynamic security loading method based on key code semantic detection according to claim 2, wherein the method is characterized in that:

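given a program dependency graph with $n$ nodes, $x_i$ denotes the eigenvector centrality measure of node $i$; the initial value of $x_i$ is the degree of node $i$, and the vector $x = (x_1, \ldots, x_n)$ is obtained through repeated iterations $x^{(t+1)} = A\,x^{(t)}$, where $A$ is the adjacency matrix formed by the $n$ nodes; the eigenvector centrality index is expressed as:

$$EC(i) = x_i = c \sum_{j=1}^{n} a_{ij}\, x_j$$

wherein $EC(i)$ is the eigenvector centrality of node $i$; $x_i$ is the eigenvector centrality measure of node $i$, initialized to the degree of node $i$; $c$ is a proportionality constant; $a_{ij}$ is the entry of the adjacency matrix $A$ formed by the $n$ nodes that describes the connection from node $i$ to node $j$; and $x_j$ is the eigenvector centrality measure of node $j$;

each code slice corresponds to a program dependency graph, and the structural vector of a code slice is defined as $G$, expressed as:

$$G = \big(EC(1), EC(2), \ldots, EC(n)\big)$$

the semantic vector $S$ of the code slice and the structural vector $G$ are concatenated to form the combined vector $V$, expressed as:

$$V = S \oplus G$$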

4. The code dynamic security loading method based on key code semantic detection according to claim 3, characterized in that: in step S1, the key code detection comprises the following steps:

5. The code dynamic security loading method based on key code semantic detection according to claim 3, characterized in that: in step S2, when the remote browser runs the key code, it renders the running result into a static H5 page, which is transmitted back to the local application and displayed to the user.

6. The code dynamic safe loading device based on the key code semantic detection is characterized in that: comprises a key code detection module, a code configuration management module and a remote browser module, wherein,

7. The code dynamic secure loading device based on key code semantic detection according to claim 6, wherein the key code detection model in the key code detection module is constructed as follows: first, a code data set is acquired; then a sensitive function table is defined, the names of key API functions related to permission calls and sensitive-information calls are taken as keywords, and a binning method based on a sensitive-word matching rule is used to group the code data set into code that contains key API functions and code that does not, completing the screening of the code data set; the screened code data set is then preprocessed and characterized so that the code is expressed in vector form; finally, the vectors are input into an LSTM neural network for training, and the trained neural network model is the semantics-based key code detection model.

8. The code dynamic secure loading device based on key code semantic detection according to claim 6, wherein: when running the key code, the remote browser module renders the running result into a static H5 page, which is transmitted back to the local application and displayed.

9. A computer-readable storage medium, characterized by: the computer readable storage medium stores a computer program which, when executed by a processor, implements the method steps of any of claims 1-5.

10. An electronic device, characterized by: the electronic device comprising a processor and a memory, the memory storing a computer program, which, when executed by the processor, performs the method steps of any of claims 1-5.