CN116167057A - Code dynamic safe loading method and device based on key code semantic detection - Google Patents



Publication number
CN116167057A
Authority
CN
China
Prior art keywords
code
key
codes
node
vector
Legal status
Granted
Application number
CN202310416949.9A
Other languages
Chinese (zh)
Other versions
CN116167057B (en)
Inventor
赵新建
张颂
陈石
陈璐
陈牧
夏飞
袁国泉
庄岭
冒佳明
徐晨维
宋浒
赵然
程昕云
奚梦婷
Current Assignee
State Grid Smart Grid Research Institute Co ltd
State Grid Corp of China SGCC
State Grid Jiangsu Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Smart Grid Research Institute Co ltd
State Grid Corp of China SGCC
State Grid Jiangsu Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Application filed by State Grid Smart Grid Research Institute Co ltd, State Grid Corp of China SGCC, State Grid Jiangsu Electric Power Co Ltd, Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202310416949.9A
Publication of CN116167057A
Application granted
Publication of CN116167057B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)
  • Input From Keyboards Or The Like (AREA)

Abstract

The invention discloses a code dynamic security loading method and device based on key code semantic detection, relating to the field of network security. The device comprises a key code detection module, a code configuration management module and a remote browser module. The key code detection module uses a semantic-based neural network model to detect key code; the code configuration management module then configures the key code; finally, the remote browser module dispatches the key code to a remote browser, where it is loaded in isolation.

Description

Code dynamic safe loading method and device based on key code semantic detection
Technical Field
The invention belongs to the field of network security, and particularly relates to a code dynamic security loading method and device based on key code semantic detection.
Background
Micro-applications are small and provide relatively simple business functions, and developers can embed them in a variety of program pages, which speeds up enterprise application development and improves management efficiency. Power mobile micro-applications provide business services for power equipment, marketing, development, safety supervision, infrastructure, materials and so on, and handle a large amount of sensitive information. However, power mobile security protection frameworks built on the micro-application architecture face many security threats: because the loaded source code is not protected by a shell, malicious applications can easily obtain it directly, and attackers can exploit vulnerabilities in the code to leak key information. Securing power mobile micro-applications that handle so much sensitive information is therefore particularly important.
Micro-applications are developed mainly with HTML5 technology, of which JavaScript is a basic component. JavaScript executes dynamically and is platform-independent, but while it makes micro-application development convenient, it also introduces numerous security risks: attackers use scripts as an attack medium, injecting malicious JavaScript and running intrusive or destructive programs to compromise the micro-application's information security and steal sensitive information.
Traditional micro-application protection relies mainly on passive defense, with active defense as a supplement, and does not keep source code sufficiently confidential. Common defenses include active code obfuscation, malicious code scanning and virtual-machine-based code isolation. Source code obfuscation increases the complexity and reduces the readability of code without changing its logic. This makes analysis harder for a cracker but cannot truly prevent reverse engineering, since a cracker can still recover the whole program logic after enough analysis, so its security needs improvement. Malicious code scanning relies mainly on feature value (signature) detection, which extracts features of malicious code manually or by machine learning and uses them to scan for and identify malicious code. It is fast, but it can only detect known vulnerabilities, not unknown ones. Virtual-machine-based isolation generally runs the application in a general-purpose virtual machine, but the virtual machine manager has a very large trusted computing base and incurs high virtualization cost and performance overhead. The patent applications published as CN111931170A (a website application isolation protection system) and CN113641934A (a website application isolation protection system for secure website access) use a remote browser to run website applications securely and return pages to the client browser for real-time display. However, both isolate the entire application in the remote browser, so the isolation is inflexible and the platform load is heavy.
Disclosure of Invention
The invention aims to solve the problems described in the background art by providing a code dynamic security loading method and device based on key code semantic detection. The invention uses a remote browser to isolate key code, thereby improving the runtime security of the key code of power mobile micro-applications.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
The code dynamic safe loading method based on key code semantic detection comprises the following steps:
S1, inputting the code to be tested into a pre-constructed key code detection model, which judges whether the code to be tested contains key code; if it does, proceeding to step S2, and if it does not, running the code to be tested locally;
S2, configuring the key code by adding to a configuration file the file names of the code to be tested that is to be migrated to the remote browser, and isolating the key code so that it runs securely in the remote browser while non-key code runs locally.
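The S1/S2 routing can be sketched as follows. This is a minimal illustration, not the patented implementation. The names `dispatch`, `contains_key_code` (a callback standing in for the trained detection model) and the `remote_files`/`local_files` configuration keys are all hypothetical.

```python
from typing import Callable, Dict, List


def dispatch(files: Dict[str, str],
             contains_key_code: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Route each file either to the remote browser or to local execution.

    `files` maps file names to source text. `contains_key_code` is a
    hypothetical stand-in for the key code detection model of step S1.
    """
    config: Dict[str, List[str]] = {"remote_files": [], "local_files": []}
    for name, source in sorted(files.items()):
        if contains_key_code(source):
            # S2: record the file name so the remote browser loads it in isolation.
            config["remote_files"].append(name)
        else:
            # S1: no key code detected, so the file keeps running locally.
            config["local_files"].append(name)
    return config


# Usage: a trivial keyword check stands in for the model.
demo = dispatch(
    {"login.js": "document.cookie = token;", "ui.js": "render(list);"},
    lambda src: "cookie" in src,
)
print(demo)  # {'remote_files': ['login.js'], 'local_files': ['ui.js']}
```

Keeping the routing decision in a configuration object mirrors the patent's split between detection and configuration management: the remote browser side only has to read the list of file names.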
To further optimize the technical solution, the following specific measures are adopted:
before the code to be tested is input into the pre-constructed key code detection model, the method further comprises the following steps:
step 1, acquiring a code data set;
step 2, predefining a sensitive function table that uses key API function names related to permission calls and sensitive information calls as keywords, and using a binning method based on a sensitive-word matching rule to separate the code samples that contain key API functions from those that do not, thereby screening the code data set;
step 3, preprocessing and characterizing the screened code data set so that the code is finally represented as vectors;
step 4, inputting the vectors into an LSTM neural network for training, the trained network being the semantic-based key code detection model.
The preprocessing and code characterization of the screened code data set in step 3 comprise the following steps:
step 3.1, performing lexical and syntactic analysis on the code in the code data set with a code analysis tool, converting the code into token sequences and generating the corresponding abstract syntax tree, whose nodes are the operators or statements in the code;
step 3.2, adding the data dependencies and control dependencies between nodes to the abstract syntax tree to form a program dependency graph;
step 3.3, traversing the abstract syntax tree to find nodes of type Identifier and matching the identifier of each such node against a predefined key API table, which lists API names that involve sensitive permissions and are vulnerable to malicious attack, together with a description of each API's function; when an identifier matches the key API table, slicing from that node according to a defined slicing criterion to obtain key code slices and non-key code slices, which are labeled 1 and 0 respectively;
step 3.4, obtaining a vectorized representation of each code slice with word2vec: each code slice contains several lines of code, each line contains several tokens, and each token corresponds to a vector, so each code slice corresponds to the sequence of vectors of its tokens; the semantic vector of a code slice is denoted $V_{sem}$;
step 3.5, performing centrality analysis on each node of the program dependency graph generated in step 3.2 and calculating the eigenvector centrality index of each node.
Given a program dependency graph with $n$ nodes, let $x_i$ denote the eigenvector centrality measure of node $i$. Its initial value $x_i^{(0)}$ is the degree $d_i$ of the node, and the iteration $x^{(k)} = A\,x^{(k-1)}$ (normalized after each step) is repeated, where $A$ is the $n \times n$ adjacency matrix of the nodes. When the vector stabilizes, the iteration stops and $x = (x_1, \ldots, x_n)^{T}$ is the eigenvector. The eigenvector centrality index is expressed as:
$$x_i = c \sum_{j=1}^{n} a_{ij}\, x_j$$
where $x_i$ is the eigenvector centrality measure of node $i$; $c$ is a proportionality constant (the reciprocal of the largest eigenvalue of $A$); $a_{ij}$ is the entry of $A$ from node $i$ to node $j$, equal to 0 when the two nodes are not connected and 1 when they are connected; and $x_j$ is the eigenvector centrality measure of node $j$.
Each code slice corresponds to a program dependency graph, and the structural vector $V_{str}$ of a code slice is defined as:
$$V_{str} = (x_1, x_2, \ldots, x_n)$$
where $x_i$ is the eigenvector centrality measure of node $i$ of the slice's program dependency graph;
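As a small worked check of the centrality formula, consider a three-node path in which node 2 is connected to nodes 1 and 3. This graph is an illustrative choice, not one taken from the patent. The adjacency matrix $A$ and a candidate vector satisfy:

$$A=\begin{pmatrix}0&1&0\\1&0&1\\0&1&0\end{pmatrix},\qquad A\begin{pmatrix}1\\ \sqrt{2}\\ 1\end{pmatrix}=\begin{pmatrix}\sqrt{2}\\ 2\\ \sqrt{2}\end{pmatrix}=\sqrt{2}\begin{pmatrix}1\\ \sqrt{2}\\ 1\end{pmatrix}$$

The largest eigenvalue is therefore $\lambda=\sqrt{2}$ and the proportionality constant is $c=1/\sqrt{2}$. Normalizing to unit length gives $x=(1/2,\ \sqrt{2}/2,\ 1/2)$, so the structural vector of such a slice would be approximately $(0.5,\ 0.7071,\ 0.5)$, with the middle node the most central.

For this bipartite graph, the plain iteration $x^{(k)} = A\,x^{(k-1)}$ does not settle when started from the degrees $(1, 2, 1)$. It alternates between the directions $(1, 2, 1)$ and $(1, 1, 1)$, because $-\sqrt{2}$ is also an eigenvalue. Iterating with $A + I$, which has the same eigenvectors, is a standard remedy.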
step 3.6, splicing the semantic vector $V_{sem}$ obtained in step 3.4 and the structural vector $V_{str}$ obtained in step 3.5 to form the combined vector $V$, specifically:
$$V = V_{sem} \oplus V_{str}$$
In step S1, the key code detection comprises the following steps:
S11, performing lexical and syntactic analysis on the code to be tested with a code analysis tool to generate the corresponding abstract syntax tree;
S12, adding the data dependencies and control dependencies between the nodes to the abstract syntax tree of the code to be tested to form a program dependency graph;
S13, slicing the code to be tested with function calls as slicing points: traversing the abstract syntax tree, finding nodes of type Identifier, and slicing from each such node according to the defined slicing criterion to obtain the function-call slices of the code to be tested;
S14, vectorizing the program slices: obtaining a vectorized representation of each slice of the code to be tested with word2vec, where each slice corresponds to a two-dimensional matrix, which gives the semantic vector of each slice;
S15, to take the structural information of the code into account, performing centrality analysis on each node of each slice with the program dependency graph generated in step S12 and calculating the eigenvector centrality of each node, which gives the structural vector of each slice;
S16, splicing the semantic vector obtained in step S14 and the structural vector obtained in step S15 into a combined vector;
S17, inputting the combined vector obtained in step S16 into the key code detection model to detect key code.
In step S2, when the remote browser runs the key code, it renders the results into a static H5 page, which is transmitted back to the local application and displayed to the user.
The code dynamic security loading device based on key code semantic detection comprises a key code detection module, a code configuration management module and a remote browser module, wherein:
the key code detection module contains the pre-built key code detection model, uses it to examine the input code to be tested, and identifies the key code and non-key code in it;
the code configuration management module is connected by signal to the key code detection module and manages the configuration of the key code: it adds to a configuration file the file names of the code to be tested that is to be migrated to the remote browser module, and marks the corresponding files to run in the remote browser module;
the remote browser module is connected by signal to the code configuration management module, reads its configuration file, and loads the key code in isolation in the remote browser according to the policy instructions issued by the code configuration management module.
The key code detection model in the key code detection module is constructed as follows. First, a code data set is obtained. Then a sensitive function table is defined that uses key API function names related to permission calls and sensitive information calls as keywords, and a binning method based on a sensitive-word matching rule separates the samples containing key API functions from those that do not, completing the screening of the data set. Next, the screened data set is preprocessed and characterized so that the code is represented as vectors. Finally, the vectors are input into an LSTM neural network for training, and the trained network is the semantic-based key code detection model.
When running the key code, the remote browser module renders the results into a static H5 page, which is transmitted back to the local application and displayed.
A computer-readable storage medium stores a computer program that, when executed by a processor, performs the method steps described above.
An electronic device comprises a processor and a memory, the memory storing a computer program that, when executed by the processor, implements the method steps described above.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses the remote browser to isolate the key code: key code runs in the remote browser while non-key code runs locally, which effectively protects the key code and helps reduce platform load.
2. The invention preprocesses and integrates the code data set with a binning method, effectively screening the data set and speeding up subsequent preprocessing of the code data. It also introduces the concept of centrality from social networks: eigenvector centrality analysis of the sliced code nodes yields a structural vector of the code, which is fused with the semantic vector so that the features of the code are fully represented.
Drawings
FIG. 1 is a schematic diagram of the code dynamic secure loading method based on key code semantic detection in the present invention;
FIG. 2 is a flow chart of a code dynamic secure loading method based on key code semantic detection in the present invention.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
It should be noted that terms such as "upper", "lower", "left", "right", "front" and "rear" are used for clarity of description only and are not intended to limit the scope in which the invention may be practiced; changes or adjustments to their relative relationships, without substantial change to the technical content, are also regarded as within that scope.
Example 1
As shown in FIG. 1, the device for dynamic safe code loading based on key code semantic detection provided by the invention comprises three modules: a key code detection module, a code configuration management module and a remote browser module, wherein:
the key code detection module contains the pre-built key code detection model, uses it to examine the input code to be tested, and identifies the key code and non-key code in it;
the code configuration management module is connected by signal to the key code detection module and manages the configuration of the key code: it adds to a configuration file the file names of the code to be tested that is to be migrated to the remote browser module, and marks the corresponding files to run in the remote browser module;
the remote browser module is connected by signal to the code configuration management module, reads its configuration file, and loads the key code in isolation in the remote browser according to the policy instructions issued by the code configuration management module.
The key code detection model in the key code detection module is constructed as follows. First, a code data set is obtained. Then a sensitive function table is defined that uses key API function names related to permission calls and sensitive information calls as keywords, and a binning method based on a sensitive-word matching rule separates the samples containing key API functions from those that do not, completing the screening of the data set. Next, the screened data set is preprocessed and characterized so that the code is represented as vectors. Finally, the vectors are input into an LSTM neural network for training, and the trained network is the semantic-based key code detection model.
When running the key code, the remote browser module renders the results into a static H5 page and transmits it back to the local application for display. At the same time, it receives keyboard and mouse events sent by the local browser over the WebSocket protocol, and the resulting rendering is sent back to the local side.
Accordingly, during code loading, the key code detection module first examines the code to be tested. If it identifies key code, the key code is passed to the code configuration management module and, once configured, to the remote browser module, which protects it. If it identifies non-key code, that code runs in the local browser.
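The keyboard and mouse event channel of the remote browser module can be sketched as a JSON message format carried in WebSocket text frames. The patent does not specify a message format, so the `InputEvent` fields and the `encode`/`decode` helpers are hypothetical. The WebSocket transport itself is left out.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class InputEvent:
    """A keyboard or mouse event captured by the local browser (hypothetical schema)."""
    kind: str      # e.g. "click" or "keydown"
    x: int = 0     # pointer coordinates for mouse events
    y: int = 0
    key: str = ""  # key value for keyboard events


def encode(event: InputEvent) -> str:
    # Serialize the event as a JSON string, suitable as a WebSocket text frame.
    return json.dumps(asdict(event))


def decode(frame: str) -> InputEvent:
    # Rebuild the event on the remote browser side before replaying it.
    return InputEvent(**json.loads(frame))


frame = encode(InputEvent(kind="click", x=120, y=48))
print(frame)  # {"kind": "click", "x": 120, "y": 48, "key": ""}
```

Only input events travel from the local side to the remote side. The key code itself never leaves the remote browser; the local side receives only the rendered static H5 page.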
Example 2
As shown in FIG. 2, the specific implementation comprises the following steps:
S1, inputting the code to be tested into the pre-constructed key code detection model to detect key code; if the code to be tested contains key code, proceeding to step S2, and if it does not, running the code to be tested locally;
S2, configuring the key code by adding to the configuration file the file names of the code to be tested that is to be migrated to the remote browser, and isolating the key code in the remote browser module so that key code runs securely while non-key code runs locally.
Before S1, there is a further step of constructing the semantic-based key code detection model, which judges whether the code to be tested contains key code.
Constructing the semantic-based key code detection model comprises the following steps:
step 1, scanning websites with security scanning software and, after confirming that they are free of viruses, downloading code to obtain a code data set;
step 2, processing the obtained code data set with a binning method. Because API functions carry semantic information about the code, a sensitive function table is predefined that uses key API functions related to permission calls and sensitive information calls as keywords, including getCookie, setCookie, websocket, innerHTML, outerHTML, history and others. A binning method based on a sensitive-word matching rule separates the samples containing key API functions from those that do not, and the samples containing key API functions are used as the data set input to the key code detection module, which effectively screens the data set;
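A minimal sketch of the binning in step 2, assuming a whole-word, case-insensitive match against the API names listed above. The patent does not spell out its sensitive-word matching rule, so the rule and the `bin_samples` helper are assumptions.

```python
import re
from typing import Dict, List, Tuple

# Sensitive function table built from the examples named in step 2; the real
# table may contain more entries.
SENSITIVE_APIS = ["getCookie", "setCookie", "websocket", "innerHTML",
                  "outerHTML", "history"]

# Whole-identifier match; case-insensitive so that "WebSocket" also counts
# (the case-insensitivity is an assumption).
_PATTERN = re.compile(r"\b(" + "|".join(SENSITIVE_APIS) + r")\b", re.IGNORECASE)


def bin_samples(samples: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split samples into a bin that hits the sensitive table and one that does not."""
    hit: List[str] = []
    miss: List[str] = []
    for name, source in sorted(samples.items()):
        (hit if _PATTERN.search(source) else miss).append(name)
    return hit, miss


hit, miss = bin_samples({
    "a.js": "el.innerHTML = data;",
    "b.js": "const ws = new WebSocket(url);",
    "c.js": "return a + b;",
})
print(hit, miss)  # ['a.js', 'b.js'] ['c.js']
```

Only the `hit` bin would go on to the key code detection module, as step 2 describes.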
step 3, preprocessing and characterizing the screened code data set so that the code is finally represented as vectors, specifically:
step 3.1, performing lexical and syntactic analysis on the code in the data set with the code analysis tool Esprima to generate the corresponding abstract syntax tree. The code can be expressed as tokens of five types: Keyword, Identifier, Punctuator, String and Numeric. Because Numeric and Punctuator tokens have little influence on code semantics, only tokens of type Keyword, Identifier and String are extracted and stored in a text file, separated by spaces, as the experimental data set;
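The token filtering of step 3.1 can be approximated as below. The regular-expression tokenizer is a simplified stand-in for Esprima, not its actual behavior. It knows only a handful of keywords, silently skips characters it does not recognize, and ignores template literals and regular-expression literals. The function names are hypothetical; the token type names mirror Esprima's.

```python
import re
from typing import List, Tuple

# A small subset of JavaScript keywords; Esprima recognizes the full list.
KEYWORDS = {"var", "let", "const", "function", "return", "if", "else",
            "for", "while", "new", "this"}

TOKEN_RE = re.compile(r"""
    (?P<String>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<Numeric>\d+(?:\.\d+)?)
  | (?P<Word>[A-Za-z_$][\w$]*)
  | (?P<Punctuator>[{}()\[\];,.<>+\-*/%=!&|?:]+)
  | (?P<Space>\s+)
""", re.VERBOSE)

KEPT_TYPES = {"Keyword", "Identifier", "String"}


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Return (type, value) pairs using Esprima-style type names."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind = m.lastgroup
        if kind == "Space":
            continue
        if kind == "Word":
            kind = "Keyword" if m.group() in KEYWORDS else "Identifier"
        tokens.append((kind, m.group()))
    return tokens


def to_corpus_line(source: str) -> str:
    """Keep Keyword/Identifier/String tokens, space-separated, as in step 3.1."""
    return " ".join(v for t, v in tokenize(source) if t in KEPT_TYPES)


print(to_corpus_line('var c = document.cookie + "x" + 1;'))
# var c document cookie "x"
```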
step 3.2, to extract more complete semantic information, the structural relationships in the code, namely data dependencies and control dependencies, are taken into account: since each node of the abstract syntax tree represents an operator or statement in the code, the data and control dependencies between nodes are added to the abstract syntax tree to form a program dependency graph;
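A simplified sketch of step 3.2 for straight-line code. It assumes each statement node has already been reduced to the variables it defines and uses and to the branch condition that guards it. `Stmt` and `build_pdg` are hypothetical names. A real dependency-graph builder would also handle loops and definitions that reach a use through several branches.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Stmt:
    """One AST statement node, reduced to what the dependency graph needs."""
    text: str
    defs: Set[str] = field(default_factory=set)
    uses: Set[str] = field(default_factory=set)
    controller: Optional[int] = None  # index of the guarding branch condition


def build_pdg(stmts: List[Stmt]) -> List[Tuple[int, int, str]]:
    """Return directed edges (src, dst, kind) for straight-line code.

    Data edges link the most recent definition of a variable to later uses;
    control edges link a branch condition to the statements it guards.
    """
    edges: List[Tuple[int, int, str]] = []
    last_def: Dict[str, int] = {}
    for i, s in enumerate(stmts):
        for v in sorted(s.uses):
            if v in last_def:
                edges.append((last_def[v], i, "data"))
        if s.controller is not None:
            edges.append((s.controller, i, "control"))
        for v in s.defs:
            last_def[v] = i
    return edges


prog = [
    Stmt("c = document.cookie", defs={"c"}, uses={"document"}),
    Stmt("if (c)", uses={"c"}),
    Stmt("send(c)", uses={"c", "send"}, controller=1),
]
print(build_pdg(prog))
# [(0, 1, 'data'), (0, 2, 'data'), (1, 2, 'control')]
```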
step 3.3, slicing the code with function calls as slicing points. First, the abstract syntax tree is traversed to find nodes of type Identifier; when the identifier of a node matches the predefined key API table, a slicing criterion is defined starting from that node. Slicing extracts the statements related to the slicing point from the source code and filters out unrelated statements, which reduces noise and improves detection. This yields key code slices and non-key code slices, labeled 1 and 0 respectively;
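One way to realize the slicing and labeling of step 3.3 on the dependency edges is sketched below. The patent does not define its slicing criterion in detail, so several choices here are assumptions: taking the union of a backward slice and a forward slice, the excerpted `KEY_APIS` set, and the helper names.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Excerpt of a key API table; the full table is defined by the user.
KEY_APIS = {"getCookie", "setCookie", "innerHTML", "outerHTML"}


def slice_from(seed: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Union of the backward and forward slices over dependency edges from `seed`."""
    succ: Dict[int, Set[int]] = defaultdict(set)
    pred: Dict[int, Set[int]] = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)
    kept = {seed}
    for graph in (pred, succ):  # backward pass, then forward pass
        queue = deque([seed])
        seen = {seed}
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        kept |= seen
    return sorted(kept)


def label_slices(call_sites: Dict[int, str],
                 edges: List[Tuple[int, int]]) -> List[Tuple[List[int], int]]:
    """Slice at every function-call node; label 1 if the callee is a key API, else 0."""
    return [(slice_from(node, edges), int(name in KEY_APIS))
            for node, name in sorted(call_sites.items())]


print(label_slices({2: "setCookie", 4: "log"}, [(0, 1), (1, 2), (3, 4)]))
# [([0, 1, 2], 1), ([3, 4], 0)]
```

Statement 3 and statement 4 share no dependency path with the `setCookie` call, so they end up in the non-key slice labeled 0. This is the noise filtering the step describes.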
step 3.4, vectorizing the program slices: word2vec is used to obtain a vectorized representation of each code slice. Each code slice contains several lines of code, each line contains several tokens, and each token corresponds to a vector, so each code slice corresponds to the sequence of vectors of its tokens; the semantic vector of a code slice is denoted $V_{sem}$;
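Assembling a slice's semantic matrix from per-token vectors can be sketched as follows. The `embeddings` dictionary stands in for a trained word2vec model, such as the vectors a library like gensim produces. The fixed length with zero padding or truncation is an assumption, made so that slices of different lengths can be batched. `slice_matrix` is a hypothetical helper.

```python
from typing import Dict, List


def slice_matrix(tokens: List[str], embeddings: Dict[str, List[float]],
                 max_len: int) -> List[List[float]]:
    """Stack per-token vectors into a max_len x dim matrix.

    Unknown tokens map to zero vectors; shorter slices are zero-padded and
    longer ones truncated. `embeddings` must contain at least one entry.
    """
    dim = len(next(iter(embeddings.values())))
    zero = [0.0] * dim
    rows = [list(embeddings.get(t, zero)) for t in tokens[:max_len]]
    rows += [list(zero) for _ in range(max_len - len(rows))]
    return rows


emb = {"document": [0.1, 0.2], "cookie": [0.3, -0.1]}
m = slice_matrix(["document", "cookie", "x"], emb, max_len=4)
print(m)  # [[0.1, 0.2], [0.3, -0.1], [0.0, 0.0], [0.0, 0.0]]
```

Each row is one token, so the result is the two-dimensional matrix per slice that step S14 mentions.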
step 3.5, to make the feature description more accurate, the structural information of the code is taken into account by introducing the concept of centrality from social networks. Centrality analysis is performed on each node of the program dependency graph generated in step 3.2, and the eigenvector centrality index of each node is calculated. This index represents the structural features of each node: eigenvector centrality holds that the importance of a node depends both on the number of its neighbors and on the importance of each neighbor.
Given a program dependency graph with $n$ nodes, let $x_i$ denote the eigenvector centrality measure of node $i$. Its initial value $x_i^{(0)}$ is the degree $d_i$ of the node, and the iteration $x^{(k)} = A\,x^{(k-1)}$ (normalized after each step) is repeated, where $A$ is the $n \times n$ adjacency matrix of the nodes. When the vector stabilizes, the iteration stops and $x = (x_1, \ldots, x_n)^{T}$ is the eigenvector. Finally, the eigenvector centrality index is expressed as:
$$x_i = c \sum_{j=1}^{n} a_{ij}\, x_j$$
where $x_i$ is the eigenvector centrality measure of node $i$; $c$ is a proportionality constant (the reciprocal of the largest eigenvalue of $A$); $a_{ij}$ is the entry of $A$ from node $i$ to node $j$, equal to 0 when the two nodes are not connected and 1 when they are connected; and $x_j$ is the eigenvector centrality measure of node $j$.
Each code slice corresponds to a program dependency graph, and the structural vector of a code slice is defined as $\mathbf{S}$, specifically expressed as:

$$\mathbf{S} = (x_1, x_2, \ldots, x_n)$$
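The centrality computation in step 3.5 can be sketched as a power iteration seeded with node degrees. This is a minimal version under several assumptions:
- The dependency graph is given as a symmetric 0/1 adjacency matrix, so edge direction is ignored.
- Rescaling by the largest entry stands in for the proportionality constant.
- The identity shift (A + I) is an added convergence safeguard. It leaves the eigenvectors unchanged.

The returned list serves as the slice's structural vector.

```python
from typing import List

def eigenvector_centrality(adj: List[List[int]], max_iter: int = 1000,
                           tol: float = 1e-9) -> List[float]:
    """Power iteration for eigenvector centrality, seeded with node degrees.

    Assumption: `adj` is a symmetric 0/1 adjacency matrix, i.e. the program
    dependency graph is treated as undirected for this measure.
    """
    n = len(adj)
    if n == 0:
        return []
    # Initial value of each x_i: the degree of node i, scaled so the max is 1.
    x = [float(sum(row)) for row in adj]
    top = max(x) or 1.0
    x = [v / top for v in x]
    for _ in range(max_iter):
        # x <- (A + I) x: same eigenvectors as A, but the identity shift stops
        # the iteration from oscillating on bipartite graphs such as stars.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Rescaling plays the role of the proportionality constant.
        top = max(y) or 1.0
        y = [v / top for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

# Toy dependency graph: node 0 is connected to nodes 1, 2 and 3 (a star).
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
S = eigenvector_centrality(star)  # structural vector (x_1, ..., x_n)
```

For the four-node star, the hub scores 1.0 and each leaf converges to 1/√3 ≈ 0.577. This reflects that a node's importance comes from how many neighbors it has and how important those neighbors are.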
Step 3.6, the semantic vector obtained in step 3.4 and the structural vector obtained in step 3.5 are spliced together to form a combined vector $\mathbf{C}$, specifically expressed as:

$$\mathbf{C} = \mathbf{V} \oplus \mathbf{S}$$

where the combined vector is regarded as lying in a vector space formed from the linear spaces containing $\mathbf{V}$ and $\mathbf{S}$, and $\oplus$ is defined as the direct sum of these spaces, i.e. the two vectors are concatenated.
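The splicing in step 3.6 can be read as plain concatenation. The sketch below assumes the two-dimensional semantic matrix is flattened row by row before the structural vector is appended. The method does not say how the matrix and the vector are aligned, so this layout and the helper name `combine` are hypothetical.

```python
from typing import List

def combine(semantic: List[List[float]], structural: List[float]) -> List[float]:
    """Concatenate a row-major flattening of the semantic matrix with the
    structural (centrality) vector. Hypothetical layout, for illustration."""
    flat = [value for row in semantic for value in row]
    return flat + list(structural)

V = [[0.1, 0.2], [0.3, 0.4]]  # toy semantic matrix: 2 tokens x 2 dims
S = [1.0, 0.5]                # toy structural vector: 2 nodes
C = combine(V, S)
```

The length of the result is the sum of the two parts' lengths, which mirrors the dimension of a direct sum. An alternative design would append the structural features to every token row instead.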
Step 4: the combined vector is input into an LSTM (long short-term memory) neural network for training, yielding the semantic-based key code detection model.
Step 5: the model is evaluated to confirm that the semantic-based key code detection model runs reliably.
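The following sketch shows how a slice representation flows through the LSTM in step 4. It is a forward pass only, written in pure Python so the gate equations are visible. The weights are random and untrained, the hidden size is arbitrary, and feeding the input as a sequence of per-token rows is an assumption. Actual training (backpropagation through time over the labelled slices) would normally be done with a deep-learning framework and is not shown.

```python
import math
import random
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

class TinyLSTMClassifier:
    """Single-layer LSTM forward pass followed by a logistic output unit.

    Weights are random and untrained: this only illustrates the data flow.
    """

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One weight matrix per gate (i, f, o, g) over the concatenated [x_t, h_{t-1}].
        self.W = {gate: mat(hidden_dim, input_dim + hidden_dim) for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def _affine(self, gate: str, z: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def predict(self, sequence: List[List[float]]) -> float:
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for x_t in sequence:
            z = list(x_t) + h
            i = [sigmoid(a) for a in self._affine("i", z)]    # input gate
            f = [sigmoid(a) for a in self._affine("f", z)]    # forget gate
            o = [sigmoid(a) for a in self._affine("o", z)]    # output gate
            g = [math.tanh(a) for a in self._affine("g", z)]  # candidate cell state
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # Probability that the slice is a key-code slice (label 1).
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)

model = TinyLSTMClassifier(input_dim=3, hidden_dim=4)
p = model.predict([[0.1, 0.2, 0.3], [0.0, 0.0, 1.0], [0.5, 0.4, 0.1]])
```

The output is the probability that the slice is key code (label 1). Thresholding it, for example at 0.5, gives a binary key/non-key decision.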
Key code detection comprises the following steps:
S11, perform lexical and syntactic analysis on the code under test using a code analysis tool to generate the corresponding abstract syntax tree;
S12, to extract more complete semantic information, consider the structural relationships within the code, namely data dependence and control dependence: add the data dependences and control dependences between the nodes of the abstract syntax tree to the abstract syntax tree of the code under test to form a program dependency graph, in which a directed edge between two nodes represents a data or control dependence between them;
S13, slice the code under test using function calls as slicing points: traverse the abstract syntax tree, find the nodes whose node type is Identifier, and slice from each such node according to the defined slicing criterion, finally obtaining the function call slices of the code under test;
S14, vectorize the program slices: use word2vec to obtain a vectorized representation of each code slice under test; the vector of each slice is a two-dimensional matrix, which serves as the semantic vector of that slice;
S15, taking the structural information of the code into account, perform centrality analysis on each node of the code slice under test using the program dependency graph generated in step S12, and compute the eigenvector centrality of each node to obtain the structural vector of each code slice under test;
S16, splice the semantic vector obtained in step S14 and the structural vector obtained in step S15 into a combined vector;
S17, input the combined vector obtained in step S16 into the LSTM neural network for key code detection.
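Steps S12 and S13 amount to slicing over dependence edges, starting from each function-call Identifier. The sketch below assumes the parser has already produced three things:
- one dependency-graph node per statement;
- the callee Identifier of each call node;
- the data and control dependence predecessors of each node.

Backward reachability is a standard way to compute a slice, but the method's own slicing criterion is not spelled out. The key API table entries are hypothetical. The 1/0 labels correspond to the training-time labelling of key and non-key slices; at detection time they would simply go unused.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical key API table: API name -> functional explanation.
KEY_API_TABLE: Dict[str, str] = {
    "getCurrentPosition": "reads the device's physical location",
    "eval": "executes an arbitrary string as script",
}

def backward_slice(criterion: int, deps: Dict[int, List[int]]) -> List[int]:
    """Collect every node the criterion depends on, directly or transitively."""
    seen: Set[int] = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for pred in deps.get(node, []):  # data- or control-dependence predecessors
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return sorted(seen)

def labelled_call_slices(callees: Dict[int, str],
                         deps: Dict[int, List[int]]) -> List[Tuple[List[int], int]]:
    """Slice at each call node; label 1 if the called Identifier is a key API."""
    return [(backward_slice(node, deps), int(name in KEY_API_TABLE))
            for node, name in sorted(callees.items())]

# Toy JavaScript-like program, one graph node per statement:
#   0: var opts = {timeout: 5000};
#   1: if (granted) {
#   2:   navigator.geolocation.getCurrentPosition(show, fail, opts); }
#   3: console.log("done");
callees = {2: "getCurrentPosition", 3: "log"}
deps = {2: [0, 1]}  # node 2 uses `opts` (data dep. on 0) and sits under the `if` (control dep. on 1)
slices = labelled_call_slices(callees, deps)
```

Here the call to `getCurrentPosition` yields the slice of statements 0, 1 and 2, labelled 1. The logging call yields a one-statement slice labelled 0.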
S2, configure the key code. The file names of the code under test that is to be migrated to the remote browser are added to a configuration file, which marks the corresponding files to run in the remote browser. The remote browser receives the configuration instructions and isolates the key code so that it runs securely in the remote browser, while non-key code runs locally. When a user requests a service, the configuration file is read and the access request is evaluated:
- If the requested code files include a key code file, the code runs in the remote browser. The remote browser also receives the keyboard and mouse events sent by the local browser, and it returns the results to the local side over the WebSocket protocol for rendering and display to the user.
- If no key code file is included, the code runs directly on the local side.

Dynamic secure loading of the key code is thereby achieved.
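The routing decision in S2 can be sketched as a lookup against the configuration file. The JSON format and the `remote_files` key below are hypothetical, since the method only states that key code file names are added to a configuration file. The remote browser itself, the WebSocket event relay and the rendering back to the local browser are outside this sketch.

```python
import json
from typing import Iterable, Set

# Hypothetical configuration file contents (format assumed for illustration).
CONFIG_TEXT = '{"remote_files": ["geo.js", "payment.js"]}'

def load_remote_files(config_text: str) -> Set[str]:
    """Read the set of file names marked to run in the remote browser."""
    return set(json.loads(config_text).get("remote_files", []))

def route_request(requested_files: Iterable[str], remote_files: Set[str]) -> str:
    """Return "remote" if the request touches any key-code file, else "local"."""
    return "remote" if any(f in remote_files for f in requested_files) else "local"

remote = load_remote_files(CONFIG_TEXT)
print(route_request(["index.js", "geo.js"], remote))  # remote
print(route_request(["index.js"], remote))            # local
```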
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The solutions in the embodiments of the present application may be implemented in various computer languages, for example the object-oriented programming language Java and the interpreted scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A code dynamic safe loading method based on key code semantic detection, characterized in that the method comprises the following steps:
S1, inputting the code under test into a pre-constructed key code detection model, which judges whether the code under test contains key code; if it does, proceeding to step S2, and if it does not, running the code under test locally;
S2, configuring the key code: adding to a configuration file the file names of the code under test that is to be migrated to the remote browser to run, and isolating the key code so that the key code runs securely in the remote browser and the non-key code runs locally.
2. The code dynamic safe loading method based on key code semantic detection according to claim 1, characterized in that, before the code under test is input into the pre-constructed key code detection model, the method further comprises the following steps:
Step 1, acquiring a code data set;
Step 2, predefining a sensitive function table in which the names of key API functions related to permission calls and sensitive-information calls serve as keywords, and, using a binning method based on sensitive-word matching rules, grouping the code data set into samples that contain the key API functions and samples that do not, thereby completing the screening of the code data set;
Step 3, preprocessing and characterizing the screened code data set so that the code is finally expressed in vector form;
Step 4, inputting the vectors into an LSTM neural network to train it, the trained neural network model being the semantic-based key code detection model.
3. The code dynamic safe loading method based on key code semantic detection according to claim 2, characterized in that:
the preprocessing and code characterization of the screened code data set in step 3 specifically comprise the following steps:
Step 3.1, performing lexical and syntactic analysis on the code in the code data set using a code analysis tool and generating the corresponding abstract syntax tree, with the operators or statements in the code as nodes, whereby the code in the code data set is converted into token sequences;
Step 3.2, adding the data dependences and control dependences between nodes to the abstract syntax tree to form a program dependency graph;
Step 3.3, traversing the abstract syntax tree, finding the nodes whose node type is Identifier, and matching the identifier of each such node against a predefined key API table, the key API table consisting of the names of APIs that involve sensitive permissions and are susceptible to malicious attack, together with explanations of their functions; when the identifier of a node matches the key API table, slicing from that node according to the defined slicing criterion to obtain key code slices and non-key code slices, and labeling them, key code slices with 1 and non-key code slices with 0;
Step 3.4, obtaining a vectorized representation of each code slice using word2vec, wherein each code slice comprises several lines of code, each line comprises several tokens, each token corresponds to a vector, and each code slice corresponds to a vector composed of the vectors of its token sequence, the semantic vector corresponding to each code slice being defined as $\mathbf{V}$;
Step 3.5, performing centrality analysis on each node of the program dependency graph generated in step 3.2 and calculating the eigenvector centrality index of each node, wherein, given a program dependency graph with $n$ nodes, $x_i$ denotes the eigenvector centrality measure of node $i$; the initial value of $x_i$ is the degree $k_i$ of node $i$, and the iteration $\mathbf{x} \leftarrow A\mathbf{x}$, where $A$ is the adjacency matrix formed by the $n$ nodes, is repeated until the vector is stable, at which point the iteration is complete and $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is the eigenvector; the eigenvector centrality index is expressed as:

$$x_i = c\sum_{j=1}^{n} a_{ij}\,x_j$$

wherein only the neighbors $j$ of node $i$ contribute to the sum, $x_j$ is the eigenvector centrality measure of node $j$, $c$ is a proportionality constant, $k_i$ denotes the degree of node $i$, and $a_{ij}$ is the entry of the adjacency matrix $A$ for the connection from node $i$ to node $j$, $a_{ij} = 0$ meaning there is no connection between the two nodes and $a_{ij} = 1$ meaning they are connected, so that the number of connections of node $i$ is $k_i = \sum_{j=1}^{n} a_{ij}$;
each code slice corresponds to a program dependency graph, and the structural vector of a code slice is defined as $\mathbf{S}$, specifically expressed as:

$$\mathbf{S} = (x_1, x_2, \ldots, x_n)$$
Step 3.6, splicing the semantic vector obtained in step 3.4 and the structural vector obtained in step 3.5 into a combined vector $\mathbf{C}$, specifically expressed as:

$$\mathbf{C} = \mathbf{V} \oplus \mathbf{S}$$

wherein $\oplus$ denotes the direct sum, i.e. the concatenation of the two vectors.
4. The code dynamic safe loading method based on key code semantic detection according to claim 3, characterized in that in step S1, key code detection comprises the following steps:
S11, performing lexical and syntactic analysis on the code under test using a code analysis tool to generate the corresponding abstract syntax tree;
S12, adding the data dependences and control dependences between the nodes of the abstract syntax tree to the abstract syntax tree of the code under test to form a program dependency graph;
S13, slicing the code under test using function calls as slicing points: traversing the abstract syntax tree, finding the nodes whose node type is Identifier, and slicing from each such node according to the defined slicing criterion, finally obtaining the function call slices of the code under test;
S14, vectorizing the program slices by using word2vec to obtain a vectorized representation of each code slice under test, the vector of each code slice under test being a two-dimensional matrix, thereby obtaining the semantic vector of each code slice under test;
S15, taking the structural information of the code into account, performing centrality analysis on each node of the code slice under test using the program dependency graph generated in step S12, and calculating the eigenvector centrality of each node to obtain the structural vector of each code slice under test;
S16, splicing the semantic vector obtained in step S14 and the structural vector obtained in step S15 into a combined vector;
S17, inputting the combined vector obtained in step S16 into the key code detection model to detect the key code.
5. The code dynamic safe loading method based on key code semantic detection according to claim 3, characterized in that in step S2, when the remote browser runs the key code, it renders the running result into a static H5 page, which is transmitted back to the local application and displayed to the user.
6. A code dynamic safe loading device based on key code semantic detection, characterized in that it comprises a key code detection module, a code configuration management module and a remote browser module, wherein
the key code detection module contains a pre-built key code detection model, through which it detects the input code under test and identifies the key code and non-key code in it;
the code configuration management module is in signal connection with the key code detection module and is used to manage the configuration of the key code, adding to a configuration file the file names of the code under test that is to be migrated to the remote browser module to run, and marking the corresponding files to run in the remote browser module;
the remote browser module is in signal connection with the code configuration management module and is used to read the configuration file of the code configuration management module and to load the key code in remote browser isolation according to the policy instructions issued by the code configuration management module.
7. The code dynamic safe loading device based on key code semantic detection according to claim 6, characterized in that the key code detection model in the key code detection module is constructed as follows: first, a code data set is acquired; then a sensitive function table is defined in which the names of key API functions related to permission calls and sensitive-information calls serve as keywords, and a binning method based on sensitive-word matching rules groups the code data set into samples that contain the key API functions and samples that do not, completing the screening of the code data set; the screened code data set is preprocessed and characterized so that the code is expressed in vector form; finally, the vectors are input into an LSTM neural network for training, and the trained neural network model is the semantic-based key code detection model.
8. The code dynamic safe loading device based on key code semantic detection according to claim 6, characterized in that, when running the key code, the remote browser module renders the running result into a static H5 page, which is returned to the local application and displayed.
9. A computer-readable storage medium, characterized by: the computer readable storage medium stores a computer program which, when executed by a processor, implements the method steps of any of claims 1-5.
10. An electronic device, characterized by: the electronic device comprising a processor and a memory, the memory storing a computer program, which, when executed by the processor, performs the method steps of any of claims 1-5.
CN202310416949.9A 2023-04-19 2023-04-19 Code dynamic safe loading method and device based on key code semantic detection Active CN116167057B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310416949.9A CN116167057B (en) 2023-04-19 2023-04-19 Code dynamic safe loading method and device based on key code semantic detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310416949.9A CN116167057B (en) 2023-04-19 2023-04-19 Code dynamic safe loading method and device based on key code semantic detection

Publications (2)

Publication Number Publication Date
CN116167057A true CN116167057A (en) 2023-05-26
CN116167057B CN116167057B (en) 2023-07-28

Family

ID=86416585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310416949.9A Active CN116167057B (en) 2023-04-19 2023-04-19 Code dynamic safe loading method and device based on key code semantic detection

Country Status (1)

Country Link
CN (1) CN116167057B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117354067A (en) * 2023-12-06 2024-01-05 南京先维信息技术有限公司 Malicious code detection method and system

Citations (7)

Publication number Priority date Publication date Assignee Title
US10452868B1 (en) * 2019-02-04 2019-10-22 S2 Systems Corporation Web browser remoting using network vector rendering
US20200250254A1 (en) * 2019-02-04 2020-08-06 Cloudflare, Inc. Web browser remoting across a network using draw commands
CN111666576A (en) * 2020-04-29 2020-09-15 平安科技(深圳)有限公司 Data processing model generation method and device and data processing method and device
CN113641934A (en) * 2021-08-05 2021-11-12 吕波 Isolation defense system for website security access
US11245731B1 (en) * 2020-03-21 2022-02-08 Menlo Security, Inc. Protecting web applications from untrusted endpoints using remote browser isolation
CN115269427A (en) * 2022-08-03 2022-11-01 沈阳航空航天大学 Intermediate language representation method and system for WEB injection vulnerability
CN115329320A (en) * 2021-05-10 2022-11-11 武汉安天信息技术有限责任公司 Risk application identification method and device, storage medium and electronic equipment

Non-Patent Citations (2)

Title
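Yang Ming: "Research on a Vertex Grading Model for Program Dependency Graphs Based on Eigenvector Centrality", China Master's Theses Full-text Database *
Wang Tongtong; Han Wenbao; Wang Hang: "Defense Technology for Service Programs Based on API Monitoring", Computer Engineering & Science, no. 07 *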

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117354067A (en) * 2023-12-06 2024-01-05 南京先维信息技术有限公司 Malicious code detection method and system
CN117354067B (en) * 2023-12-06 2024-02-23 南京先维信息技术有限公司 Malicious code detection method and system

Also Published As

Publication number Publication date
CN116167057B (en) 2023-07-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant