CN116627466B - Service path extraction method, system, equipment and medium - Google Patents

Service path extraction method, system, equipment and medium

Info

Publication number
CN116627466B
Authority
CN
China
Prior art keywords
data
graph
network
network security
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310633597.2A
Other languages
Chinese (zh)
Other versions
CN116627466A (en
Inventor
万顺彬
凌永志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hanshuo Information Technology Co ltd
Original Assignee
Shanghai Hanshuo Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hanshuo Information Technology Co ltd filed Critical Shanghai Hanshuo Information Technology Co ltd
Priority to CN202310633597.2A priority Critical patent/CN116627466B/en
Publication of CN116627466A publication Critical patent/CN116627466A/en
Application granted granted Critical
Publication of CN116627466B publication Critical patent/CN116627466B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides a service path extraction method, system, equipment and medium. The method comprises: acquiring first data and second data, wherein the first data comprises patch data or updated version data of an application program, code writing specification data and middleware repair data, and the second data comprises environment data, log data and middleware version data of the application program; extracting a plurality of network security knowledge graphs on different topics from the first data and the second data; performing intention data identification on the second data according to the network security knowledge graphs to obtain intention data, wherein the intention data represents an optimization target of the application program; and determining logic code based on the intention data and obtaining an optimization scheme for the application program based on the logic code. The invention can generate a relevant optimization scheme, thereby improving the application iteration process, accelerating the stable development of the business and bringing the enterprise's business development closer to its users.

Description

Service path extraction method, system, equipment and medium
Technical Field
The present invention relates to the field of computer software development, and in particular, to a method, a system, an apparatus, and a medium for extracting a service path.
Background
In the prior art, only two approaches are available for an application program: static vulnerability scanning, and analysis of business links using the abstract syntax tree (AST) of the language. Static vulnerability scanning filters and scans code item by item against known vulnerabilities, and the required prior knowledge is difficult to obtain; the AST, for its part, has no active optimization capability.
More specifically, as the internet field continues to expand, the requirements for bug fixing and security of software and information systems keep increasing. Quick and effective security guarantees are therefore not available during software development, testing and production operation. If evaluation and analysis reports are produced by manual analysis and static vulnerability scanning, the repair process is long, so the potential safety hazards and resource costs are extremely high.
Consider an attack that exploits a deserialization vulnerability and relies on an execution chain to run the attack payload. Owing to the characteristics of deserialization vulnerabilities, vulnerability scanning tools generally focus on detecting known vulnerabilities; their ability to detect unknown vulnerabilities is very limited, and such vulnerabilities generally have to be discovered by professionals through security audits, code audits and similar means. A Java deserialization vulnerability depends on two factors: 1. whether the application has a deserialization interface; and 2. whether the application contains a component with a vulnerability. A corresponding vulnerability scanning tool therefore also needs to detect based on these two factors. A white-box code auditing tool can search the call chain for a deserialization operation. The entry of the call chain differs between frameworks; for example, in the 1.2 example the entry of the call chain is a spring-boot controller. Once the deserialization operation ObjectInputStream is found in the call chain, the interface is known to perform deserialization. This information alone is not sufficient to determine whether a vulnerability exists, however; it is also necessary to determine whether the code contains the third-party dependencies of the execution chain. In Java, whether a component contains a vulnerability is commonly analyzed by examining the pom.xml or build.gradle file.
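The two-factor check described above can be sketched in Python. This is only an illustration under stated assumptions: the watch list `KNOWN_RISKY_ARTIFACTS`, the function names and the idea of passing source code and the pom.xml content as strings are all hypothetical and are not taken from the patent.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical watch list; a real tool would load this from a vulnerability feed.
KNOWN_RISKY_ARTIFACTS = {"commons-collections"}


def has_deserialization_entry(java_source: str) -> bool:
    """Factor 1: does the code use ObjectInputStream anywhere?"""
    return re.search(r"\bObjectInputStream\b", java_source) is not None


def risky_dependencies(pom_xml: str) -> set:
    """Factor 2: which artifactIds declared in the POM are on the watch list?"""
    root = ET.fromstring(pom_xml)
    found = set()
    for elem in root.iter():
        # Drop an XML namespace prefix such as {http://maven.apache.org/POM/4.0.0}
        tag = elem.tag.rsplit("}", 1)[-1]
        text = (elem.text or "").strip()
        if tag == "artifactId" and text in KNOWN_RISKY_ARTIFACTS:
            found.add(text)
    return found


def may_be_vulnerable(java_source: str, pom_xml: str) -> bool:
    """Flag the application only when both factors are present."""
    return has_deserialization_entry(java_source) and bool(risky_dependencies(pom_xml))
```

A text match of this kind only narrows the search; confirming that an execution chain is actually reachable still requires the audit steps described above.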
In addition to the above difficulties, as internet applications iterate continuously, the application business process becomes complicated and business staff turn over; after several major iterations, the application process and code become redundant. Against this background, a method of analyzing business links using the AST of the language emerged. In this method, JCTree is the base class of syntax tree elements and contains an important field, pos, which indicates the position of the current syntax tree node (JCTree) in the syntax tree; a syntax tree node therefore cannot be created directly with the new keyword, and a node created that way would be meaningless. In addition, the data structure is decoupled from the processing of the data by means of the visitor pattern. The business flow of the application is deduced in this way, but the business flow still has to be sorted out and analyzed manually before it can guide development and testing when a proposal for a new business flow is made.
In summary, a method for extracting a service path with active optimization capability is needed.
Disclosure of Invention
The invention provides a service path extraction method, system, equipment and medium to solve the above problems.
The invention provides a service path extraction method, which comprises the following steps:
acquiring first data and second data, wherein the first data comprises patch data or updated version data of an application program, code writing specification data and middleware repair data; the second data comprises environment data, log data and middleware version data of the application program;
extracting a plurality of network security knowledge graphs on different topics from the first data and the second data;
performing intention data identification on the second data according to the network security knowledge graphs to obtain intention data, wherein the intention data represents an optimization target of the application program;
determining logic code based on the intention data, and obtaining an optimization scheme of the application program based on the logic code.
According to the service path extraction method provided by the invention, extracting a plurality of network security knowledge graphs on different topics from the first data and the second data comprises the following steps:
extracting features of the first data and the second data so as to obtain an entity, a relationship and an attribute;
analyzing network infrastructure, network situation, network threat and task dependence based on the entities, relations and attributes, and acquiring a middleware behavior knowledge graph, a network security behavior traceability graph, an operating system security feature graph and an entity behavior feature graph based on analysis results;
the middleware behavior knowledge graph is obtained based on patch data or updated version data of an application program in the first data; the network security behavior traceability graph is obtained based on network protocol information, cryptography information, database information, network security information and programming information in the second data; the operating system security feature graph is obtained based on operating system information of the application program in the second data; the entity behavior feature graph is obtained based on business entity relationship data or operation process data in the second data.
According to the method for extracting the service path provided by the invention, the analysis of network infrastructure, network situation, network threat and task dependence is performed based on the entity, the relationship and the attribute, and the method comprises the following steps:
determining segment information, topology organization and sensor location information of a network based on the entities, relationships and attributes;
determining factors for network attacks on the network infrastructure based on the entities, relationships, and attributes;
determining a threat event stream based on the entity, relationship, and attribute;
acquiring a dependency relationship among the task components based on the entity, the relationship and the attribute;
And analyzing based on the segmentation information, topological organization, position information of the sensors, factors, threat event streams and dependency relations of the network to obtain an analysis result.
According to the service path extraction method provided by the invention, after the middleware behavior knowledge graph, the network security behavior traceability graph, the operating system security feature graph and the entity behavior feature graph are obtained based on the analysis result, the method further comprises the following steps:
acquiring a malicious traffic detection graph, a traffic behavior knowledge graph, a malicious behavior feature graph and a malicious behavior traceability graph according to the middleware behavior knowledge graph, the network security behavior traceability graph, the operating system security feature graph and the entity behavior feature graph.
According to the service path extraction method provided by the invention, the intention data identification is carried out on the second data according to the network security knowledge graph to obtain the intention data, and the method comprises the following steps:
determining entity information related to the second data as intention data according to the network security knowledge graph;
accordingly, the determining logic code based on the intent data includes:
determining candidate data with semantic relation with the intention data from the network security knowledge graph according to the intention data; and/or
Determining candidate data from the network security knowledge graph by combining the environment of the application program on the basis of the intention data;
and recalling and ranking the candidate data to obtain target data as the logic code.
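The intention identification, candidate recall and ranking steps can be illustrated with a minimal Python sketch. It assumes the knowledge graph is a list of (head, relation, tail) triples and ranks candidates by how many intention entities link to them; the sample triples, the scoring rule and all function names are assumptions made for illustration, not the method claimed by the patent.

```python
from collections import Counter

# Toy triples (head, relation, tail); all values are illustrative.
TRIPLES = [
    ("nginx", "has_version", "1.18"),
    ("nginx", "fixed_by", "patch-A"),
    ("redis", "fixed_by", "patch-B"),
    ("patch-A", "requires", "openssl-upgrade"),
]


def identify_intent(tokens, triples):
    """Keep the tokens from the second data that are entities of the graph."""
    entities = {h for h, _, _ in triples} | {t for _, _, t in triples}
    return [tok for tok in tokens if tok in entities]


def recall_and_rank(intent, triples, top_k=3):
    """Recall graph neighbours of the intention entities and rank them
    by the number of intention entities they are linked to."""
    scores = Counter()
    for head, _, tail in triples:
        if head in intent:
            scores[tail] += 1
        if tail in intent:
            scores[head] += 1
    for entity in intent:
        scores.pop(entity, None)  # an intention entity is not its own candidate
    return [candidate for candidate, _ in scores.most_common(top_k)]
```

With the toy data, the tokens `["error", "nginx", "1.18"]` yield the intention entities `nginx` and `1.18`, and the only recalled candidate is `patch-A`.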
According to the service path extraction method provided by the invention, acquiring the malicious traffic detection graph according to the middleware behavior knowledge graph, the network security behavior traceability graph, the operating system security feature graph and the entity behavior feature graph comprises the following steps:
classifying the traffic in the first data with a pre-trained multi-classifier to obtain a predicted traffic attack type;
and determining a malicious traffic detection graph based on the attack traffic node, the traffic attack type predicted value and the traffic attack type true value corresponding to the traffic.
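As a rough illustration of how such a graph could be assembled, the Python sketch below links each attack-traffic node to its predicted and true attack types. The rule-based `toy_classifier` merely stands in for the pre-trained multi-classifier, and the record fields (`id`, `requests_per_sec`, `payload`) and edge names are hypothetical.

```python
def toy_classifier(flow):
    """Stand-in for the pre-trained multi-classifier; the rules are illustrative only."""
    if flow["requests_per_sec"] > 1000:
        return "ddos"
    if "' OR 1=1" in flow["payload"]:
        return "sql_injection"
    return "benign"


def build_detection_graph(flows, true_labels, classify=toy_classifier):
    """Return edges (traffic node, edge type, attack type) for attack traffic."""
    edges = []
    for flow, truth in zip(flows, true_labels):
        predicted = classify(flow)
        if predicted == "benign" and truth == "benign":
            continue  # keep only attack-traffic nodes in the graph
        edges.append((flow["id"], "predicted_type", predicted))
        edges.append((flow["id"], "true_type", truth))
    return edges
```

Keeping both the predicted and the true type on each node makes misclassified traffic visible directly in the graph.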
According to the service path extraction method provided by the invention, before the plurality of network security knowledge graphs on different topics are extracted from the first data and the second data, the method further comprises the following steps:
performing data cleaning on the first data and the second data to obtain cleaned data;
correspondingly, extracting a plurality of network security knowledge graphs on different topics from the first data and the second data comprises the following steps:
extracting the plurality of network security knowledge graphs on different topics from the cleaned data.
The invention also provides a service path extraction system, which comprises:
a data acquisition module, configured to acquire first data and second data, wherein the first data comprises patch data or updated version data of an application program, code writing specification data and middleware repair data; the second data comprises environment data, log data and middleware version data of the application program;
a knowledge acquisition module, configured to extract a plurality of network security knowledge graphs on different topics from the first data and the second data;
the intention recognition module is used for carrying out intention data recognition on the second data according to the network security knowledge graph to obtain intention data, and the intention data represents an optimization target of an application program;
and the optimization scheme acquisition module is used for determining logic codes based on the intention data and acquiring an optimization scheme of the application program based on the logic codes.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes any service path extraction method when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the service path extraction method of any one of the above.
According to the service path extraction method, system, equipment and medium provided by the invention, a plurality of network security knowledge graphs on different topics are extracted from the first data and the second data, the intention data is determined based on the network security knowledge graphs, and the corresponding logic code is obtained by matching, so that a relevant optimization scheme is generated. This improves the application iteration process, accelerates the stable development of the business and brings the enterprise's business development closer to its users. In addition, the invention turns vulnerability scanning and repair for existing business from a passive process into an active one. By applying machine learning to the business process, it actively generates business process optimizations for business personnel and recommends a business process map, which assists business personnel in optimizing the business or allows the business process to be released directly. The invention thus provides enterprises with an efficient business process conversion process and an active defense technology for the production environment, and offers an efficient means of reducing cost and increasing efficiency.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a service path extraction method according to an embodiment of the present invention;
FIG. 2 is a second flow chart of a method for extracting a service path according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a malicious traffic detection graph provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a flow behavior knowledge graph provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a malicious behavior feature map provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of a malicious behavior traceability graph provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a service path extraction system according to an embodiment of the present invention;
FIG. 8 is a schematic entity structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a schematic flow chart of a service path extraction method according to an embodiment of the present invention; FIG. 2 is a second flow chart of a method for extracting a service path according to an embodiment of the present invention; as shown in FIG. 1 and FIG. 2, the service path extraction method includes:
S101, acquiring first data and second data.
The first data (i.e., external data) includes patch data or updated version data of an application program, code writing specification data, and middleware repair data. That is, it covers the patches or version data released by major middleware vendors for security purposes, the code writing specifications published by major internet companies, and middleware repair schemes and repair procedures.
The second data (i.e., internal data) includes environment data, log data and middleware version data of the application program, that is, data related to the application itself.
In this step, the first data and the second data are acquired in real time. Specifically, after an updated version of the application program is released, the user side updates the application program according to the updated version data and thereby obtains data such as logs produced while the application program runs; the first data is thus associated with the second data.
S102, extracting a plurality of network security knowledge graphs on different topics from the first data and the second data.
In this step, the knowledge graphs are constructed from the first data and the second data. Construction is layered: it includes obtaining the network infrastructure, obtaining the network situation, obtaining network threat information, determining the dependency relationships between task components, and so on. Based on this information, knowledge graphs for different network security dimensions are obtained, such as a middleware behavior knowledge graph and an operating system security feature graph.
S103, performing intention data identification on the second data according to the network security knowledge graphs to obtain intention data, wherein the intention data characterizes an optimization target of the application program.
S104, determining logic codes based on the intention data, and acquiring an optimization scheme of the application program based on the logic codes.
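Steps S101 to S104 can be wired together as in the following Python sketch. The container classes mirror the first and second data described in this embodiment, while the step implementations are passed in as callables; the class names, field names and function signatures are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FirstData:
    """External data acquired in S101."""
    patches: list = field(default_factory=list)
    coding_specs: list = field(default_factory=list)
    middleware_repairs: list = field(default_factory=list)


@dataclass
class SecondData:
    """Internal data acquired in S101."""
    environment: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)
    middleware_versions: dict = field(default_factory=dict)


def run_pipeline(first, second, extract_graphs, identify_intent,
                 determine_logic_code, build_scheme):
    """Run S102 to S104 in order on data acquired in S101."""
    graphs = extract_graphs(first, second)              # S102
    intent = identify_intent(second, graphs)            # S103
    logic_code = determine_logic_code(intent, graphs)   # S104, first half
    return build_scheme(logic_code)                     # S104, second half
```

Passing the steps in as callables keeps the sketch independent of any particular graph extraction or intention identification technique.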
It should be noted that, after the optimization scheme of the application program is obtained, a relevant technician decides whether to apply it. For example, when the optimization scheme is an upgrade of the application program, the scheme shows the differences between the published upgrade package and the version currently installed on the user terminal, together with information such as the operating environment required for the upgrade and the code to be upgraded; the technician decides whether to upgrade according to the program information disclosed in the optimization scheme.
According to the service path extraction method provided by the embodiment of the invention, a plurality of network security knowledge graphs on different topics are extracted from the first data and the second data, the intention data is determined based on the network security knowledge graphs, and the corresponding logic code is obtained by matching, so that a relevant optimization scheme is generated. This improves the application iteration process, accelerates the stable development of the business and brings the enterprise's business development closer to its users. In addition, the invention turns vulnerability scanning and repair for existing business from a passive process into an active one. By applying machine learning to the business process, it actively generates business process optimizations for business personnel and recommends a business process map, which assists business personnel in optimizing the business or allows the business process to be released directly. The invention thus provides enterprises with an efficient business process conversion process and an active defense technology for the production environment, and offers an efficient means of reducing cost and increasing efficiency.
Further, on the basis of the foregoing embodiment, extracting a plurality of network security knowledge graphs on different topics from the first data and the second data includes:
Extracting features of the first data and the second data to obtain entities, relations and attributes.
Analyzing the network infrastructure, network situation, network threats and task dependencies based on the entities, relations and attributes, and acquiring a middleware behavior knowledge graph, a network security behavior traceability graph, an operating system security feature graph and an entity behavior feature graph based on the analysis results.
The middleware behavior knowledge graph is obtained based on patch data or updated version data of an application program in the first data; the network security behavior traceability graph is obtained based on network protocol information, cryptography information, database information, network security information and programming information in the second data; the operating system security feature graph is obtained based on operating system information of the application program in the second data; the entity behavior feature graph is obtained based on business entity relationship data or operation process data in the second data.
In this embodiment, the knowledge graph is constructed in the form of triples, as shown in formula (1):
$$G = (E, P, A) \tag{1}$$
where $E = \{e_1, e_2, \dots, e_i\}$ represents the set of entities in the graph, which is the concrete representation of the objects stored in the graph and contains $i$ different entities in total; $P = \{p_1, p_2, \dots, p_j\}$ represents the set of relations in the graph, which is the concrete representation of the associations between objects in the graph and contains $j$ different relations in total; and $A = \{a_1, a_2, \dots, a_n\}$ represents the set of attributes in the graph, which is the concrete representation of the data stored in the graph. Each entity $e$ or relation $p$ may possess $n$ different attributes.
The knowledge graph mainly consists of three kinds of elements: entities, relations and attributes. The entity is the primary element and represents an aggregate of data; an attribute comprises an attribute name and an attribute value, where the attribute name denotes a characteristic of the object and the attribute value is the value assigned to that attribute; a relation connects two entities and represents the association between them.
Knowledge graphs on different topics are formed from the entities, attributes and relations extracted from the first data and the second data. They specifically comprise a middleware behavior knowledge graph, a network security behavior traceability graph, an operating system security feature graph and an entity behavior feature graph.
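One way to hold the entity, relation and attribute elements of a single topic graph in memory is sketched below in Python. The `KnowledgeGraph` class, its method names and the sample topic label are hypothetical; the patent does not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """A per-topic graph made of entities, relations and attributes."""
    topic: str
    entities: set = field(default_factory=set)
    relations: set = field(default_factory=set)     # (head, predicate, tail)
    attributes: dict = field(default_factory=dict)  # owner -> {name: value}

    def add_relation(self, head, predicate, tail):
        """Connect two entities, registering both as graph entities."""
        self.entities.update((head, tail))
        self.relations.add((head, predicate, tail))

    def set_attribute(self, owner, name, value):
        """Attach an attribute name/value pair to an entity or a relation triple."""
        self.attributes.setdefault(owner, {})[name] = value

    def neighbours(self, entity):
        """Entities directly connected to the given entity, in either direction."""
        outgoing = {t for h, _, t in self.relations if h == entity}
        incoming = {h for h, _, t in self.relations if t == entity}
        return outgoing | incoming
```

For example, a graph created with `KnowledgeGraph("middleware_behavior")` can record that `nginx` is `fixed_by` `patch-A` and carry a `version` attribute on the `nginx` entity.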
The middleware behavior knowledge graph is built from data collected about the middleware that sits between applications or services and system software such as the operating system or database system. Middleware mainly solves problems of data transmission, data access, application scheduling, system construction, system integration and flow management in a distributed environment; it is a platform that supports application development, operation and integration in the distributed environment, enables interconnection among systems, and helps users develop application software efficiently.
The network security behavior traceability graph concerns network security, which means that the software and data in the system are protected from damage, alteration and leakage due to accidental or malicious causes, that the system operates continuously, reliably and normally, and that network service is not interrupted.
The operating system security feature graph concerns the operating system, which organizes the computer equipment and is responsible for information transmission, management of device storage space and scheduling of system resources. The operating system serves as the software platform for the application system, so its security directly affects the security of the application system.
The entity behavior feature graph is obtained through entity behavior analysis, a network security process that records the normal behavior of users and then detects any abnormal behavior or instance that deviates from these "normal" patterns.
The construction process of the more specific knowledge graph comprises the following steps:
First, external data (namely the first data) for constructing the knowledge graphs related to the application program is acquired. The data currently used for constructing the knowledge graphs falls into two types: open-domain data and vertical-domain data. Open-domain data includes data indexed by the Google and Baidu search engines, such as middleware upgrade articles and reports of malicious behavior published by individuals. Vertical-domain data includes upgrade data published on the NGINX official website, upgrade data published by Oracle for the WebLogic middleware, network security data published on the official website of the national network security center, and the like.
The middleware behavior knowledge graph is constructed based on the first data, which specifically covers NGINX, WebLogic, Redis, RabbitMQ, Kafka, MySQL, Oracle, MongoDB and the like, together with article data indexed by the Google and Baidu search engines.
The network security behavior traceability graph is constructed based on the second data, which specifically covers:
1. Network protocols: knowledge of the format, structure and fields of various network protocols, such as TCP/IP, HTTP, SMTP and FTP.
2. Cryptography: knowledge of various encryption algorithms, digital signatures and certificates, such as RSA, AES, SHA and MD.
3. Databases: knowledge of the structure, operation and querying of various databases.
4. Network security: knowledge of various network attacks, defenses and vulnerabilities, such as DDoS attacks, SQL injection and XSS attacks.
5. Programming languages: knowledge of the syntax, data types and function libraries of various programming languages, such as C, Java and Python.
The operating system security feature graph is constructed based on the second data, which specifically covers knowledge of the network configuration, security settings and system logs of various operating systems, such as Windows, Linux and Unix.
The entity behavior feature graph is constructed based on the second data, which specifically covers enterprise internal data, namely business entity relationship data or operation process data in the system.
After the first data and the second data are obtained, they need to be processed. Specifically, data on the internet is captured using Python, and the captured data is unstructured or semi-structured. Structured data can also be purchased from authoritative providers. Structured data is generally a company's business data stored in a database; it is extracted from the database by ETL and can be used after some simple preprocessing. A description or title of a certain class of middleware upgrade may be a piece of text or a picture, which is unstructured data. It carries information that corresponds to attributes in the knowledge graph, so it needs to be extracted into a structured form; in this embodiment, Python is used to perform keyword capture and lexical analysis on the text data.
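A minimal Python sketch of keyword capture on unstructured upgrade text follows. It uses a regular expression to pull (middleware name, version) pairs out of a sentence; the name list, the pattern and the function name are assumptions for illustration, and a production pipeline would typically rely on a proper tokenizer or lexical analyzer.

```python
import re

# Illustrative subset of middleware names to look for.
MIDDLEWARE_NAMES = ["nginx", "redis", "kafka", "weblogic"]

# A name, then up to 20 non-digit characters, then a dotted version number.
VERSION_RE = re.compile(
    r"\b(?P<name>" + "|".join(MIDDLEWARE_NAMES) + r")\b"
    r"[^\d\n]{0,20}?(?P<version>\d+(?:\.\d+)+)",
    re.IGNORECASE,
)


def extract_versions(text):
    """Return (middleware name, version) pairs found in free text."""
    return [(m.group("name").lower(), m.group("version"))
            for m in VERSION_RE.finditer(text)]
```

The extracted pairs are already structured and could be stored as attribute values on the corresponding middleware entities.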
Next, entity, attribute and relationship information needs to be extracted from the structured data. Entity extraction corresponds to named entity recognition in NLP. Because named entity recognition algorithms are relatively mature, they are not described in detail here. After the entities are identified, entity alignment and entity disambiguation are also required.
After the above steps are completed, ontology extraction is performed based on the identified entities, and an ontology library is then built from the extracted ontologies. For example, a company belongs to the concept of organization, and the two stand in a superordinate-subordinate relationship. The similarity between entities at the same level also needs to be calculated; for example, Bill Gates and Steve Jobs are, at the entity level, two relatively similar entities, both belonging to the concept of person.
The ontology library, that is, the knowledge base, is obtained in this way, and its quality is then evaluated, typically by the relevant technicians. After the quality evaluation is completed, the knowledge graph is finally formed. Some relations may not be directly available in the formed knowledge graph, in which case knowledge reasoning is needed to expand the knowledge graph. For example, a cat is a feline and a feline is a mammal, so it can be inferred that a cat is a mammal. Such reasoning cannot be applied arbitrarily, however. For example, a person who holds a doctorate from the United States may found a company, but that company is not necessarily a United States company.
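The reasoning example above (cat, feline, mammal) amounts to taking the transitive closure of one relation while leaving other relations untouched. The following is a minimal sketch under that reading; the relation names is_a and founded and the function infer_is_a are hypothetical.

```python
def infer_is_a(triples):
    """Close the 'is_a' relation transitively; other relations are ignored."""
    parents = {}
    for head, rel, tail in triples:
        if rel == "is_a":
            parents.setdefault(head, set()).add(tail)

    inferred = set()
    for start in parents:
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        # Keep only ancestors that were not stated directly.
        inferred |= {(start, "is_a", a) for a in seen - parents[start]}
    return inferred


triples = [
    ("cat", "is_a", "feline"),
    ("feline", "is_a", "mammal"),
    ("person", "founded", "company"),  # not transitive, so never expanded
]
new_facts = infer_is_a(triples)
```

Restricting the closure to a relation known to be transitive is one way to keep the reasoning from being applied arbitrarily.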
Further, on the basis of the foregoing embodiment, the analyzing the network infrastructure, the network situation, the network threat and the task dependency based on the entity, the relationship and the attribute includes:
Segment information, topology organization, and location information of the sensors of the network are determined based on the entities, relationships, and attributes.
Factors for network attacks on the network infrastructure are determined based on the entities, relationships, and attributes.
A threat event stream is determined based on the entities, relationships, and attributes.
The dependency relationships among the task components are acquired based on the entities, relationships, and attributes.
Analysis is performed based on the segment information, topology organization and sensor location information of the network, the factors, the threat event stream and the dependency relationships, to obtain an analysis result.
In this embodiment, the network security knowledge graph maps potential attack paths through the network and records any network attribute that may contribute to a successful attack. It fuses information from multiple data sources, is analyzed with various tools, and evolves dynamically. It is constructed step by step according to the following hierarchy:
a network infrastructure layer, which captures how the network is segmented and topologically organized, where the sensors are located, and so on;
a network situation layer, which mainly considers the factors in the network facilities that may affect network attack and defense;
a network threat layer, which describes potential adversary threats relevant to the defensive situation, including threat reports and event streams of alarms and other behavioral indicators;
and a task dependency layer, which captures the dependency relationships among the task components, middleware, operating system vulnerability upgrades, and so on.
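The four-layer hierarchy listed above can be pictured as one container holding a separate set of triples per layer. The sketch below is only an illustration: the class LayeredGraph, the layer keys, and the sample entities are assumed names, not part of the embodiment.

```python
from dataclasses import dataclass, field

# Layer names follow the hierarchy described in the text.
LAYERS = ("infrastructure", "situation", "threat", "task_dependency")


@dataclass
class LayeredGraph:
    # layer name -> list of (head, relation, tail) triples
    layers: dict = field(default_factory=lambda: {name: [] for name in LAYERS})

    def add(self, layer, head, relation, tail):
        if layer not in self.layers:
            raise ValueError(f"unknown layer: {layer}")
        self.layers[layer].append((head, relation, tail))

    def facts_about(self, entity):
        """Return every triple that mentions the entity, tagged with its layer."""
        return [
            (layer, triple)
            for layer in LAYERS
            for triple in self.layers[layer]
            if entity in (triple[0], triple[2])
        ]


g = LayeredGraph()
g.add("infrastructure", "web-01", "in_segment", "dmz")
g.add("situation", "web-01", "runs", "middleware-x")
g.add("threat", "alert-17", "targets", "web-01")
g.add("task_dependency", "order-service", "depends_on", "middleware-x")
web_facts = g.facts_about("web-01")
```

Querying one entity across layers shows how an infrastructure node, its exposure, and the alerts that target it can be read together.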
Further, on the basis of the foregoing embodiment, after the middleware behavior knowledge graph, the network security behavior traceability graph, the operating system security feature graph, and the entity behavior feature graph are obtained based on the analysis result, the method further includes:
and acquiring a malicious traffic detection graph, a traffic behavior knowledge graph, a malicious behavior feature graph and a malicious behavior tracing graph according to the middleware behavior knowledge graph, the network security behavior tracing graph, the operating system security feature graph and the entity behavior feature graph.
The malicious traffic detection graph is obtained by the following steps:
and classifying and predicting the flow in the first data by utilizing a pre-trained multi-classifier to obtain a flow attack type predicted value.
And determining a malicious traffic detection graph based on the attack traffic node, the traffic attack type predicted value and the traffic attack type true value corresponding to the traffic.
Specifically, in this embodiment, malicious traffic detection is DDoS attack detection, and the corresponding malicious traffic detection graph is shown in fig. 3. The dark gray nodes in fig. 3 are DDoS attack flow nodes. Each DDoS attack flow node represents one flow record collected from the detection module (which refers to security knowledge information published for operating systems, middleware and network security, and may be obtained by information collection, interface transmission, and the like). The attributes of a node include the 84 flow features output by CICFlowMeter, the label pre_label, output by the multi-classifier (the program component that classifies and refines the subject during information data acquisition and transmission), which gives the predicted attack type of the flow, and the label real_label, which gives the real attack type of the flow.
According to the traffic type, the target nodes are divided into attack type nodes (light gray) and normal traffic nodes (white). Together they cover the 19 finely classified DDoS attack types and the 1 normal traffic type detected by the detection module. These two kinds of nodes are used to classify the dark gray attack flow nodes.
In fig. 3, different edges between nodes show the relationships between them: a real edge represents the real-label relationship and a prediction edge represents the predicted-label relationship. Both relationships point from a dark gray DDoS attack flow node to a light gray attack type node or a white normal traffic node, and indicate the predicted type and the real type of that flow node. The visualization of the malicious traffic detection graph makes it possible to quickly query the detection accuracy for each DDoS attack type and the direction of its misclassifications.
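The structure of the detection graph and the two queries it supports (per-type detection accuracy and misjudgment direction) can be sketched as follows. The attribute names pre_label and real_label come from the description; the flow records, the type names and the helper functions are made-up illustrations, and the 84 flow features are omitted.

```python
from collections import Counter, defaultdict

# Made-up flow nodes; a real node would also carry the flow features.
flows = [
    {"id": "f1", "pre_label": "SYN", "real_label": "SYN"},
    {"id": "f2", "pre_label": "UDP", "real_label": "SYN"},
    {"id": "f3", "pre_label": "BENIGN", "real_label": "BENIGN"},
    {"id": "f4", "pre_label": "SYN", "real_label": "SYN"},
]


def build_edges(flows):
    """One 'real' and one 'prediction' edge from each flow node to a type node."""
    edges = []
    for f in flows:
        edges.append((f["id"], "real", f["real_label"]))
        edges.append((f["id"], "prediction", f["pre_label"]))
    return edges


def accuracy_and_confusions(flows):
    """Per real type: share of correct predictions, and where the errors went."""
    total, correct = Counter(), Counter()
    confusions = defaultdict(Counter)
    for f in flows:
        real, pred = f["real_label"], f["pre_label"]
        total[real] += 1
        if real == pred:
            correct[real] += 1
        else:
            confusions[real][pred] += 1
    accuracy = {label: correct[label] / total[label] for label in total}
    return accuracy, {k: dict(v) for k, v in confusions.items()}


edges = build_edges(flows)
accuracy, confusions = accuracy_and_confusions(flows)
```

Here the confusions mapping plays the role of the misjudgment direction: for each real type it records which types the classifier predicted instead.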
The traffic behavior knowledge graph is shown in fig. 4. The light gray nodes represent CAPEC attack types, the dark gray nodes represent CWE weakness types, and the two are linked through the related_weaknesses relation. Each entity has 5 attributes with their values: name, ID, description, consequences, and mitigation measures. While the original associations are kept, a relation is created from each associated weakness name in an attack node's description to connect that attack node to the weakness node, yielding a knowledge graph of the associations between attacks and weaknesses. When an attack or a weakness is queried, all of its connections can be retrieved along with it.
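Linking attack nodes to weakness nodes through related_weaknesses, and then querying all connections of a node, might look like the sketch below. The identifiers CAPEC-A, CWE-X and so on are placeholders rather than real catalogue entries, and both helper functions are hypothetical.

```python
# Placeholder identifiers, not real CAPEC or CWE entries.
attacks = [
    {"id": "CAPEC-A", "name": "attack pattern A", "related_weaknesses": ["CWE-X", "CWE-Y"]},
    {"id": "CAPEC-B", "name": "attack pattern B", "related_weaknesses": ["CWE-Y", "CWE-Z"]},
]
weaknesses = {"CWE-X": "weakness X", "CWE-Y": "weakness Y"}


def link_attacks_to_weaknesses(attacks, weaknesses):
    """Create a related_weaknesses edge for every weakness that has a node."""
    return [
        (a["id"], "related_weaknesses", w)
        for a in attacks
        for w in a["related_weaknesses"]
        if w in weaknesses
    ]


def neighbours(edges, node_id):
    """Return every node connected to node_id, in either direction."""
    out = {tail for head, _, tail in edges if head == node_id}
    out |= {head for head, _, tail in edges if tail == node_id}
    return sorted(out)


edges = link_attacks_to_weaknesses(attacks, weaknesses)
```

Because edges are followed in both directions, the same lookup answers "which weaknesses does this attack exploit" and "which attacks exploit this weakness".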
The malicious behavior feature graph is shown in fig. 5. It stores the 21 detected DDoS attacks and 69 traffic features, together with the relationships between these entities. In fig. 5, white nodes represent DDoS attacks, black nodes represent the 5 major categories of DDoS attack, light gray nodes represent the 21 attack types contained in those 5 categories, and dark gray nodes represent the important traffic features corresponding to each attack type.
The malicious behavior traceability graph is shown in fig. 6 and contains 4 entity types: malicious nodes, attacked nodes, malicious attack types, and time periods. A malicious source node contains an IP address and a source port number; it points to a malicious attack type and to an attacked node, and the transport protocol type is defined as an attribute of the relation. An attacked node likewise contains its IP address and port number. In addition, every host node points to the corresponding time period, which shows how the attack changes over time and clarifies the relationship between attacks and time. In fig. 6, the dark gray node represents the attacking node, the white node represents the attacked node, the black node represents the attack mode used by the dark gray attacking node, and the two light gray nodes represent the attack start time and the attack end time respectively. The whole graph depicts the complete course of an attack: starting at the attack start time, the attacking node uses the ACK flooding attack mode to attack the attacked node, until the attack finally ends.
In addition, after the knowledge graph is constructed with code written in the Cypher language, it is displayed. Specifically, the data model of the knowledge graph built in the Neo4j graph database is visualized, and the structure and link relations of the data are displayed as a node-link diagram. The search engine built into Neo4j can query relevant data in a targeted manner as required and expand a selected data entity to display all data related to it.
Script files for the knowledge graph data import interface and data export interface are written using the Neo4j Cypher API and the Java development language, realizing key data interaction and associated queries between the knowledge graphs. Experimental results show that data import and data export work normally for the knowledge graphs.
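As an illustration of what an import script has to produce, the sketch below builds a parameterized Cypher statement for one flow node of the malicious traffic detection graph. The embodiment states that Java was used; Python is used here only to keep the examples in one language. The node labels Flow and TrafficType, the relationship types PREDICTION and REAL, and the function flow_to_cypher are assumptions, and no database call is made.

```python
def flow_to_cypher(flow):
    """Return a parameterized Cypher statement and its parameters for one flow."""
    # MERGE creates a node or relationship only if it does not already exist,
    # so running the statement twice does not duplicate data.
    query = (
        "MERGE (f:Flow {id: $id}) "
        "MERGE (p:TrafficType {name: $pre_label}) "
        "MERGE (r:TrafficType {name: $real_label}) "
        "MERGE (f)-[:PREDICTION]->(p) "
        "MERGE (f)-[:REAL]->(r)"
    )
    params = {
        "id": flow["id"],
        "pre_label": flow["pre_label"],
        "real_label": flow["real_label"],
    }
    return query, params


# Made-up sample flow.
query, params = flow_to_cypher({"id": "f1", "pre_label": "SYN", "real_label": "UDP"})
```

The returned pair would then be passed to whichever Neo4j client the deployment uses; passing values as parameters rather than formatting them into the query string avoids injection through label values.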
Through the malicious traffic detection graph, the traffic behavior knowledge graph, the malicious behavior feature graph and the malicious behavior traceability graph, malicious behavior detection can be completed accurately, and network security is improved.
Further, on the basis of the foregoing embodiment, the performing intention data identification on the second data according to the network security knowledge graph to obtain intention data includes:
determining entity information related to the second data as intention data according to the network security knowledge graph;
Accordingly, the determining logic code based on the intent data includes:
determining candidate data with semantic relation with the intention data from the network security knowledge graph according to the intention data; and/or
Determining candidate data from the network security knowledge graph by combining the environment of the application program on the basis of the intention data;
and recalling and sequencing the candidate data to obtain target data as logic codes.
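The candidate recall and ranking steps above can be sketched as a two-stage lookup over graph edges: recall every entity semantically related to the intention entities, optionally filtered by the application environment, then sort by a score. The edge fields env and weight, the entity names, and both functions are hypothetical; the embodiment does not specify a scoring scheme.

```python
# Made-up knowledge graph edges.
edges = [
    {"head": "middleware-x", "relation": "fixed_by", "tail": "patch-101",
     "env": {"os": "linux"}, "weight": 0.9},
    {"head": "middleware-x", "relation": "fixed_by", "tail": "patch-202",
     "env": {"os": "windows"}, "weight": 0.8},
    {"head": "middleware-x", "relation": "configured_by", "tail": "config-7",
     "weight": 0.5},
    {"head": "database-y", "relation": "fixed_by", "tail": "patch-303",
     "weight": 0.7},
]


def recall_candidates(edges, intent_entities, environment=None):
    """Collect tails related to the intent entities that fit the environment."""
    candidates = {}
    for e in edges:
        if e["head"] not in intent_entities:
            continue
        required = e.get("env", {})
        if environment is not None and any(
            environment.get(k) != v for k, v in required.items()
        ):
            continue  # edge applies to a different environment
        tail = e["tail"]
        candidates[tail] = max(candidates.get(tail, 0.0), e["weight"])
    return candidates


def rank(candidates, top_k=3):
    """Order candidates by descending score, breaking ties by name."""
    return sorted(candidates, key=lambda c: (-candidates[c], c))[:top_k]


targets = rank(recall_candidates(edges, {"middleware-x"}, {"os": "linux"}))
```

Calling recall_candidates without an environment corresponds to the purely semantic branch; passing one corresponds to the branch that combines the intention data with the environment of the application program.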
In this embodiment, the established middleware behavior knowledge graph, network security behavior traceability graph, operating system security feature graph and entity behavior feature graph data are fed into the recommendation system. Introducing knowledge graph data into the recommendation algorithm amounts to introducing semantic association relations (that is, relations between entities such as middleware and network protocols) and a variety of knowledge entities, so that the recommender can mine data points from a semantic perspective; the recommendation results become more diverse, and overly uniform results are avoided.
There are three ways of using a knowledge graph in a knowledge-graph-based recommendation system:
(1) Embedding-based methods: the low-dimensional vectors learned from the features of the knowledge graph are used directly to enrich the representations of users (here, internal systems, including the operating system, middleware, network protocols and malicious behaviors) and items (here, the things being recommended), bringing the semantic relations in the knowledge graph into the recommendation algorithm;
(2) Connection-based methods: a user-item graph is constructed (users being the internal systems described above), recommendations are made using the connection patterns of the entities in the graph, and recommendation is enhanced using the connectivity similarity between users and items;
(3) Unified methods: these combine the two methods above, applying the semantic representations of entities and relations together with connectivity information. Unified methods are based on the idea of embedding propagation.
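The second approach, which relies on connectivity similarity between users and items, can be illustrated with a very small sketch. Jaccard overlap of linked entities is used here purely as a stand-in for connectivity similarity; the embodiment does not name a specific measure, and the entity names are made up.

```python
def jaccard(a, b):
    """Share of linked entities that two nodes have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Made-up graph: each user (an internal system) and each item links to entities.
user_links = {"app-server": {"middleware-x", "linux", "http"}}
item_links = {
    "patch-101": {"middleware-x", "linux"},
    "patch-202": {"windows", "smtp"},
}


def recommend(user, user_links, item_links):
    """Score every item by how strongly it is connected to the user."""
    scores = {
        item: jaccard(user_links[user], entities)
        for item, entities in item_links.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = recommend("app-server", user_links, item_links)
```

An embedding-based or unified method would replace the set overlap with similarity between learned vectors, while the surrounding ranking logic could stay the same.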
The recommendation algorithm is a commonly used recall algorithm, and is evaluated as follows:
the prediction accuracy of recommendations is typically measured by precision and recall.
R(u) is the list of recommendations made to user u based on the user's behavior on the training set, and T(u) is the list of the user's behaviors on the test set. The recall of the recommendation results is then defined as:
$$\mathrm{Recall} = \frac{\sum_{u \in U} |R(u) \cap T(u)|}{\sum_{u \in U} |T(u)|}$$
The precision of the recommendation results is defined as:
$$\mathrm{Precision} = \frac{\sum_{u \in U} |R(u) \cap T(u)|}{\sum_{u \in U} |R(u)|}$$
where u is a user and U is the user set.
Coverage describes the ability of a recommendation system to mine the long tail of items. Coverage can be defined in different ways; the simplest definition is the proportion of the total item set that the recommendation system is able to recommend. Assume that the user set of the system is U, that the recommendation system recommends to each user u an item list R(u) of length N, and that I is the total set of items. The coverage of the recommendation system may then be calculated by the following formula:
$$\mathrm{Coverage} = \frac{\left|\bigcup_{u \in U} R(u)\right|}{|I|}$$
The effect and performance of the recommendation algorithm adopted in the recommendation system can also be measured by the following common evaluation index:
the F1 score is one of the evaluation indexes of a recommendation algorithm. It is defined as the harmonic mean of precision and recall, and reflects the effectiveness of the recommendation algorithm more fairly than either alone. The F1 score is calculated as follows:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where Precision denotes the precision and Recall denotes the recall; both are computed from the confusion matrix.
Logic code generation and display are carried out through the recall and ranking algorithms described above.
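The evaluation measures named above (precision, recall, coverage and F1 score) can be computed directly from the recommendation lists R(u), the test-set behavior lists T(u) and the item set I. The sketch below assumes the usual set-based definitions of these measures; the function names and the sample data are made up.

```python
def precision_recall(recs, truth):
    """Precision and recall summed over all users u in recs."""
    hits = sum(len(set(recs[u]) & set(truth.get(u, ()))) for u in recs)
    n_rec = sum(len(set(recs[u])) for u in recs)
    n_true = sum(len(set(truth.get(u, ()))) for u in recs)
    precision = hits / n_rec if n_rec else 0.0
    recall = hits / n_true if n_true else 0.0
    return precision, recall


def coverage(recs, items):
    """Share of the item set that appears in at least one recommendation list."""
    recommended = set().union(*recs.values()) if recs else set()
    return len(recommended & set(items)) / len(items) if items else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Made-up sample: R(u) per user, T(u) per user, and the item set I.
recs = {"u1": ["a", "b"], "u2": ["b", "c"]}
truth = {"u1": ["a"], "u2": ["c", "d"]}
items = {"a", "b", "c", "d", "e"}

precision, recall = precision_recall(recs, truth)
cov = coverage(recs, items)
f1 = f1_score(precision, recall)
```

In this sample two of the four recommended entries are hits, three distinct items out of five are ever recommended, and three test-set behaviors exist in total.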
Further, on the basis of the foregoing embodiment, before the extracting a plurality of network security knowledge maps of different topics from the first data and the second data, the method further includes:
And performing data cleaning on the first data and the second data to obtain cleaned data.
Correspondingly, extracting a plurality of network security knowledge maps with different topics from the first data and the second data comprises the following steps:
and extracting and obtaining network security knowledge maps of a plurality of different topics from the cleaned data.
In this embodiment, the first data and the second data are first divided into structured and unstructured data, and are then processed by means of big data collection and cleaning. The data may be collected with common tools such as Chukwa, Kafka or Flink, and cleaned with the pandas and scikit-learn libraries.
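A minimal sketch of the cleaning step is given below. It uses only the Python standard library so that it runs without extra dependencies; with pandas, the same effect would typically be obtained with DataFrame.dropna and DataFrame.drop_duplicates. The field names source and message and the sample records are assumptions.

```python
def clean_records(records, required=("source", "message")):
    """Trim strings, drop incomplete records, and remove exact duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        # Normalise string fields by trimming surrounding whitespace.
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Drop records with a missing or empty required field.
        if any(not rec.get(name) for name in required):
            continue
        # Use the sorted items as a hashable fingerprint of the record.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Made-up sample input.
raw = [
    {"source": " app.log ", "message": "started"},
    {"source": "app.log", "message": "started"},   # duplicate after trimming
    {"source": "app.log", "message": ""},          # empty required field
    {"source": "patch-feed", "message": "new version"},
]
cleaned = clean_records(raw)
```

Trimming before deduplication matters here: the first two records only become identical once whitespace has been removed.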
The service path extraction system provided by the invention is described below, and the service path extraction system described below and the service path extraction method described above can be referred to correspondingly.
Fig. 7 is a schematic structural diagram of a service path extraction system according to an embodiment of the present invention, as shown in fig. 7, a service path extraction system includes:
the data acquisition module 701 acquires the first data and the second data.
The first data comprises patch data or updated version data of an application program, code writing specification data and middleware repair data; that is, patch or version data released for security purposes by the major middleware vendors, code writing specifications published by major Internet companies, and middleware repair schemes and repair procedures.
The second data includes environment data, log data, and middleware version data of the application program; that is, application-related data such as application environment data, log data and middleware version data.
In this module, the first data and the second data are acquired in real time. Specifically, after an updated version of an application program is obtained, the user side updates the application program according to the updated version data, and data such as logs produced while the application program runs are then acquired, so that the first data is associated with the second data.
The knowledge acquisition module 702 extracts a plurality of network security knowledge maps with different topics from the first data and the second data.
In this module, the knowledge graph is constructed based on the first data and the second data. The construction is layered: it includes acquiring the network infrastructure, the network situation and the network threat information, and determining the dependency relationships among the task components; based on this information, knowledge graphs for different dimensions of network security are obtained, such as the middleware behavior knowledge graph and the operating system security feature graph.
The intention recognition module 703 performs intention data identification on the second data according to the network security knowledge graph to obtain intention data, wherein the intention data characterizes an optimization objective of the application program.
The optimization scheme acquisition module 704 determines logic code based on the intention data, and acquires an optimization scheme of the application program based on the logic code.
According to the service path extraction system provided by the embodiment of the invention, network security knowledge graphs on a plurality of different topics are extracted from the first data and the second data, intention data is determined based on the network security knowledge graphs, and the corresponding logic code is obtained by matching, so that a related optimization scheme is generated. This improves the application iteration process, accelerates the stable development of the business, and brings the enterprise's business development closer to its users. In addition, the invention changes vulnerability scanning and repair for existing business from a passive process into an active one; it applies machine learning to the business process to proactively generate business process optimizations for business personnel and recommend a business process graph, assisting business personnel in optimizing the business or releasing the business process directly. The method provides enterprises with an efficient business process conversion procedure and an active defense technique for the production environment, and offers an effective and efficient means of reducing costs and increasing efficiency.
Fig. 8 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in fig. 8, the electronic device may include a processor 810, a communication interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a service path extraction method comprising: acquiring first data and second data, wherein the first data comprises patch data or updated version data of an application program, code writing specification data and middleware repair data; the second data comprises environment data, log data and middleware version data of the application program; extracting a plurality of network security knowledge graphs with different topics from the first data and the second data; performing intention data identification on the second data according to the network security knowledge graph to obtain intention data, wherein the intention data represents an optimization target of the application program; determining logic code based on the intention data, and acquiring an optimization scheme of the application program based on the logic code.
Further, the logic instructions in the memory 830 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In still another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the service path extraction method provided above, the service path extraction method comprising: acquiring first data and second data, wherein the first data comprises patch data or updated version data of an application program, code writing specification data and middleware repair data; the second data comprises environment data, log data and middleware version data of the application program; extracting a plurality of network security knowledge graphs with different topics from the first data and the second data; performing intention data identification on the second data according to the network security knowledge graph to obtain intention data, wherein the intention data represents an optimization target of the application program; determining logic code based on the intention data, and acquiring an optimization scheme of the application program based on the logic code.
The system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied essentially or in part in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the various embodiments or methods of some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for extracting a service path, comprising:
acquiring first data and second data, wherein the first data comprises patch data or updated version data of an application program, code writing specification data and middleware repairing data; the second data comprises environment data, log data and middleware version data of the application program;
extracting a plurality of network security knowledge maps with different topics from the first data and the second data;
carrying out intention data identification on the second data according to the network security knowledge graph to obtain intention data, wherein the intention data represents an optimization target of an application program;
Determining logic codes based on the intention data, and acquiring an optimization scheme of an application program based on the logic codes;
the extracting network security knowledge maps with a plurality of different topics from the first data and the second data comprises the following steps:
extracting features of the first data and the second data so as to obtain an entity, a relationship and an attribute;
analyzing network infrastructure, network situation, network threat and task dependence based on the entities, relations and attributes, and acquiring a middleware behavior knowledge graph, a network security behavior traceability graph, an operating system security feature graph and an entity behavior feature graph based on analysis results;
the middleware behavior knowledge graph is obtained based on patch data or updated version data of an application program in the first data; the network security behavior tracing map is obtained based on network protocol information, password information, database information, network security information and programming information in the second data; the operating system security feature map is obtained based on operating system information of the application program in the second data; the entity behavior feature map is obtained based on business entity relation data or operation process data in the second data;
The identifying the intention data according to the network security knowledge graph, and obtaining the intention data includes:
determining entity information related to the second data as intention data according to the network security knowledge graph;
accordingly, the determining logic code based on the intent data includes:
determining candidate data with semantic relation with the intention data from the network security knowledge graph according to the intention data; and/or
Determining candidate data from the network security knowledge graph by combining the environment of the application program on the basis of the intention data;
and recalling and sequencing the candidate data to obtain target data as logic codes.
2. The service path extraction method according to claim 1, wherein said analyzing network infrastructure, network situation, network threat, and task dependency based on said entities, relationships, and attributes comprises:
determining segment information, topology organization and sensor location information of a network based on the entities, relationships and attributes;
determining factors for network attacks on the network infrastructure based on the entities, relationships, and attributes;
Determining a threat event stream based on the entity, relationship, and attribute;
acquiring a dependency relationship among the task components based on the entity, the relationship and the attribute;
and analyzing based on the segmentation information, topological organization, position information of the sensors, factors, threat event streams and dependency relations of the network to obtain an analysis result.
3. The method according to claim 1, wherein after the obtaining the middleware behavior knowledge graph, the network security behavior trace-source graph, the operating system security feature graph, and the entity behavior feature graph based on the analysis result, the method further comprises:
and acquiring a malicious traffic detection graph, a traffic behavior knowledge graph, a malicious behavior feature graph and a malicious behavior tracing graph according to the middleware behavior knowledge graph, the network security behavior tracing graph, the operating system security feature graph and the entity behavior feature graph.
4. The service path extraction method according to claim 3, wherein obtaining a malicious traffic detection graph according to the middleware behavior knowledge graph, the network security behavior traceability graph, the operating system security feature graph and the entity behavior feature graph comprises:
Classifying and predicting the flow in the first data by utilizing a pre-trained multi-classifier to obtain a flow attack type predicted value;
and determining a malicious traffic detection graph based on the attack traffic node, the traffic attack type predicted value and the traffic attack type true value corresponding to the traffic.
5. The service path extraction method according to any one of claims 1 to 4, wherein before extracting a plurality of network security knowledge graphs of different topics from the first data and the second data, the method further comprises:
performing data cleaning on the first data and the second data to obtain cleaned data;
correspondingly, extracting a plurality of network security knowledge maps with different topics from the first data and the second data comprises the following steps:
and extracting and obtaining network security knowledge maps of a plurality of different topics from the cleaned data.
6. A service path extraction system, comprising:
the data acquisition module is used for acquiring first data and second data, wherein the first data comprises patch data or updated version data of an application program, code writing specification data and middleware repair data; the second data comprises environment data, log data and middleware version data of the application program;
The knowledge acquisition module is used for extracting a plurality of network security knowledge maps with different topics from the first data and the second data;
the intention recognition module is used for carrying out intention data recognition on the second data according to the network security knowledge graph to obtain intention data, and the intention data represents an optimization target of an application program;
the optimization scheme acquisition module is used for determining logic codes based on the intention data and acquiring an optimization scheme of the application program based on the logic codes;
the extracting network security knowledge maps with a plurality of different topics from the first data and the second data comprises the following steps:
extracting features of the first data and the second data so as to obtain an entity, a relationship and an attribute;
analyzing network infrastructure, network situation, network threat and task dependence based on the entities, relations and attributes, and acquiring a middleware behavior knowledge graph, a network security behavior traceability graph, an operating system security feature graph and an entity behavior feature graph based on analysis results;
the middleware behavior knowledge graph is obtained based on patch data or updated version data of an application program in the first data; the network security behavior tracing map is obtained based on network protocol information, password information, database information, network security information and programming information in the second data; the operating system security feature map is obtained based on operating system information of the application program in the second data; the entity behavior feature map is obtained based on business entity relation data or operation process data in the second data;
wherein the performing of intention data recognition on the second data according to the network security knowledge graph to obtain the intention data comprises:
determining entity information related to the second data as the intention data according to the network security knowledge graph;
accordingly, the determining of the logic code based on the intention data comprises:
determining candidate data having a semantic relationship with the intention data from the network security knowledge graph according to the intention data; and/or
determining candidate data from the network security knowledge graph by combining the environment of the application program on the basis of the intention data;
and recalling and ranking the candidate data to obtain target data as the logic code.
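The steps recited above (intention recognition against the knowledge graphs, candidate determination, and recall and ranking) can be illustrated with a minimal sketch. It is not the patented implementation: the triple-based graph representation, the neighbor-based notion of "semantic relationship", the scoring rule, and all names and sample data (`KnowledgeGraph`, `recognize_intent`, `find_candidates`, `recall_and_rank`, `mw-1.0`, `vuln-A`, `patch-1.1`, `patch-0.9`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """A themed graph stored as (head, relation, tail) triples (assumed layout)."""
    topic: str
    triples: set = field(default_factory=set)

    def entities(self):
        return {h for h, _, _ in self.triples} | {t for _, _, t in self.triples}

    def neighbors(self, entity):
        """Entities linked to `entity` by any relation, in either direction."""
        out = {t for h, _, t in self.triples if h == entity}
        out |= {h for h, _, t in self.triples if t == entity}
        return out


def recognize_intent(second_data_tokens, graphs):
    """Intention data: graph entities that are mentioned in the second data."""
    known = set().union(*(g.entities() for g in graphs))
    return [tok for tok in second_data_tokens if tok in known]


def find_candidates(intent, graphs, environment=None):
    """Candidates related to the intention data, optionally filtered by environment."""
    candidates = set()
    for g in graphs:
        for entity in intent:
            candidates |= g.neighbors(entity)
    if environment:
        # Keep only candidates that are also linked to an environment entity.
        env_related = set()
        for g in graphs:
            for env_entity in environment:
                env_related |= g.neighbors(env_entity)
        candidates &= env_related
    return candidates - set(intent)


def recall_and_rank(candidates, intent, graphs, top_k=1):
    """Rank candidates by how many intention entities they are linked to."""
    def score(candidate):
        return sum(1 for g in graphs for e in intent if candidate in g.neighbors(e))
    ranked = sorted(candidates, key=lambda c: (-score(c), c))
    return ranked[:top_k]


# Hypothetical sample data; none of these identifiers come from the patent.
middleware_graph = KnowledgeGraph("middleware_behavior", {
    ("mw-1.0", "affected_by", "vuln-A"),
    ("mw-1.0", "fixed_by", "patch-1.1"),
    ("mw-1.0", "fixed_by", "patch-0.9"),
    ("vuln-A", "fixed_by", "patch-1.1"),
})
os_graph = KnowledgeGraph("os_security_features", {
    ("linux", "supports", "patch-1.1"),
    ("windows", "supports", "patch-0.9"),
})
graphs = [middleware_graph, os_graph]

log_tokens = ["error", "mw-1.0", "vuln-A", "timeout"]  # stand-in for log data
intent = recognize_intent(log_tokens, graphs)
candidates = find_candidates(intent, graphs)
candidates_linux = find_candidates(intent, graphs, environment=["linux"])
logic_code = recall_and_rank(candidates, intent, graphs, top_k=2)
```

In this sketch the "and/or" of the two candidate-determination branches is modeled by the optional `environment` argument: omitting it uses only the semantic relationship, while supplying it narrows the candidates to those compatible with the application's environment.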
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the service path extraction method according to any one of claims 1 to 5 when executing the program.
8. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the service path extraction method according to any one of claims 1 to 5.
CN202310633597.2A 2023-05-31 2023-05-31 Service path extraction method, system, equipment and medium Active CN116627466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310633597.2A CN116627466B (en) 2023-05-31 2023-05-31 Service path extraction method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310633597.2A CN116627466B (en) 2023-05-31 2023-05-31 Service path extraction method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN116627466A CN116627466A (en) 2023-08-22
CN116627466B CN116627466B (en) 2024-01-26

Family

ID=87641499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310633597.2A Active CN116627466B (en) 2023-05-31 2023-05-31 Service path extraction method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN116627466B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688456A (en) * 2019-09-25 2020-01-14 北京计算机技术及应用研究所 Vulnerability knowledge base construction method based on knowledge graph
CN115186015A (en) * 2022-09-13 2022-10-14 广东财经大学 Network security knowledge graph construction method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180159876A1 (en) * 2016-12-05 2018-06-07 International Business Machines Corporation Consolidating structured and unstructured security and threat intelligence with knowledge graphs

Also Published As

Publication number Publication date
CN116627466A (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN109347801B (en) Vulnerability exploitation risk assessment method based on multi-source word embedding and knowledge graph
CN113647078B (en) Method, device and computer readable storage medium for managing security events
Naway et al. A review on the use of deep learning in android malware detection
CN112131882A (en) Multi-source heterogeneous network security knowledge graph construction method and device
US7530105B2 (en) Tactical and strategic attack detection and prediction
Alqahtani et al. Sv-af—a security vulnerability analysis framework
Feng et al. Automated detection of password leakage from public github repositories
CN114528457A (en) Web fingerprint detection method and related equipment
CN113886829B (en) Method and device for detecting defect host, electronic equipment and storage medium
Nour et al. A survey on threat hunting in enterprise networks
CN114491513A (en) Knowledge graph-based block chain intelligent contract reentry attack detection system and method
Amarasinghe et al. AI based cyber threats and vulnerability detection, prevention and prediction system
US20240054210A1 (en) Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program
US20230396638A1 (en) Adaptive system for network and security management
US20230048076A1 (en) Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program
CN116627466B (en) Service path extraction method, system, equipment and medium
Al-Sada et al. MITRE ATT&CK: State of the Art and Way Forward
Swathy Akshaya et al. Zero-Day Attack Path Identification using Probabilistic and Graph Approach based Back Propagation Neural Network in Cloud
US20240214406A1 (en) Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program
US20240054215A1 (en) Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program
US20240214396A1 (en) Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program
US20230252146A1 (en) Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program
US20240211595A1 (en) Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program
US20230252143A1 (en) Apparatus for processing cyber threat information, method for processing cyber threat information, and medium for storing a program processing cyber threat information
Lavieille et al. IsoEx: an explainable unsupervised approach to process event logs cyber investigation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant