WO2023246146A1

WO2023246146A1 - Target security recognition method and apparatus based on optimization rule decision tree

Info

Publication number: WO2023246146A1
Application number: PCT/CN2023/077880
Authority: WO
Inventors: 鲁文娜; 王垚炜; 沈赟
Original assignee: 上海淇玥信息技术有限公司
Priority date: 2022-06-23
Filing date: 2023-02-23
Publication date: 2023-12-28
Also published as: CN115310510A

Abstract

A target security recognition method and apparatus based on an optimization rule decision tree. The method comprises: respectively generating a corresponding logical character string by means of underlying logical data of each node of a rule decision tree (S202); determining the relationship between the logical character strings according to a tree structure of the rule decision tree (S204); generating a rule structure diagram according to the relationship between the logical character strings (S206); according to the rule structure diagram, respectively determining the importance of the relationship between the logical character strings (S208); optimizing the rule decision tree according to the importance of the relationship between the logical character strings (S210); and recognizing, by means of the optimized rule decision tree, target data of a target to be subjected to recognition, and performing security grading on said target according to a recognition result (S212). By means of the method, a complex rule decision tree can be simplified, thereby improving the decision-making efficiency of a service, and ensuring the security of service data; and when an error occurs in the service data, the degree of influence can also be quickly calculated, thereby ensuring the running security of the service.

Description

Target safety identification method and device based on optimized rule decision tree

Technical field

The present application relates to the field of computer information processing, specifically, to a target security identification method, device, electronic equipment and computer-readable medium based on an optimized rule decision tree.

Background

In an existing rule decision tree, because the user population is large and falls into many categories, there are many process branches, many nodes on each branch, and many rules and models under each node, so the overall structure is very large.

Precisely because the rule decision tree is so complex, routine updates to the risk control strategy are made cautiously, for fear of affecting other branches and ultimately degrading results. In general, therefore, rules are only added to the rule decision tree and are rarely removed. Over time the rule decision tree grows ever more complex and becomes troublesome to maintain. Moreover, when the rule decision tree runs online in a business system and a problem in a data source causes a business error that must be located, engineers have to test every model that uses that data source and repeat the scoring tests and evaluation, which costs considerable time and effort.

Therefore, a new target security identification method, device, electronic equipment and computer-readable medium based on an optimized rule decision tree are needed.

The above information disclosed in the Background section is only for enhancement of understanding of the context of the application and therefore it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art.

Summary of the invention

In view of this, this application provides a target security identification method, device, electronic equipment and computer-readable medium based on an optimized rule decision tree, which can simplify a complex rule decision tree, improve business decision-making efficiency, and ensure business data security. When errors occur in business data, the degree of impact can also be calculated quickly to ensure safe business operation.

Additional features and advantages of the invention will be apparent from the detailed description which follows, or, in part, may be learned by practice of the invention.

According to one aspect of the present application, a target security identification method based on an optimized rule decision tree is proposed. The method includes: generating a corresponding logical string from the underlying logical data of each node of the rule decision tree; determining the relationships between the logical strings according to the tree structure of the rule decision tree; generating a rule structure graph based on the relationships between the logical strings; determining the relationship importance between the logical strings according to the rule structure graph; optimizing the rule decision tree according to the relationship importance between the logical strings; and identifying the target data of a target to be identified through the optimized rule decision tree, and performing security grading on the target to be identified based on the identification result.

Optionally, generating the corresponding logical strings from the underlying logical data of each node of the rule decision tree includes: rewriting and parsing the underlying logical data of the rule decision tree in the Python language; during rewriting and parsing, extracting unstructured rule data from each node of the rule decision tree; and generating the logical strings from the unstructured rule data.

Optionally, determining the relationships between the logical strings according to the tree structure of the rule decision tree includes: extracting the relationships between the unstructured rule data according to the tree structure of the rule decision tree, and using them as the relationships between the logical strings.

Optionally, generating a rule structure graph based on the relationships between the logical strings includes: using the logical strings as nodes in the rule structure graph; using the relationships between the logical strings as edges between the nodes; and generating the rule structure graph from the nodes and edges.

Optionally, determining the relationship importance between the logical strings according to the rule structure graph includes: obtaining a trained machine learning model and its corresponding sample set, where the sample set includes multiple sample data and each sample data includes multiple features; generating feature importances corresponding to the multiple features; and determining the relationship importance between the logical strings based on the graph structure of the rule structure graph and the feature importances corresponding to the multiple features.

Optionally, generating feature importances corresponding to the multiple features includes: generating an initial performance score of the machine learning model on the sample set; generating feature performance scores corresponding to the multiple features; and generating the multiple feature importances based on the initial performance score and the multiple feature performance scores.

Optionally, generating feature performance scores corresponding to the multiple features includes: extracting one feature at a time, in sequence, from the multiple features in the sample set; randomly rearranging the values of that feature in the sample set to generate a random sample set; and generating the feature performance score of the machine learning model for that feature on the random sample set.

Optionally, determining the relationship importance between the logical strings according to the graph structure of the rule structure graph and the feature importances corresponding to the multiple features includes: determining the structural importance of the relationships between the logical strings according to the graph structure of the rule structure graph; determining the feature importance of the relationships between the logical strings based on the feature importances corresponding to the multiple features; and generating the relationship importance between the logical strings from the structural importance and the feature importance.

Optionally, optimizing the rule decision tree according to the relationship importance between the logical strings includes: simplifying the nodes and edges in the rule structure graph according to the relationship importance between the logical strings; and generating an optimized rule decision tree based on the simplified rule structure graph.

Optionally, generating an optimized rule decision tree based on the simplified rule structure graph includes: generating a simplified rule decision tree based on the simplified rule structure graph; updating the parameters in the simplified rule decision tree; and generating the optimized rule decision tree from the updated parameters and the simplified rule decision tree.

According to one aspect of the present application, a target security identification device based on an optimized rule decision tree is proposed. The device includes: a character module, used to generate a corresponding logical string from the underlying logical data of each node of the rule decision tree; a relationship module, used to determine the relationships between the logical strings according to the tree structure of the rule decision tree; a structure module, used to generate a rule structure graph based on the relationships between the logical strings; an importance module, used to determine the relationship importance between the logical strings according to the rule structure graph; an optimization module, used to optimize the rule decision tree according to the relationship importance between the logical strings; and an identification module, used to identify the target data of a target to be identified through the optimized rule decision tree and to perform security grading on the target to be identified based on the identification result.

According to one aspect of the present application, an electronic device is proposed. The electronic device includes: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described above.

According to one aspect of the present application, a computer-readable medium is proposed, on which a computer program is stored. When the program is executed by a processor, the method described above is implemented.

According to the target security identification method, device, electronic device and computer-readable medium based on the optimized rule decision tree of the present application, corresponding logical strings are generated from the underlying logical data of each node of the rule decision tree; the relationships between the logical strings are determined according to the tree structure of the rule decision tree; a rule structure graph is generated based on those relationships; the relationship importance between the logical strings is determined according to the rule structure graph; the rule decision tree is optimized according to that relationship importance; and the target data of the target to be identified is identified through the optimized rule decision tree, with the target graded for security according to the identification result. In this way a complex rule decision tree can be simplified, business decision-making efficiency improved, and business data security ensured. When errors occur in business data, the degree of impact can also be calculated quickly to ensure safe business operation.

It should be understood that the above general description and the following detailed description are only exemplary and do not limit the present application.

Description of the drawings

The above and other objects, features and advantages of the present application will become more apparent by describing in detail example embodiments thereof with reference to the accompanying drawings. The drawings described below are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without exerting creative efforts.

Figure 1 is a system block diagram of a target safety identification method and device based on an optimized rule decision tree according to an exemplary embodiment.

Figure 2 is a flow chart of a target security identification method based on an optimized rule decision tree according to an exemplary embodiment.

Figure 3 is a flow chart of a target security identification method based on an optimized rule decision tree according to another exemplary embodiment.

Figure 4 is a flow chart of a target security identification method based on an optimized rule decision tree according to another exemplary embodiment.

Figure 5 is a block diagram of a target safety identification device based on an optimized rule decision tree according to an exemplary embodiment.

Figure 6 is a block diagram of an electronic device according to an exemplary embodiment.

Detailed description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art. The same reference numerals in the drawings represent the same or similar parts, and thus their repeated description will be omitted.

Those skilled in the art can understand that the accompanying drawings are only schematic diagrams of exemplary embodiments, and the modules or processes in the accompanying drawings are not necessarily necessary to implement the present application, and therefore cannot be used to limit the protection scope of the present application.

The technical abbreviations involved in this application are explained as follows:

Rule engine: It is a series of software systems that execute rules according to some algorithms.

Drools: It is an open source rule engine written in Java that uses the Rete algorithm to evaluate the written rules. Drools allows business logic to be expressed declaratively and executes business rules and decision-making models by storing, processing and evaluating data.

BPMN2.0: The full name is Business Process Model and Notation. It is a set of business process models and symbolic modeling standards, using XML as the carrier and visualizing business with symbols.

jBPM: The full name is Java Business Process Management. It is an open source, flexible and easily extensible executable process language framework covering business process management, workflow, service collaboration and other fields. The specification used by the framework is BPMN2.0.

In this application, the rule decision tree is a collection of multiple control rules in the decision-making process of a business system. For convenience of description, the following takes a rule decision tree for terminal device identification as an example. Different rule decision trees can be constructed for different application scenarios, for terminal device data associated with different services, and so on. Because different rule decision trees can be applied to different application scenarios, decision rules for multiple businesses across application scenarios can be generated with high flexibility. The rule decision tree can be generated from analysis of historical terminal device data and is therefore highly reliable. In this application, terminal device operation information is taken as an example. The application scenarios under this business may include, but are not limited to, account registration, account login, data transmission, data generation, data download, and data maintenance. These application scenarios are only examples; the specific application scenarios can be determined according to actual needs and are not limited here. In this embodiment of the present application, a rule decision tree suited to each business type can be constructed based on the sample data associated with that business type.

As shown in Figure 1, the system architecture 10 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is a medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

Users can use terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, etc. Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as Internet service applications, shopping applications, web browser applications, instant messaging tools, email clients, social platform software, etc.

The terminal devices 101, 102, and 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.

The server 105 may be a server that provides various services, such as a backend management server that supports the Internet service websites browsed by users on the terminal devices 101, 102, and 103. The backend management server can analyze and process the received user data and feed the processing results (such as risk analysis results) back to the administrator of the Internet service website and/or to the terminal devices 101, 102, 103.

The server 105 can generate a corresponding logical string from the underlying logical data of each node of the rule decision tree; the server 105 can determine the relationships between the logical strings according to the tree structure of the rule decision tree; the server 105 can generate a rule structure graph based on the relationships between the logical strings; the server 105 can determine the relationship importance between the logical strings according to the rule structure graph; the server 105 can optimize the rule decision tree according to the relationship importance between the logical strings; and the server 105 can identify the target data of the target to be identified through the optimized rule decision tree and perform security grading on the target to be identified according to the identification result.

The server 105 may also analyze the user data in the terminal devices 101, 102, and 103, for example, through the optimized rule decision tree.

The server 105 may be a single physical server or may be composed of multiple servers. It should be noted that the target security identification method based on the optimized rule decision tree provided by the embodiments of the present application can be executed by the server 105; correspondingly, the target security identification device based on the optimized rule decision tree may be arranged in the server 105. The web pages through which users browse the Internet service platform are generally located on the terminal devices 101, 102, and 103.

Figure 2 is a flow chart of a target security identification method based on an optimized rule decision tree according to an exemplary embodiment. The target safety identification method 20 based on the optimization rule decision tree includes at least steps S202 to S212.

As shown in Figure 2, in S202, a corresponding logical string is generated from the underlying logical data of each node of the rule decision tree. For example, the underlying logical data of the rule decision tree can be rewritten and parsed in the Python language; during rewriting and parsing, unstructured rule data is extracted from each node of the rule decision tree; and the logical strings are generated from the unstructured rule data.

In a specific application, the underlying logical data of the rule decision tree is implemented in the Java language using the Drools framework together with jBPM. The underlying logical data of the rule decision tree can be rewritten and parsed in the Python language; that is, the underlying logic code implemented in Java is rewritten in Python.
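As a minimal sketch of what a logical string might look like, the following Python example assumes each node's rule data has already been extracted as (field, operator, value) triples. The node records, field names, and the `to_logical_string` helper are hypothetical and are not taken from the application, which does not specify a string format.

```python
# Hypothetical node records; ids and field names are assumptions for illustration.
nodes = [
    {"id": "n1", "conditions": [("device_age_days", "<", 30),
                                ("login_fail_count", ">=", 5)]},
    {"id": "n2", "conditions": [("ip_risk_score", ">", 0.8)]},
]


def to_logical_string(conditions):
    """Join (field, operator, value) triples into one canonical string."""
    return " and ".join(f"{field} {op} {value!r}" for field, op, value in conditions)


# One logical string per node of the rule decision tree.
logical_strings = {node["id"]: to_logical_string(node["conditions"]) for node in nodes}
```

Keeping each node's logic as a single string makes nodes easy to compare and to reuse as graph vertices in the later steps.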

In S204, the relationship between the logical strings is determined according to the tree structure of the rule decision tree.

In one embodiment, the relationships between the unstructured rule data are extracted according to the tree structure of the rule decision tree and used as the relationships between the logical strings. More specifically, the unstructured rule data is extracted from the underlying logical data; the unstructured data is used as the strings, and the relationships between the unstructured data are used as the relationships between the strings.

More specifically, the rules in the original Java implementation correspond to unstructured data. When the logic is rewritten in Python, this unstructured data is extracted and kept as string data, and the relationships present in the original structure are preserved.

In S206, a rule structure graph is generated based on the relationship between the logical strings. For example, logical strings can be used as nodes in the rule structure graph; relationships between logical strings can be used as edges between multiple nodes; the rule structure graph can be generated through nodes and edges.
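The node-and-edge construction can be sketched in Python as an adjacency list. The logical strings and parent-child relations below are made-up placeholders, and `build_rule_graph` is a hypothetical helper rather than anything named in the application.

```python
def build_rule_graph(logical_strings, relations):
    """Build a directed graph: logical strings are nodes, relations are edges."""
    graph = {s: [] for s in logical_strings}
    for parent, child in relations:
        graph[parent].append(child)
    return graph


# Placeholder logical strings and tree relations (assumed for illustration).
logical_strings = [
    "login_fail_count >= 5",
    "device_age_days < 30",
    "ip_risk_score > 0.8",
]
relations = [
    ("login_fail_count >= 5", "device_age_days < 30"),
    ("login_fail_count >= 5", "ip_risk_score > 0.8"),
]
rule_graph = build_rule_graph(logical_strings, relations)
```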

More specifically, the rule structure graph can be a directed acyclic graph, that is, a directed graph without cycles. For example, if a directed graph contains edges from A to B, from B to C, and from C back to A, it contains a cycle and is therefore not acyclic. If the edge from C to A is reversed so that it runs from A to C, the graph becomes a directed acyclic graph.

In one embodiment, the rule structure graph can also be verified. Input items can be fed into the input end of the rule structure graph, output items obtained after calculation, and those output items compared with the output items of the original rule decision tree. When the results are consistent, the rule structure graph is determined to have been constructed correctly.
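The A, B, C example can be checked mechanically. The sketch below uses Kahn's topological sort, a standard technique chosen here for illustration; the application does not say how acyclicity is checked, and `is_dag` is a hypothetical helper.

```python
from collections import deque


def is_dag(graph):
    """Return True if the directed graph (adjacency lists) has no cycle."""
    indegree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            indegree[child] = indegree.get(child, 0) + 1
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in graph.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Every node is visited only when no cycle blocks the ordering.
    return visited == len(indegree)


cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}      # A -> B -> C -> A
acyclic = {"A": ["B", "C"], "B": ["C"], "C": []}   # edge C -> A reversed to A -> C
```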
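A minimal sketch of this consistency check follows. The two decision functions are stand-ins invented for illustration (one for the original rule decision tree, one for the rule structure graph); only the compare-outputs idea comes from the text.

```python
def verify_equivalence(original_decide, graph_decide, test_inputs):
    """Return (all_match, mismatches) over the given test inputs."""
    mismatches = [x for x in test_inputs if original_decide(x) != graph_decide(x)]
    return not mismatches, mismatches


# Hypothetical stand-in for the original rule decision tree.
def original_decide(device):
    return "reject" if device["login_fail_count"] >= 5 else "pass"


# Hypothetical stand-in for evaluation through the rule structure graph.
def graph_decide(device):
    return "reject" if device["login_fail_count"] > 4 else "pass"


test_inputs = [{"login_fail_count": n} for n in range(10)]
all_match, mismatches = verify_equivalence(original_decide, graph_decide, test_inputs)
```

Returning the mismatching inputs as well as the boolean makes it easier to locate which rule was transcribed incorrectly.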

In S208, the relationship importance between the logical strings is determined according to the rule structure graph. For example, a trained machine learning model and its corresponding sample set may be obtained, where the sample set includes multiple sample data and each sample data includes multiple features; feature importances corresponding to the multiple features are generated; and the relationship importance between the logical strings is determined according to the graph structure of the rule structure graph and the feature importances corresponding to the multiple features.

For example, the importance of each feature can be calculated using a trained machine learning model, and the importance of the nodes and edges in the rule structure graph, that is, of the logical strings and the relationships between them, can then be calculated from the feature importances. This can further be combined with the structural importance of the nodes and edges in the rule structure graph to obtain an overall importance for each node and edge.

The specific content of "respectively determining the relationship importance between the logical strings according to the rule structure diagram" will be described in detail in the embodiments corresponding to Figures 3 and 4.

In S210, the relationship importance between the logical strings is analyzed to optimize the rule decision tree. For example, the nodes and edges in the rule structure graph can be simplified according to the relationship importance between the logical strings; and an optimized rule decision tree can be generated according to the simplified rule structure graph.

More specifically, for example, the importance of the input items entering the rule structure graph can be determined first, where the input items are features of users or products and may include multiple features. Based on the input items, the fields used in the rule decision tree that have low importance to the output item are found, where the output item is the rule judgment result, and the rules for the input items that have low importance to the output item are then filtered out. These rules can be represented as nodes or edges. The corresponding nodes or edges are deleted from the rule structure graph, the node structure of the rule structure graph is adjusted, and a new rule structure graph is generated.

In one embodiment, for example, a simplified rule decision tree can be generated based on the simplified rule structure graph; the parameters in the simplified rule decision tree can be updated; and the optimized rule decision tree can be generated from the updated parameters and the simplified rule decision tree.

Since the rule structure graph has been simplified, the parameters in the rules may need to be adjusted so that the simplified tree reproduces the behavior of the original rule decision tree accurately. The parameters of the nodes and edges in the rule structure graph can be fine-tuned to improve accuracy. Such parameters may be thresholds or other assessment indicators, and this application is not limited in this respect.
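One way to picture the delete-and-adjust step is the Python sketch below. It assumes importance scores are already available per node and that a removed node's children are reattached to its nearest surviving ancestor. The threshold, the scores, and the reattachment policy are all assumptions, since the application does not specify how the structure is adjusted.

```python
def prune_graph(graph, importance, threshold):
    """Drop nodes below the threshold and reconnect their children upward."""
    keep = {n for n in graph if importance.get(n, 0.0) >= threshold}

    def surviving_children(node, seen):
        result = []
        for child in graph[node]:
            if child in seen:
                continue
            seen.add(child)
            if child in keep:
                result.append(child)
            else:
                # Skip the removed node and adopt its surviving descendants.
                result.extend(surviving_children(child, seen))
        return result

    return {n: surviving_children(n, set()) for n in graph if n in keep}


# Hypothetical graph and importance scores.
graph = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
importance = {"root": 1.0, "a": 0.05, "b": 0.6, "c": 0.4}
pruned = prune_graph(graph, importance, threshold=0.1)
```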

In S212, the target data of the target to be identified is identified through the optimized rule decision tree, and the target to be identified is graded for security according to the identification result. In practice, a device can be taken as the target to be identified: the device data of the device is obtained and input into the optimized rule decision tree, which evaluates the device data against its multiple internal rules and generates an identification result. The identification result can be a high, medium or low level, or it can take the form of a score; this application is not limited in this respect. The security level of the device is determined based on the identification result, and devices can access different data resources according to their security levels.

According to the target security identification method based on the optimized rule decision tree of the present application, corresponding logical strings are generated from the underlying logical data of each node of the rule decision tree; the relationships between the logical strings are determined according to the tree structure of the rule decision tree; a rule structure graph is generated based on those relationships; the relationship importance between the logical strings is determined according to the rule structure graph; the rule decision tree is optimized according to that relationship importance; and the target data of the target to be identified is identified through the optimized rule decision tree, with the target graded for security according to the identification result. In this way a complex rule decision tree can be simplified, business decision-making efficiency improved, and business data security ensured; when errors occur in business data, the degree of impact can also be calculated quickly to ensure safe business operation. The method in this application can optimize and prune the tree from an overall perspective, making the structure clearer, free of redundancy, and easier to maintain. When a problem occurs in a data source, the affected scope can be assessed more quickly and accurately and the affected models relaunched, and more accurate thresholds can be determined for the scores of newly launched models in order to filter and classify terminal devices.
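The grading step might look like the following sketch, in which each rule contributes integer risk points and the total is mapped to a security level. The rules, point values, and cut-offs are invented for illustration; the application only states that the result may be a level or a score.

```python
# Hypothetical rules: (predicate over device data, risk points if triggered).
rules = [
    (lambda d: d["login_fail_count"] >= 5, 50),
    (lambda d: d["device_age_days"] < 30, 30),
    (lambda d: d["uses_proxy"], 20),
]


def grade_device(device, rules, high_risk=70, medium_risk=40):
    """Return (risk_points, security_level) for one device record."""
    risk = sum(points for predicate, points in rules if predicate(device))
    if risk >= high_risk:
        return risk, "low"      # high risk maps to a low security level
    if risk >= medium_risk:
        return risk, "medium"
    return risk, "high"


risky = {"login_fail_count": 6, "device_age_days": 10, "uses_proxy": False}
trusted = {"login_fail_count": 0, "device_age_days": 400, "uses_proxy": True}
```

Calling `grade_device(risky, rules)` triggers the first two rules, while `grade_device(trusted, rules)` triggers only the proxy rule.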

It should be clearly understood that this application describes how to make and use specific examples, but that the principles of this application are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of this disclosure.

Figure 3 is a flow chart of a target security identification method based on an optimized rule decision tree according to another exemplary embodiment. The process 30 shown in Figure 3 is a detailed description of S208, "determining the relationship importance between the logical strings according to the rule structure graph", in the process shown in Figure 2.

As shown in Figure 3, in S302, the trained machine learning model and its corresponding sample set are obtained. The sample set includes multiple sample data, and each sample data includes multiple features.

In a specific application, the sample set may be a terminal device feature sample set, and multiple pieces of feature information may be generated based on terminal device information and feature policies. Data cleaning and data fusion can be performed on the terminal device information to convert it into multiple feature data. More specifically, variable missing-rate analysis and outlier handling can be applied to the terminal device information; continuous variables can be discretized, discrete variables can be converted to WOE (weight of evidence) values, and text variables can be processed, for example with word2vec.

In this embodiment of the present application, the terminal device may be a personal user terminal device or an enterprise user terminal device. The target data may be terminal device information. The terminal device information may include basic information authorized by the user, for example business account information, terminal device identification information, and terminal device location information. It may also include behavior information, for example the page operation data of the terminal device, the service access duration of the terminal device, and the service access frequency of the terminal device. The specific content of the terminal device information can be determined according to the actual application scenario and is not limited here.

The machine learning model is trained on the multiple samples and features in the sample data. When training is complete, a machine learning model that can run stably in the business is obtained. The machine learning model may be, for example, a convolutional neural network model, and its corresponding sample set may include multiple terminal device samples. The terminal device samples may include features such as terminal device representation information, terminal device operation data, and terminal device service access information.
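For readers unfamiliar with WOE, the sketch below computes a weight-of-evidence value per bin of a discrete variable using the common convention WOE = ln(share of good samples in the bin / share of bad samples in the bin). The sign convention, the labels (0 = good, 1 = bad), and the sample data are assumptions, and the sketch assumes every bin contains at least one good and one bad sample.

```python
import math
from collections import Counter


def woe_by_bin(bins, labels):
    """Compute WOE per bin; labels use 0 for good and 1 for bad (assumed)."""
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    total_good, total_bad = sum(good.values()), sum(bad.values())
    return {
        b: math.log((good[b] / total_good) / (bad[b] / total_bad))
        for b in set(bins)
    }


# Made-up binned variable and labels.
bins = ["new", "new", "new", "old", "old", "old"]
labels = [0, 1, 1, 0, 0, 1]
woe = woe_by_bin(bins, labels)
```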

In S304, feature importance corresponding to multiple features is generated. For example, an initial performance score of the machine learning model on the sample set may be generated; feature performance scores corresponding to multiple features may be generated; and multiple feature importances may be generated based on the initial performance scores and multiple feature performance scores.

In S306, the relationship importance between the logical strings is determined based on the graph structure of the rule structure graph and the feature importance corresponding to multiple features. For example, the structural importance of the relationship between the logical strings can be determined based on the graph structure of the rule structure diagram; the feature importance of the relationship between the logical strings can be determined based on the feature importance corresponding to multiple features; The structural importance and the feature importance generate the relationship importance between the logical strings.

In a specific embodiment, the importance of the nodes in the rule structure graph can be calculated using a node importance algorithm from the family of graph algorithms, and the importance of the edges in the rule structure graph can likewise be calculated using a related graph algorithm.
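The text does not name a particular node importance algorithm. As one possible illustration, the following minimal sketch assumes PageRank for node importance and, purely as a hypothetical heuristic, the mean of the two endpoint ranks for edge importance. The function names and the example rule strings are illustrative and do not come from the application.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a directed graph given as (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        converged = sum(abs(new_rank[x] - rank[x]) for x in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


def edge_importance(edges, rank):
    """Assumed heuristic: an edge is as important as the mean of its endpoints."""
    return {(s, d): (rank[s] + rank[d]) / 2.0 for s, d in edges}


# Hypothetical rule structure graph: logical strings as nodes, relations as edges.
rule_edges = [
    ("age >= 18", "score > 600"),
    ("age >= 18", "device_known"),
    ("score > 600", "approve"),
    ("device_known", "approve"),
]
node_rank = pagerank(rule_edges)
edge_rank = edge_importance(rule_edges, node_rank)
```

In this toy graph the terminal node collects rank from both intermediate rules, so it ends up with the highest score, while the two symmetric intermediate rules receive equal scores.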

In one embodiment, weights can be set for the structural importance and the feature importance respectively, and the importance of the nodes and edges can then be calculated comprehensively from the two; the result corresponds to the relationship importance between the logical strings.
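The weighting scheme is not specified in the text. A minimal sketch, assuming a weighted sum of min-max normalised scores (the normalisation step, the weights 0.7 and 0.3, and all names are assumptions for illustration only), could look like this:

```python
def normalise(values):
    """Min-max normalise a dict of scores to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All scores are equal, so they carry no ranking information.
        return {key: 0.0 for key in values}
    return {key: (v - lo) / (hi - lo) for key, v in values.items()}


def combine_importance(structural, feature, w_struct=0.5, w_feat=0.5):
    """Weighted sum of normalised structural and feature importance per edge."""
    s_norm = normalise(structural)
    f_norm = normalise(feature)
    return {
        key: w_struct * s_norm[key] + w_feat * f_norm.get(key, 0.0)
        for key in s_norm
    }


# Hypothetical scores for three relationships between logical strings.
structural = {("a", "b"): 0.2, ("b", "c"): 0.6, ("a", "c"): 0.4}
feature = {("a", "b"): 0.05, ("b", "c"): 0.01, ("a", "c"): 0.03}
combined = combine_importance(structural, feature, w_struct=0.7, w_feat=0.3)
```

Normalising first keeps one score from dominating only because it is measured on a larger scale; other combination rules would fit the description equally well.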

The target security identification method based on the optimized rule decision tree of this application can analyze the existing rule flow, prune it, and remove nodes or rules that have little impact on the results. After optimization, the structure is clearer and useless rule nodes and models are removed, making the structure easier to maintain. The impact of different thresholds on the final result can also be tried quickly, which helps strategy staff determine more accurately the thresholds of newly launched models, the impact of taking a model offline, and the new thresholds for models affected by data sources.
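Trying different thresholds quickly can be pictured with a small sweep. The sketch below is an illustration under assumed inputs: `model_scores` is a made-up list of model outputs, and the "pass rate" metric is a hypothetical stand-in for whatever final result the strategy staff monitor.

```python
def sweep_thresholds(scores, thresholds):
    """Return, per candidate threshold, the share of targets scoring at or above it."""
    total = len(scores)
    return {t: sum(score >= t for score in scores) / total for t in thresholds}


# Made-up model outputs for eight targets.
model_scores = [0.12, 0.35, 0.48, 0.51, 0.67, 0.73, 0.88, 0.95]
pass_rates = sweep_thresholds(model_scores, [0.3, 0.5, 0.7])
```

Because the rule flow is evaluated in memory, such a sweep can be repeated for each candidate threshold without redeploying the rules.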

Figure 4 is a flow chart of a target security identification method based on an optimized rule decision tree according to another exemplary embodiment. The process 40 shown in Figure 4 is a detailed description of "generating feature performance scores corresponding to multiple features" in S304 of the process shown in Figure 3.

As shown in Figure 4, in S402, one feature among multiple features of the sample set is extracted in sequence.

In S404, the values of the extracted feature are randomly rearranged across the samples in the sample set to generate a random sample set.

In S406, a feature performance score of the machine learning model corresponding to the feature on the random sample set is generated.

In S408, the feature importance of the relationship between the logical strings is determined based on the feature importance corresponding to the multiple features.

It can be assumed that the trained machine learning model is M and that its corresponding sample set is D. The sample set may include a verification set, a training set, and a test set; this application is not limited in this respect. Assume that the features in the sample set D are $T_1, T_2, \dots, T_j$. More specifically, the features of target A in the sample set can be expressed as "$T^A_1, T^A_2, \dots, T^A_j$", the features of target B can be expressed as "$T^B_1, T^B_2, \dots, T^B_j$", and so on.

The feature importance can be calculated separately for each of the j features, where each feature is evaluated a total of K times to generate the feature performance scores for that feature.

First, assume that the feature being evaluated is $T_j$. In each of the K calculations, the feature $T_j$ is first randomly rearranged across the samples. Among the shuffled features, the features of user A can be expressed as "$T^A_1, T^A_2, \dots, T^E_j$", that is, the $T_j$ of user A is replaced with the $T_j$ of user E; the features of user B can be expressed as "$T^B_1, T^B_2, \dots, T^S_j$", that is, the $T_j$ of user B is replaced with the $T_j$ of user S; and so on, generating a random sample set.

The performance score of the machine learning model M on the original sample set is calculated and recorded as $Q$; the performance score of the machine learning model M on the shuffled random sample set is calculated and recorded as $Q_{kj}$.

After K calculations, K values of $Q_{kj}$ are obtained, and the importance of feature $T_j$ is then calculated based on the following formula:
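The formula itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes the common permutation-importance form, in which the importance of a feature is the baseline score $Q$ minus the average of the K shuffled scores $Q_{kj}$. The toy model, the accuracy metric, and all names are hypothetical and are not taken from the application.

```python
import random


def accuracy(y_true, y_pred):
    """Share of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(model, rows, labels, score, n_repeats=5, seed=0):
    """Assumed form: importance_j = Q - mean over k of Q_kj."""
    rng = random.Random(seed)
    baseline = score(labels, [model(r) for r in rows])  # Q on the original set
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):  # K repetitions for feature j
            column = [r[j] for r in rows]
            rng.shuffle(column)  # reassign feature j across samples
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            q_kj = score(labels, [model(r) for r in shuffled])
            drops.append(baseline - q_kj)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def toy_model(row):
    """Hypothetical trained model that only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


rows = [[i / 10, (i * 7 % 10) / 10] for i in range(10)]
labels = [toy_model(r) for r in rows]
baseline, importances = permutation_importance(toy_model, rows, labels, accuracy)
```

In this toy setup the second feature is never read by the model, so shuffling it leaves every prediction unchanged and its importance is zero, while shuffling the first feature lowers accuracy.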

According to the target security identification method based on the optimized rule decision tree of this application, the importance of each feature among the input items is calculated, the importance of the nodes or rules that use the corresponding input items is inferred from it, and the rule flow is optimized and pruned accordingly. By rewriting the entire drools+jBPM architecture in Python, the existing risk control rule flow is constructed as a directed acyclic graph, so that the rule flow is implemented in Python and the output items can be obtained by passing the input items through the rule flow. By solving for the importance of the nodes in the graph and of the rules under those nodes, unimportant decision nodes or rules can be eliminated from the entire risk control rule flow, or the threshold of a given decision point can be adjusted, so that targets are filtered and classified more accurately.
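Eliminating unimportant decision nodes from the directed acyclic graph can be sketched as follows. This is a minimal illustration under assumptions: the importance values and the cut-off of 0.1 are made up, and reconnecting the predecessors of a removed node to its successors is one possible pruning policy, not necessarily the one used in the application.

```python
def prune_rule_graph(edges, importance, threshold):
    """Drop nodes whose importance is below threshold, bridging their neighbours."""
    edge_set = set(edges)
    nodes = {n for edge in edge_set for n in edge}
    for node in sorted(nodes):
        if importance.get(node, 0.0) >= threshold:
            continue
        preds = [s for s, d in edge_set if d == node]
        succs = [d for s, d in edge_set if s == node]
        # Remove every edge touching the node, then reconnect around it.
        edge_set = {(s, d) for s, d in edge_set if node not in (s, d)}
        edge_set |= {(p, c) for p in preds for c in succs}
    return sorted(edge_set)


# Hypothetical rule flow and importance scores.
flow = [
    ("start", "rule_a"),
    ("rule_a", "rule_b"),
    ("rule_b", "decision"),
    ("start", "rule_c"),
    ("rule_c", "decision"),
]
scores = {"start": 1.0, "rule_a": 0.4, "rule_b": 0.02, "rule_c": 0.3, "decision": 1.0}
pruned = prune_rule_graph(flow, scores, threshold=0.1)
```

Bridging predecessors to successors only adds edges along paths that already existed, so the pruned graph stays acyclic.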

According to the target security identification method based on optimized rule decision trees of this application, the rule flow itself can also be sorted out, so that when a model under the rules encounters a problem with an external data source, the effect of removing that data source can be obtained quickly and used as a reference for bringing the model back online. Through pruning optimization, invalid models or input items are taken offline, so that the rule flow reaches a non-redundant structure, which is more conducive to later maintenance, faster assessment of the impact of data sources, and more accurate screening and classification of terminal devices.

Those skilled in the art can understand that all or part of the steps for implementing the above-described embodiments are implemented as computer programs executed by a CPU. When the computer program is executed by the CPU, the above-mentioned functions defined by the above-mentioned method provided by this application are executed. The program can be stored in a computer-readable storage medium, which can be a read-only memory, a magnetic disk or an optical disk.

In addition, it should be noted that the above-mentioned drawings are only schematic illustrations of processes included in the methods according to the exemplary embodiments of the present application, and are not intended to be limiting. It is readily understood that the processes shown in the above figures do not indicate or limit the temporal sequence of these processes. In addition, it is also easy to understand that these processes may be executed synchronously or asynchronously in multiple modules, for example.

The following are device embodiments of the present application, which can be used to execute method embodiments of the present application. For details not disclosed in the device embodiments of this application, please refer to the method embodiments of this application.

Figure 5 is a block diagram of a target safety identification device based on an optimized rule decision tree according to an exemplary embodiment. As shown in Figure 5, the target safety identification device 50 based on the optimization rule decision tree includes: a character module 502, a relationship module 504, a structure module 506, an importance module 508, an optimization module 510, and an identification module 512.

The character module 502 is used to generate corresponding logical strings through the underlying logical data of each node of the rule decision tree.

The relationship module 504 is used to determine the relationship between the logical strings according to the tree structure of the rule decision tree.

The structure module 506 is used to generate a rule structure graph based on the relationships between the logical strings. The structure module 506 is also used to take the logical strings as nodes in the rule structure graph, take the relationships between the logical strings as edges between the nodes, and generate the rule structure graph from the nodes and edges.

The importance module 508 is used to determine the relationship importance between the logical strings according to the rule structure diagram. The importance module 508 is also used to obtain the trained machine learning model and its corresponding sample set, where the sample set includes multiple sample data and each sample data includes multiple features; to generate feature importance corresponding to the multiple features; and to determine the relationship importance between the logical strings according to the graph structure of the rule structure diagram and the feature importance corresponding to the multiple features.

The optimization module 510 is configured to analyze the relationship importance between the logical strings and optimize the rule decision tree. The optimization module 510 is also configured to simplify the nodes and edges in the rule structure graph according to the relationship importance between the logical strings, and to generate an optimized rule decision tree according to the simplified rule structure graph.

The identification module 512 is configured to identify the target data of the target to be identified through the optimized rule decision tree, and perform security classification on the target to be identified based on the identification results.

According to the target safety identification device based on the optimized rule decision tree of the present application, corresponding logical strings are respectively generated through the underlying logical data of each node of the rule decision tree; the relationship between the logical strings is determined according to the tree structure of the rule decision tree; a rule structure diagram is generated based on the relationship between the logical strings; the relationship importance between the logical strings is respectively determined according to the rule structure diagram; the relationship importance between the logical strings is analyzed to optimize the rule decision tree; and the target data of the target to be identified is identified through the optimized rule decision tree, with security classification performed on the target to be identified based on the identification results. In this way, a complex rule decision tree can be simplified, the efficiency of business decision-making improved, and the security of business data ensured; when an error occurs in the business data, the degree of impact can also be calculated quickly, ensuring safe business operation.
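The work of the relationship module and the structure module, turning a rule decision tree into nodes and edges, can be sketched as follows. The nested-dictionary tree format, the `rule` and `children` keys, and the example rule strings are assumptions made for illustration; the application does not prescribe a data format.

```python
def build_rule_graph(tree):
    """Walk a rule decision tree and return (nodes, edges) of the rule structure graph.

    Each tree node is assumed to be a dict with a logical string under "rule"
    and an optional list of child nodes under "children".
    """
    nodes, edges = [], []
    stack = [(None, tree)]
    while stack:
        parent, node = stack.pop()
        rule = node["rule"]
        if rule not in nodes:
            nodes.append(rule)  # each distinct logical string becomes one node
        if parent is not None:
            edges.append((parent, rule))  # parent-child relation becomes an edge
        for child in reversed(node.get("children", [])):
            stack.append((rule, child))
    return nodes, edges


# Hypothetical rule decision tree with two branches ending in the same outcome.
decision_tree = {
    "rule": "age >= 18",
    "children": [
        {"rule": "score > 600", "children": [{"rule": "approve"}]},
        {"rule": "device_known", "children": [{"rule": "approve"}]},
    ],
}
graph_nodes, graph_edges = build_rule_graph(decision_tree)
```

Identical logical strings reached through different branches collapse into a single node here, which is one way a tree can become a more compact directed acyclic graph.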

As shown in Figure 6, the embodiment of the present invention provides an electronic device, including a processor 1110, a communication interface 1120, a memory 1130, and a communication bus 1140, where the processor 1110, the communication interface 1120, and the memory 1130 communicate with each other through the communication bus 1140;

Memory 1130, used to store computer programs;

The processor 1110 is configured to implement the target safety identification method based on the optimization rule decision tree of any of the above embodiments when executing the program stored on the memory 1130.

In the electronic device provided by the embodiment of the present invention, the processor 1110, by executing the program stored on the memory 1130, generates the relationship between the logical strings through the underlying logical data of the rule decision tree; generates a rule structure diagram based on the relationship between the logical strings; determines the relationship importance between the logical strings according to the rule structure diagram; analyzes the relationship importance between the logical strings to optimize the rule decision tree; and identifies the target data of the target to be identified through the optimized rule decision tree, performing security classification on the target to be identified based on the identification results.

The communication bus 1140 mentioned in the above-mentioned electronic device may be a Peripheral Component Interconnect (PCI for short) bus or an Extended Industry Standard Architecture (EISA for short) bus, etc. The communication bus 1140 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface 1120 is used for communication between the above-mentioned electronic device and other devices.

The memory 1130 may include a random access memory (Random Access Memory, RAM for short), or may include a non-volatile memory, such as at least one disk memory. Optionally, the memory 1130 may also be at least one storage device located remotely from the aforementioned processor 1110.

The above-mentioned processor 1110 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an Application Specific Integrated Circuit (ASIC for short), a Field-Programmable Gate Array (FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Embodiments of the present invention provide a computer-readable storage medium. The computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors 1110 to implement the target security identification method based on the optimized rule decision tree of any of the above embodiments.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to embodiments of the present invention are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device. Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g. infrared, radio, microwave). A computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. Available media may be magnetic media (e.g. floppy disk, hard disk, magnetic tape), optical media (e.g. DVD), or semiconductor media (e.g. Solid State Disk (SSD)), etc.

Exemplary embodiments of the present application have been specifically shown and described above. It is to be understood that the present application is not limited to the detailed structures, arrangements, or implementation methods described herein; on the contrary, the present application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

A target safety identification method based on an optimized rule decision tree, which is characterized by including:

The corresponding logical strings are generated through the underlying logical data of each node of the rule decision tree;

Determine the relationship between the logical strings according to the tree structure of the rule decision tree;

Generate a rule structure diagram based on the relationship between the logical strings;

Determine the relationship importance between the logical strings according to the rule structure diagram;

Analyze the relationship importance between the logical strings to optimize the rule decision tree;

The target data of the target to be identified is identified through the optimized rule decision tree, and the security classification of the target to be identified is performed according to the identification results.
The method according to claim 1, characterized in that corresponding logical strings are respectively generated through the underlying logical data of each node of the rule decision tree, including:

Rewrite and analyze the underlying logical data of the rule decision tree through Python language;

In the process of rewriting and parsing, unstructured rule data is extracted from each node of the rule decision tree;

The logical string is generated from unstructured rule data.
The method of claim 2, wherein determining the relationship between the logical strings according to the tree structure of a rule decision tree includes:

The relationship between unstructured rule data is extracted according to the tree structure of the rule decision tree as the relationship between the logical strings.
The method of claim 1, wherein generating a rule structure graph based on the relationship between the logical strings includes:

Use logical strings as nodes in the rule structure graph;

Treat relationships between logical strings as edges between multiple nodes;

The rule structure graph is generated through nodes and edges.
The method of claim 1, wherein determining the relationship importance between the logical strings according to the rule structure diagram includes:

Obtain the trained machine learning model and its corresponding sample set, where the sample set includes multiple sample data, and each sample data includes multiple features;

Generate feature importance corresponding to multiple features;

The relationship importance between the logical strings is determined according to the graph structure of the rule structure graph and the feature importance corresponding to multiple features.
The method of claim 5, characterized in that generating feature importance corresponding to multiple features includes:

Generate an initial performance score of the machine learning model on the sample set;

Generate feature performance scores corresponding to multiple features;

A plurality of feature importances are generated according to the initial performance score and a plurality of feature performance scores.
The method of claim 6, wherein generating feature performance scores corresponding to multiple features includes:

Extract one feature among multiple features of the sample set in sequence;

Randomly rearrange the features in the sample set to generate a random sample set;

Generating a feature performance score of the machine learning model corresponding to the feature on the random sample set.
The method of claim 5, wherein determining the relationship importance between the logical strings according to the graph structure of the rule structure graph and the feature importance corresponding to multiple features includes:

Determine the structural importance of the relationship between the logical strings according to the graph structure of the rule structure graph;

Determine the feature importance of the relationship between the logical strings based on the feature importance corresponding to multiple features;

The relationship importance between the logical strings is generated according to the structural importance and the feature importance.
The method of claim 1, wherein analyzing the relationship importance between the logical strings to optimize the rule decision tree includes:

Simplify the nodes and edges in the rule structure graph according to the relationship importance between the logical strings;

An optimized rule decision tree is generated based on the simplified rule structure graph.
The method of claim 9, wherein generating an optimized rule decision tree based on the simplified rule structure diagram includes:

Generate a simplified rule decision tree according to the simplified rule structure diagram;

Update the parameters in the simplified rule decision tree;

The optimized rule decision tree is generated through the updated parameters and the simplified rule decision tree.
A target safety identification device based on an optimized rule decision tree, which is characterized by including:

The character module is used to generate corresponding logical strings through the underlying logical data of each node of the rule decision tree;

A relationship module, used to determine the relationship between the logical strings according to the tree structure of the rule decision tree;

A structure module, used to generate a rule structure diagram based on the relationship between the logical strings;

An importance module, used to respectively determine the importance of relationships between the logical strings according to the rule structure diagram;

An optimization module, used to analyze the relationship importance between the logical strings and optimize the rule decision tree;

An identification module is used to identify the target data of the target to be identified through the optimized rule decision tree, and perform security classification on the target to be identified based on the identification results.
An electronic device, characterized by including:

one or more processors;

A storage device for storing one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any one of claims 1-10.
A computer-readable medium with a computer program stored thereon, characterized in that when the program is executed by a processor, the method according to any one of claims 1-10 is implemented.