WO2023246146A1 - Target security recognition method and apparatus based on optimization rule decision tree

Info

Publication number
WO2023246146A1
Authority
WO
WIPO (PCT)
Prior art keywords
rule
decision tree
relationship
logical
importance
Prior art date
Application number
PCT/CN2023/077880
Other languages
French (fr)
Chinese (zh)
Inventor
鲁文娜
王垚炜
沈赟
Original Assignee
上海淇玥信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海淇玥信息技术有限公司

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • the present application relates to the field of computer information processing, specifically, to a target security identification method, device, electronic equipment and computer-readable medium based on an optimized rule decision tree.
  • this application provides a target security identification method, device, electronic equipment and computer-readable medium based on an optimized rule decision tree, which can simplify the complex rule decision tree, improve business decision-making efficiency, and ensure business data security; It can also quickly calculate the impact when errors occur in business data to ensure safe business operations.
  • a target security identification method based on an optimized rule decision tree includes: generating a corresponding logical string from the underlying logical data of each node of the rule decision tree; determining the relationships between the logical strings according to the tree structure of the rule decision tree; generating a rule structure graph based on the relationships between the logical strings; determining the relationship importance between the logical strings according to the rule structure graph; optimizing the rule decision tree by analyzing the relationship importance between the logical strings; identifying the target data of a target to be identified through the optimized rule decision tree; and performing security grading on the target to be identified according to the identification result.
  • generating the relationship between the logical strings through the underlying logical data of the rule decision tree includes: rewriting and parsing the underlying logical data of the rule decision tree in the Python language; during the rewriting and parsing, extracting unstructured rule data from each node of the rule decision tree; and generating the logical strings from the unstructured rule data.
  • determining the relationships between the logical strings according to the tree structure of the rule decision tree includes: extracting, according to the tree structure of the rule decision tree, the relationships between the unstructured rule data to serve as the relationships between the logical strings.
  • determining the relationship importance between the logical strings according to the graph structure of the rule structure graph and the feature importance corresponding to multiple features includes: determining the structural importance of the relationships between the logical strings according to the graph structure of the rule structure graph; determining the feature importance of the relationships between the logical strings according to the feature importance corresponding to the multiple features; and generating the relationship importance between the logical strings from the structural importance and the feature importance.
  • generating an optimized rule decision tree based on the simplified rule structure graph includes: generating a simplified rule decision tree based on the simplified rule structure graph; updating the parameters in the simplified rule decision tree; and generating the optimized rule decision tree from the updated parameters and the simplified rule decision tree.
  • Figure 2 is a flow chart of a target security identification method based on an optimized rule decision tree according to an exemplary embodiment.
  • Figure 4 is a flow chart of a target security identification method based on an optimized rule decision tree according to another exemplary embodiment.
  • Figure 5 is a block diagram of a target safety identification device based on an optimized rule decision tree according to an exemplary embodiment.
  • FIG. 6 is a block diagram of an electronic device according to an exemplary embodiment.
  • Drools: an open-source rule engine written in Java that uses the Rete algorithm to evaluate the written rules. Drools allows business logic to be expressed declaratively and executes business rules and decision models by storing, processing and evaluating data.
  • BPMN 2.0: Business Process Model and Notation, a modeling standard for business processes that uses XML as its carrier and visualizes business processes with symbols.
  • jBPM: Java Business Process Management, an open-source, flexible and easily extensible executable process language framework covering business process management, workflow, service collaboration and other fields.
  • the specification used by the jBPM framework is BPMN 2.0.
  • the above application scenarios are only examples, and the specific application scenarios can be determined according to the actual application scenarios, and are not limited here.
  • rule decision trees suited to the corresponding business types can be constructed.
  • Figure 1 is a system block diagram of a target safety identification method and device based on an optimized rule decision tree according to an exemplary embodiment.
  • the system architecture 10 may include terminal devices 101, 102, 103, a network 104 and a server 105.
  • the network 104 is a medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105.
  • Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • Users can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, etc.
  • Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as Internet service applications, shopping applications, web browser applications, instant messaging tools, email clients, social platform software, etc.
  • the server 105 can generate a corresponding logical string from the underlying logical data of each node of the rule decision tree; the server 105 can determine the relationships between the logical strings according to the tree structure of the rule decision tree; the server 105 can generate a rule structure graph based on the relationships between the logical strings; the server 105 can determine the relationship importance between the logical strings according to the rule structure graph; the server 105 can optimize the rule decision tree by analyzing the relationship importance between the logical strings; and the server 105 can identify the target data of the target to be identified through the optimized rule decision tree and perform security grading on the target to be identified according to the identification result.
  • the server 105 may also, for example, analyze the user data in the terminal devices 101, 102, and 103 through the optimized rule decision tree.
  • the server 105 may be a single physical server or may be composed of multiple servers. It should be noted that the target security identification method based on the optimized rule decision tree provided by the embodiments of the present application can be executed by the server 105; correspondingly, the target security identification device based on the optimized rule decision tree may be provided in the server 105.
  • the web pages provided for users to browse the Internet service platform are generally located in terminal devices 101, 102, and 103.
  • Figure 2 is a flow chart of a target security identification method based on an optimized rule decision tree according to an exemplary embodiment.
  • the target safety identification method 20 based on the optimization rule decision tree includes at least steps S202 to S212.
  • corresponding logical strings are generated through the underlying logical data of each node of the rule decision tree.
  • the underlying logical data of the rule decision tree can be rewritten and parsed through the Python language; in the process of rewriting and parsing, unstructured rule data is extracted from each node of the rule decision tree; through the unstructured rule data Generate the logical string.
  • the underlying logical data of the rule decision tree is implemented with the Drools framework in the Java language.
  • the underlying logical data of the rule decision tree can be rewritten and parsed in the Python language; that is, the underlying logic code implemented in Java is rewritten in Python.
  • the relationship between the logical strings is determined according to the tree structure of the rule decision tree.
  • the rules in the original Java language correspond to unstructured data.
  • the unstructured data is extracted and retained as string data when rewritten in the Python language, and the relationship between the original structured data is retained.
  • a rule structure graph is generated based on the relationship between the logical strings.
  • logical strings can be used as nodes in the rule structure graph; relationships between logical strings can be used as edges between multiple nodes; the rule structure graph can be generated through nodes and edges.
  • the rule structure graph can be a directed acyclic graph, that is, a directed graph without cycles. If a directed graph contains a path that starts at A, passes through B and C, and returns to A, it contains a cycle and is not a directed acyclic graph. If the direction of the edge from C to A is reversed so that it runs from A to C, the graph becomes a directed acyclic graph.
  • the relationship importance between the logical strings is determined according to the rule structure diagram.
  • the trained machine learning model and its corresponding sample set may be obtained, where the sample set includes multiple sample data and each sample data includes multiple features; feature importance corresponding to the multiple features is generated; and the relationship importance between the logical strings is determined according to the graph structure of the rule structure graph and the feature importance corresponding to the multiple features.
  • the importance of the input items entering the rule structure graph can first be determined, where the input items refer to characteristics of users or products and may include multiple features. Based on the input items, the fields used in the rule decision tree that are of low importance to the output item are found, where the output item refers to the rule judgment result, and the rules that use these low-importance input items are then filtered out.
  • the rules can be represented as nodes or edges. These nodes or edges are deleted from the rule structure graph, the node structure of the rule structure graph is adjusted, and a new rule structure graph is generated.
  • a simplified rule decision tree can be generated based on the simplified rule structure graph; the parameters in the simplified rule decision tree can be updated; and the optimized rule decision tree can be generated from the updated parameters and the simplified rule decision tree.
  • since the rule structure graph has been simplified, it may be necessary to adjust the parameters in the rules so that the rule decision tree continues to run accurately.
  • the parameters in the nodes and edges of the rule structure graph can be fine-tuned to make them more accurate. The parameters may be thresholds or other assessment indicators; this application does not limit them.
  • the target data of the target to be identified is identified through the optimized rule decision tree, and the security classification of the target to be identified is performed according to the identification result.
  • a device can be used as the target to be identified: the device data of the device to be identified is obtained and input into the optimized rule decision tree.
  • the rule decision tree evaluates the device data according to its multiple internal rules and generates an identification result.
  • the recognition result can be a high level, a medium level or a low level, and the recognition result can also be in the form of a score, and this application is not limited to this.
  • the security level of the device is determined based on the identification results. Devices can access different data resources based on their corresponding security levels.
  • Figure 3 is a flow chart of a target security identification method based on an optimized rule decision tree according to another exemplary embodiment.
  • the process 30 shown in FIG. 3 is a detailed description of S208, "determining the relationship importance between the logical strings according to the rule structure diagram", in the process shown in FIG. 2.
  • the trained machine learning model and its corresponding sample set are obtained.
  • the sample set includes multiple sample data, and each sample data includes multiple features.
  • the terminal device may be a personal user terminal device or an enterprise user terminal device.
  • the target data may be terminal device information. The terminal device information may include basic information authorized by the user, for example, business account information, terminal device identification information and terminal device location information; it may also include behavior information, for example, the page operation data of the terminal device, the service access duration of the terminal device and the service access frequency of the terminal device.
  • the specific content of the terminal device information can be determined according to the actual application scenario, and is not limited here.
  • feature importance corresponding to multiple features is generated. For example, an initial performance score of the machine learning model on the sample set may be generated; feature performance scores corresponding to multiple features may be generated; and multiple feature importances may be generated based on the initial performance scores and multiple feature performance scores.
  • according to the target security identification method based on the optimized rule decision tree of this application, the importance of each feature in the input items is calculated, and the importance of the nodes or rules that use the corresponding input items is inferred from it, which provides a method of optimizing and pruning the rule flow.
  • the existing risk control rule flow is constructed as a directed acyclic graph so that the rule flow is implemented in the Python language, and the output items can be obtained by computing the input items through the rule flow.
  • unimportant decision nodes or rules can be eliminated from the entire risk control rule flow, or the threshold of a certain decision point can be adjusted, so that targets are screened and classified more accurately.
  • this also serves to sort out the rule flow itself, so that when a model under the rules encounters a problem with an external data source, the effect of removing that data source can be obtained quickly and used as a reference for bringing the model back online.
  • through pruning optimization, invalid models or input items are removed offline, so that the rule flow reaches a non-redundant structure, which makes later maintenance easier, allows the impact of data sources to be assessed faster, and allows terminal devices to be screened and classified more accurately.
  • Figure 5 is a block diagram of a target safety identification device based on an optimized rule decision tree according to an exemplary embodiment.
  • the target safety identification device 50 based on the optimization rule decision tree includes: a character module 502, a relationship module 504, a structure module 506, an importance module 508, an optimization module 510, and an identification module 512.
  • the character module 502 is used to generate corresponding logical strings through the underlying logical data of each node of the rule decision tree.
  • the relationship module 504 is used to determine the relationship between the logical strings according to the tree structure of the rule decision tree.
  • the structure module 506 is used to generate a rule structure graph based on the relationships between the logical strings; the structure module 506 is also used to take the logical strings as nodes in the rule structure graph, take the relationships between the logical strings as edges between multiple nodes, and generate the rule structure graph from the nodes and edges.
  • the importance module 508 is used to determine the relationship importance between the logical strings according to the rule structure graph; the importance module 508 is also used to obtain the trained machine learning model and its corresponding sample set, where the sample set includes multiple sample data and each sample data includes multiple features; generate the feature importance corresponding to the multiple features; and determine the relationship importance between the logical strings according to the graph structure of the rule structure graph and the feature importance corresponding to the multiple features.
  • the optimization module 510 is configured to analyze the relationship importance between the logical strings and optimize the rule decision tree.
  • the optimization module 510 is also configured to simplify the nodes and edges in the rule structure graph according to the relationship importance between the logical strings, and to generate an optimized rule decision tree according to the simplified rule structure graph.
  • according to the target security identification device based on the optimized rule decision tree of the present application, a corresponding logical string is generated from the underlying logical data of each node of the rule decision tree; the relationships between the logical strings are determined according to the tree structure of the rule decision tree; a rule structure graph is generated based on the relationships between the logical strings; the relationship importance between the logical strings is determined according to the rule structure graph; the rule decision tree is optimized by analyzing the relationship importance between the logical strings; and the target data of the target to be identified is identified through the optimized rule decision tree, with the target to be identified graded for security according to the identification result. This approach can simplify a complex rule decision tree, improve the efficiency of business decision-making and ensure the security of business data; when an error occurs in the business data, it can also quickly calculate the degree of impact to ensure safe business operation.
  • the embodiment of the present invention provides an electronic device, including a processor 1110, a communication interface 1120, a memory 1130, and a communication bus 1140, where the processor 1110, the communication interface 1120, and the memory 1130 communicate with each other through the communication bus 1140;
  • the memory 1130 is used to store computer programs;
  • the processor 1110 is configured to implement the target safety identification method based on the optimization rule decision tree of any of the above embodiments when executing the program stored on the memory 1130.
  • by executing the program stored on the memory 1130, the processor 1110 generates the relationships between the logical strings from the underlying logical data of the rule decision tree; generates a rule structure graph based on the relationships between the logical strings; determines the relationship importance between the logical strings according to the rule structure graph; optimizes the rule decision tree by analyzing the relationship importance between the logical strings; identifies the target data of the target to be identified through the optimized rule decision tree; and performs security grading on the target to be identified based on the identification result.
  • the communication interface 1120 is used for communication between the above-mentioned electronic device and other devices.
  • the memory 1130 may include a random access memory (RAM), or may include a non-volatile memory, such as at least one disk memory.
  • the memory 1130 may also be at least one storage device located remotely from the aforementioned processor 1110.
  • Embodiments of the present invention provide a computer-readable storage medium.
  • the computer-readable storage medium stores one or more programs.
  • the one or more programs can be executed by one or more processors 1110 to implement the target security identification method based on the optimized rule decision tree of any of the above embodiments.
  • a computer program product includes one or more computer instructions.
  • Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means.
  • A computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. Available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)).

Abstract

A target security recognition method and apparatus based on an optimization rule decision tree. The method comprises: respectively generating a corresponding logical character string by means of underlying logical data of each node of a rule decision tree (S202); determining the relationship between the logical character strings according to a tree structure of the rule decision tree (S204); generating a rule structure diagram according to the relationship between the logical character strings (S206); according to the rule structure diagram, respectively determining the importance of the relationship between the logical character strings (S208); optimizing the rule decision tree according to the importance of the relationship between the logical character strings (S210); and recognizing, by means of the optimized rule decision tree, target data of a target to be subjected to recognition, and performing security grading on said target according to a recognition result (S212). By means of the method, a complex rule decision tree can be simplified, thereby improving the decision-making efficiency of a service, and ensuring the security of service data; and when an error occurs in the service data, the degree of influence can also be quickly calculated, thereby ensuring the running security of the service.

Description

Target security identification method and device based on an optimized rule decision tree
Technical field
The present application relates to the field of computer information processing and, specifically, to a target security identification method, device, electronic equipment and computer-readable medium based on an optimized rule decision tree.
Background
In existing rule decision trees, because the population is large and the categories are numerous, there are also many process branches, nodes on the branches, and rules and models under the nodes, which makes the whole structure very large.
Precisely because the structure of the rule decision tree is complex, there is concern during routine updates of the risk control strategy that other branches will be affected and cause adverse results. Therefore, rules are generally only added to the rule decision tree and are rarely removed. Over time, the rule decision tree becomes more and more complex, and later maintenance becomes very troublesome. Moreover, when the rule decision tree is running online in a business system and a problem with a data source causes a business error that must be located, engineers need to test every model that uses that data source and redo the scoring tests and evaluation, which consumes a great deal of time and effort.
Therefore, a new target security identification method, device, electronic equipment and computer-readable medium based on an optimized rule decision tree is needed.
The above information disclosed in the Background section is only intended to enhance understanding of the background of the present application, and it may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Summary of the invention
In view of this, the present application provides a target security identification method, device, electronic equipment and computer-readable medium based on an optimized rule decision tree, which can simplify a complex rule decision tree, improve business decision-making efficiency and ensure business data security; when an error occurs in the business data, it can also quickly calculate the degree of impact to ensure safe business operation.
Other features and advantages of the present application will become apparent from the detailed description below, or will in part be learned through practice of the present application.
According to one aspect of the present application, a target security identification method based on an optimized rule decision tree is proposed. The method includes: generating a corresponding logical string from the underlying logical data of each node of the rule decision tree; determining the relationships between the logical strings according to the tree structure of the rule decision tree; generating a rule structure graph based on the relationships between the logical strings; determining the relationship importance between the logical strings according to the rule structure graph; optimizing the rule decision tree by analyzing the relationship importance between the logical strings; identifying the target data of a target to be identified through the optimized rule decision tree; and performing security grading on the target to be identified according to the identification result.
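The last step of the method, identifying target data with the rule decision tree and grading the target, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `RuleNode` class, the `grade_target` function, the field names `failed_logins` and `access_frequency`, the thresholds, and the three grade labels do not come from the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RuleNode:
    """One node of a hypothetical rule decision tree (illustrative only)."""
    name: str
    predicate: Optional[Callable[[Dict[str, float]], bool]] = None  # None marks a leaf
    if_true: Optional["RuleNode"] = None
    if_false: Optional["RuleNode"] = None
    grade: Optional[str] = None  # security grade, set only on leaves


def grade_target(root: RuleNode, target_data: Dict[str, float]) -> str:
    """Follow each rule's outcome from the root until a leaf grade is reached."""
    node = root
    while node.predicate is not None:
        node = node.if_true if node.predicate(target_data) else node.if_false
    return node.grade


# Hypothetical optimized tree; field names and thresholds are made up.
tree = RuleNode(
    name="failed_logins",
    predicate=lambda d: d["failed_logins"] > 5,
    if_true=RuleNode(name="leaf_low", grade="low"),
    if_false=RuleNode(
        name="access_frequency",
        predicate=lambda d: d["access_frequency"] > 100,
        if_true=RuleNode(name="leaf_medium", grade="medium"),
        if_false=RuleNode(name="leaf_high", grade="high"),
    ),
)

result = grade_target(tree, {"failed_logins": 2, "access_frequency": 40})
```

In this sketch the identification result is a level; the application also allows the result to take the form of a score.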
Optionally, generating the relationship between the logical strings through the underlying logical data of the rule decision tree includes: rewriting and parsing the underlying logical data of the rule decision tree in the Python language; during the rewriting and parsing, extracting unstructured rule data from each node of the rule decision tree; and generating the logical strings from the unstructured rule data.
Optionally, determining the relationships between the logical strings according to the tree structure of the rule decision tree includes: extracting, according to the tree structure of the rule decision tree, the relationships between the unstructured rule data to serve as the relationships between the logical strings.
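As a rough sketch of these two optional steps, the following walks a rule tree, keeps each node's rule text as a logical string, and records each parent-to-child link as a relationship. The nested-dictionary layout, the keys `id`, `rule` and `children`, and the function name are assumptions for illustration; the application does not specify how the parsed rule data is represented.

```python
from typing import Any, Dict, List, Tuple


def extract_strings_and_relations(
    node: Dict[str, Any],
    strings: Dict[str, str],
    relations: List[Tuple[str, str]],
) -> None:
    """Depth-first walk: record each node's rule text and each parent-to-child relation."""
    strings[node["id"]] = " ".join(node["rule"].split())  # normalize whitespace
    for child in node.get("children", []):
        relations.append((node["id"], child["id"]))
        extract_strings_and_relations(child, strings, relations)


# Hypothetical parsed form of a rule tree; the rule texts are made up.
rule_tree = {
    "id": "n1",
    "rule": "age  >= 18",
    "children": [
        {"id": "n2", "rule": "device_score > 0.7", "children": []},
        {"id": "n3", "rule": "login_count < 3"},
    ],
}

logical_strings: Dict[str, str] = {}
relations: List[Tuple[str, str]] = []
extract_strings_and_relations(rule_tree, logical_strings, relations)
```

Keeping the rule text as an opaque string mirrors the description's point that the rules are unstructured data while the tree supplies the structure.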
Optionally, generating a rule structure graph based on the relationships between the logical strings includes: taking the logical strings as nodes in the rule structure graph; taking the relationships between the logical strings as edges between multiple nodes; and generating the rule structure graph from the nodes and edges.
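A minimal sketch of building such a graph from nodes and edges is given below, together with a check that the result is a directed acyclic graph. The adjacency-set representation and the function names are assumptions; the cycle check uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, Set, Tuple


def build_rule_graph(
    logical_strings: Dict[str, str],
    relations: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Nodes are logical-string ids; each relation (src, dst) becomes a directed edge."""
    graph: Dict[str, Set[str]] = {node_id: set() for node_id in logical_strings}
    for src, dst in relations:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    return graph


def is_acyclic(graph: Dict[str, Set[str]]) -> bool:
    """Return True if the graph has no directed cycle."""
    try:
        # static_order() is lazy, so it must be consumed for a cycle to be detected.
        # The sorter reads the mapping as node -> predecessors; reversing every
        # edge does not change whether a cycle exists.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


strings = {"A": "x > 1", "B": "y < 2", "C": "z == 0"}
dag = build_rule_graph(strings, [("A", "B"), ("B", "C"), ("A", "C")])
cyclic = build_rule_graph(strings, [("A", "B"), ("B", "C"), ("C", "A")])
```

The two example graphs differ only in the direction of the edge between A and C, which is enough to turn a cycle into a directed acyclic graph.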
Optionally, determining the relationship importance between the logical strings according to the rule structure graph includes: obtaining a trained machine learning model and its corresponding sample set, where the sample set includes multiple sample data and each sample data includes multiple features; generating the feature importance corresponding to the multiple features; and determining the relationship importance between the logical strings according to the graph structure of the rule structure graph and the feature importance corresponding to the multiple features.
Optionally, generating the feature importance corresponding to the multiple features includes: generating an initial performance score of the machine learning model on the sample set; generating feature performance scores corresponding to the multiple features; and generating the multiple feature importances from the initial performance score and the multiple feature performance scores.
Optionally, generating the feature performance scores corresponding to the multiple features includes: extracting, in turn, one feature from the multiple features of the sample set; randomly permuting the values of that feature in the sample set to generate a random sample set; and generating the feature performance score of the machine learning model on the random sample set, corresponding to that feature.
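The procedure above resembles what is commonly called permutation feature importance. The sketch below is one possible reading, under stated assumptions: accuracy is used as the performance score, and the importance of a feature is taken as the initial score minus the score after shuffling that feature. The application states only that the importance is generated from the two scores, so the subtraction, the function names and the toy model are illustrative choices.

```python
import random
from typing import Callable, List


def accuracy(model: Callable[[List[float]], int], X: List[List[float]], y: List[int]) -> float:
    """Fraction of samples for which the model output equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(
    model: Callable[[List[float]], int],
    X: List[List[float]],
    y: List[int],
    seed: int = 0,
) -> List[float]:
    """Importance of feature j = initial score minus score with column j shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)  # initial performance score
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # random rearrangement of one feature
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(baseline - accuracy(model, X_perm, y))
    return importances


# Toy "trained model" that only looks at feature 0 (assumption for the demo).
model = lambda row: int(row[0] > 0.5)
X = [[0.1, 5.0], [0.9, 3.0], [0.4, 8.0], [0.8, 1.0], [0.2, 7.0], [0.7, 2.0]]
y = [0, 1, 0, 1, 0, 1]
importances = permutation_importance(model, X, y)
```

Because the toy model ignores feature 1, shuffling that column cannot change its predictions, so its importance is exactly zero; a feature with zero or near-zero importance is a candidate for the pruning described later.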
Optionally, determining the relationship importance between the logical strings according to the graph structure of the rule structure graph and the feature importance corresponding to multiple features includes: determining the structural importance of the relationships between the logical strings according to the graph structure of the rule structure graph; determining the feature importance of the relationships between the logical strings according to the feature importance corresponding to the multiple features; and generating the relationship importance between the logical strings from the structural importance and the feature importance.
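The application does not give formulas for the structural importance or for how it is combined with the feature importance. The sketch below therefore makes three labeled assumptions: structural importance is the share of graph nodes reachable through the edge's destination, the feature importance of an edge is the largest importance among the features used by the destination rule, and the two are combined by a weighted sum with a hypothetical weight `alpha`.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def reachable(graph: Graph, start: str) -> Set[str]:
    """All nodes reachable from start, including start itself."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def relationship_importance(
    graph: Graph,
    edge: Tuple[str, str],
    node_features: Dict[str, List[str]],
    feature_importance: Dict[str, float],
    alpha: float = 0.5,  # hypothetical weighting, not from the application
) -> float:
    """Weighted sum of an assumed structural measure and an assumed feature measure."""
    _, dst = edge
    structural = len(reachable(graph, dst)) / len(graph)
    features = node_features.get(dst, [])
    feature = max((feature_importance.get(f, 0.0) for f in features), default=0.0)
    return alpha * structural + (1 - alpha) * feature


graph = {"root": {"a", "b"}, "a": {"c"}, "b": set(), "c": set()}
node_features = {"a": ["age"], "b": ["zip_code"], "c": ["device_score"]}
feature_importance = {"age": 0.3, "zip_code": 0.0, "device_score": 0.35}

score_a = relationship_importance(graph, ("root", "a"), node_features, feature_importance)
score_b = relationship_importance(graph, ("root", "b"), node_features, feature_importance)
```

With these made-up numbers the edge into `a` scores higher than the edge into `b`, both because more of the graph hangs below `a` and because its rule uses a more important feature.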
可选地,分析所述逻辑字符串之间的关系重要度对所述规则决策树进行优化,包括:根据所述逻辑字符串之间的关系重要度对所述规则结构图中的节点和边进行化简;根据化简之后的所述规则结构图生成优化规则决策树。Optionally, analyzing the relationship importance between the logical strings to optimize the rule decision tree includes: optimizing the nodes and edges in the rule structure graph according to the relationship importance between the logical strings. Simplify; generate an optimized rule decision tree based on the simplified rule structure diagram.
可选地,根据化简之后的所述规则结构图生成优化规则决策树,包括:根据化简之后的所述规则结构图生成化简规则决策树;对所述化简规则决策书中的参数进行更新;通过更新后的参数和化简规则决策树生成所述优化规则决策树。 Optionally, generating an optimized rule decision tree based on the simplified rule structure diagram includes: generating a simplified rule decision tree based on the simplified rule structure diagram; and modifying the parameters in the simplified rule decision book. Update; generate the optimization rule decision tree through the updated parameters and the simplified rule decision tree.
According to one aspect of the present application, a target security identification device based on an optimized rule decision tree is proposed. The device includes: a character module, configured to generate corresponding logical strings from the underlying logic data of each node of the rule decision tree; a relationship module, configured to determine the relationships between the logical strings according to the tree structure of the rule decision tree; a structure module, configured to generate a rule structure diagram based on the relationships between the logical strings; an importance module, configured to determine the relationship importance between the logical strings according to the rule structure diagram; an optimization module, configured to analyze the relationship importance between the logical strings to optimize the rule decision tree; and an identification module, configured to identify the target data of a target to be identified through the optimized rule decision tree and to perform security grading on the target to be identified according to the identification result.
According to one aspect of the present application, an electronic device is proposed. The electronic device includes: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described above.
According to one aspect of the present application, a computer-readable medium is proposed, on which a computer program is stored. When the program is executed by a processor, the method described above is implemented.
According to the target security identification method, device, electronic device, and computer-readable medium based on an optimized rule decision tree of the present application, the relationships between the logical strings are generated from the underlying logic data of the rule decision tree; a rule structure diagram is generated based on the relationships between the logical strings; the relationship importance between the logical strings is determined according to the rule structure diagram; the relationship importance between the logical strings is analyzed to optimize the rule decision tree; the target data of a target to be identified is identified through the optimized rule decision tree; and security grading is performed on the target to be identified according to the identification result. In this way, a complex rule decision tree can be simplified, business decision-making efficiency can be improved, and business data security can be ensured. In addition, when errors occur in business data, the degree of impact can be calculated quickly to keep business operations safe.
It should be understood that the above general description and the following detailed description are only exemplary and do not limit the present application.
Description of the Drawings
The above and other objects, features, and advantages of the present application will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings. The drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a system block diagram of a target security identification method and device based on an optimized rule decision tree according to an exemplary embodiment.
Figure 2 is a flow chart of a target security identification method based on an optimized rule decision tree according to an exemplary embodiment.
Figure 3 is a flow chart of a target security identification method based on an optimized rule decision tree according to another exemplary embodiment.
Figure 4 is a flow chart of a target security identification method based on an optimized rule decision tree according to another exemplary embodiment.
Figure 5 is a block diagram of a target security identification device based on an optimized rule decision tree according to an exemplary embodiment.
Figure 6 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that this application is thorough and complete and fully conveys the concepts of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted.
Those skilled in the art will understand that the drawings are only schematic diagrams of example embodiments, and that the modules or processes in the drawings are not necessarily required to implement the present application; they therefore cannot be used to limit its scope of protection.
The technical terms and abbreviations used in this application are explained as follows:
Rule engine: a family of software systems that execute rules according to certain algorithms.
Drools: an open-source rule engine written in Java that uses the Rete algorithm to evaluate rules. Drools lets business logic be expressed declaratively and can execute business rules and decision models by storing, processing, and evaluating data.
BPMN2.0: short for Business Process Model and Notation, a modeling standard for business process models and notation that uses XML as its carrier and visualizes business processes with symbols.
jBPM: short for Java Business Process Management, an open-source, flexible, and easily extensible executable process language framework covering business process management, workflow, service collaboration, and related fields. The specification used by the framework is BPMN2.0.
In this application, a rule decision tree is the set of multiple control rules that a business system applies during decision-making. For ease of description, the following uses a rule decision tree for terminal device identification as an example. Different rule decision trees can be built for different application scenarios and for the terminal device data associated with different services. Different rule decision trees suit different application scenarios and can generate decision rules for multiple services within each scenario, which gives high flexibility. A rule decision tree can be generated from an analysis of historical terminal device data, which makes it reliable. In this application, terminal device operation information is taken as an example; the corresponding application scenarios may include, but are not limited to, account registration, account login, data transmission, data generation, data download, and data maintenance. These application scenarios are only examples; the actual scenarios can be determined as required and are not limited here. In the embodiments of this application, a rule decision tree for each service type can be built from the sample data associated with that service type.
Figure 1 is a system block diagram of a target security identification method and device based on an optimized rule decision tree according to an exemplary embodiment.
As shown in Figure 1, the system architecture 10 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
Users can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications can be installed on the terminal devices 101, 102, 103, such as Internet service applications, shopping applications, web browsers, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, such as a back-end management server that supports the Internet service websites browsed by users on the terminal devices 101, 102, 103. The back-end management server can analyze and otherwise process the received user data and feed the processing results (for example, risk analysis results) back to the administrator of the Internet service website and/or to the terminal devices 101, 102, 103.
The server 105 can generate corresponding logical strings from the underlying logic data of each node of the rule decision tree; determine the relationships between the logical strings according to the tree structure of the rule decision tree; generate a rule structure diagram based on the relationships between the logical strings; determine the relationship importance between the logical strings according to the rule structure diagram; analyze the relationship importance between the logical strings to optimize the rule decision tree; identify the target data of a target to be identified through the optimized rule decision tree; and perform security grading on the target to be identified according to the identification result.
The server 105 can also, for example, analyze the user data in the terminal devices 101, 102, 103 through the optimized rule decision tree.
The server 105 may be a single physical server or may consist of multiple servers. It should be noted that the target security identification method based on an optimized rule decision tree provided by the embodiments of this application can be executed by the server 105; accordingly, the target security identification device based on an optimized rule decision tree can be located in the server 105. The web pages through which users browse the Internet service platform are generally located in the terminal devices 101, 102, 103.
Figure 2 is a flow chart of a target security identification method based on an optimized rule decision tree according to an exemplary embodiment. The target security identification method 20 based on an optimized rule decision tree includes at least steps S202 to S212.
As shown in Figure 2, in S202, corresponding logical strings are generated from the underlying logic data of each node of the rule decision tree. For example, the underlying logic data of the rule decision tree can be rewritten and parsed in Python; during the rewriting and parsing, unstructured rule data is extracted from each node of the rule decision tree; and the logical strings are generated from the unstructured rule data.
In a specific application, the underlying logic data of the rule decision tree is implemented in Java using Drools combined with jBPM. Rewriting and parsing the underlying logic data in Python means re-implementing, in Python, the underlying logic code originally written in Java.
In S204, the relationships between the logical strings are determined according to the tree structure of the rule decision tree.
In one embodiment, the relationships between the unstructured rule data are extracted according to the tree structure of the rule decision tree and used as the relationships between the logical strings. More specifically, the unstructured rule data is extracted from the underlying logic data; the unstructured data is treated as strings, and the relationships between the unstructured data are treated as the relationships between the strings.
More specifically, the rules in the original Java implementation correspond to unstructured data. This unstructured data is extracted and kept as string data when the logic is rewritten in Python, and the relationships among the original data items are preserved.
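Steps S202 and S204 can be illustrated with a minimal Python sketch. The `RuleNode` class, its fields, and the sample rule texts are hypothetical and are not taken from the application or from Drools or jBPM; the sketch only shows how each node's rule could be kept as a string while the parent-child relationships of the tree are recorded separately.

```python
from dataclasses import dataclass, field


@dataclass
class RuleNode:
    """Hypothetical node of a rule decision tree."""
    node_id: str
    raw_rule: str                      # unstructured rule text held by the node
    children: list = field(default_factory=list)


def extract_strings_and_relations(root):
    """Walk the tree and return ({node_id: logical string}, [(parent, child), ...])."""
    strings, relations = {}, []
    stack = [root]
    while stack:
        node = stack.pop()
        # Collapse repeated whitespace so the rule is kept as a plain string.
        strings[node.node_id] = " ".join(node.raw_rule.split())
        for child in node.children:
            relations.append((node.node_id, child.node_id))
            stack.append(child)
    return strings, relations


# Hypothetical three-node tree.
leaf_a = RuleNode("r2", "login_count_7d  >  20")
leaf_b = RuleNode("r3", "device_age_days < 3")
tree = RuleNode("r1", "region != 'CN'", [leaf_a, leaf_b])
strings, relations = extract_strings_and_relations(tree)
```

Keeping the strings and the relations in two separate structures matches the later steps, where the strings become nodes and the relations become edges.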
In S206, a rule structure diagram is generated based on the relationships between the logical strings. For example, the logical strings are used as the nodes of the rule structure diagram; the relationships between the logical strings are used as the edges between the nodes; and the rule structure diagram is generated from these nodes and edges.
More specifically, the rule structure diagram may be a directed acyclic graph, that is, a directed graph without cycles. For example, if a directed graph contains edges from A to B, from B to C, and from C back to A, these edges form a cycle and the graph is not acyclic. Reversing the edge from C to A so that it points from A to C removes the cycle, and the graph becomes a directed acyclic graph.
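The A, B, C example can be checked with a short sketch, assuming the rule structure diagram is stored as a dictionary that maps each node to the set of nodes it points to. The function names are illustrative; the cycle check relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def build_rule_graph(strings, relations):
    """Return {node: set of successor nodes} from logical strings and their relations."""
    graph = {node_id: set() for node_id in strings}
    for src, dst in relations:
        graph[src].add(dst)
    return graph


def is_acyclic(graph):
    """Return True if the directed graph contains no cycle."""
    try:
        # TopologicalSorter expects node -> predecessors; passing successors
        # reverses every edge, which does not change whether a cycle exists.
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


nodes = {"A": "rule a", "B": "rule b", "C": "rule c"}
cyclic = build_rule_graph(nodes, [("A", "B"), ("B", "C"), ("C", "A")])
fixed = build_rule_graph(nodes, [("A", "B"), ("B", "C"), ("A", "C")])
```

Here `cyclic` contains the loop A to B to C to A, while `fixed` has the last edge reversed to point from A to C.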
In one embodiment, the rule structure diagram can also be verified. Input items are fed into the input end of the rule structure diagram, and output items are obtained after computation. These output items are compared with the output items of the original rule decision tree; when the results are consistent, the rule structure diagram is determined to have been built correctly.
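A minimal sketch of such a consistency check follows. The two evaluators are hypothetical stand-ins for the original rule decision tree and the rebuilt rule structure diagram, and the field names are invented for illustration; only the comparison pattern is the point.

```python
def find_mismatches(original_eval, graph_eval, inputs):
    """Return the inputs on which the two evaluators give different outputs."""
    return [x for x in inputs if original_eval(x) != graph_eval(x)]


def original_eval(d):
    """Stand-in for the original rule decision tree (hypothetical rules)."""
    if d["login_failures_24h"] >= 5:
        return "reject"
    return "review" if d["device_age_days"] < 30 else "accept"


def graph_eval(d):
    """Stand-in for the rebuilt rule structure diagram (same rules, as data)."""
    rules = [  # (condition, output), checked in order
        (lambda d: d["login_failures_24h"] >= 5, "reject"),
        (lambda d: d["device_age_days"] < 30, "review"),
    ]
    return next((out for cond, out in rules if cond(d)), "accept")


samples = [{"login_failures_24h": f, "device_age_days": a}
           for f in (0, 5, 9) for a in (1, 30, 400)]
mismatches = find_mismatches(original_eval, graph_eval, samples)
```

An empty `mismatches` list corresponds to the case where the results are consistent; any returned input is a concrete counterexample to inspect.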
In S208, the relationship importance between the logical strings is determined according to the rule structure diagram. For example, a trained machine learning model and its corresponding sample set are obtained, where the sample set includes multiple pieces of sample data and each piece of sample data includes multiple features; the feature importance corresponding to each of the multiple features is generated; and the relationship importance between the logical strings is determined according to the graph structure of the rule structure diagram and the feature importance corresponding to the multiple features.
For example, the importance of each feature can be calculated with a machine learning model trained in advance. The importance of the nodes and edges in the rule structure diagram, that is, of the strings and of the relationships between them, is then calculated from the feature importance. This can further be combined with the structural importance of the nodes and edges in the rule structure diagram to obtain their overall importance.
The details of "determining the relationship importance between the logical strings according to the rule structure diagram" are described in the embodiments corresponding to Figures 3 and 4.
In S210, the relationship importance between the logical strings is analyzed to optimize the rule decision tree. For example, the nodes and edges in the rule structure diagram are simplified according to the relationship importance between the logical strings, and an optimized rule decision tree is generated according to the simplified rule structure diagram.
More specifically, the importance of the input items entering the rule structure diagram may be determined first. An input item refers to the characteristics of a user or product and may include multiple features; an output item refers to the result of a rule judgment. Based on the input items, the fields that are used in the rule decision tree but have low importance for the output items are located, and the rules that depend on those low-importance input items are then selected. A rule may appear in the rule structure diagram as a node or as an edge. These nodes or edges are deleted from the rule structure diagram, and the node structure is adjusted accordingly to produce a new rule structure diagram.
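The deletion step can be sketched as follows, assuming the diagram is stored as a dictionary of successor sets and each node already has an importance value. The threshold, and the choice to reconnect the predecessors of a deleted node to its successors, are assumptions made for illustration; the application does not specify how the node structure is adjusted.

```python
def prune_graph(graph, importance, threshold):
    """Drop nodes whose importance is below threshold and reconnect around them.

    graph: {node: set of successors}. Nodes missing from `importance`
    are treated as having importance 0.0 and are therefore removed.
    """
    pruned = {node: set(succ) for node, succ in graph.items()}
    low = [n for n in graph if importance.get(n, 0.0) < threshold]
    for node in low:
        successors = pruned.pop(node)
        for targets in pruned.values():
            if node in targets:
                targets.discard(node)
                targets |= successors   # bypass the deleted node
    return pruned


# Hypothetical diagram and importance values.
graph = {"r1": {"r2"}, "r2": {"r3", "r4"}, "r3": set(), "r4": set()}
importance = {"r1": 0.4, "r2": 0.01, "r3": 0.3, "r4": 0.2}
simplified = prune_graph(graph, importance, threshold=0.05)
```

In this example the low-importance node `r2` is removed and `r1` is linked directly to `r3` and `r4`.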
In one embodiment, a simplified rule decision tree can also be generated according to the simplified rule structure diagram; the parameters in the simplified rule decision tree are updated; and the optimized rule decision tree is generated from the updated parameters and the simplified rule decision tree.
Because the rule structure diagram has been simplified, the parameters in the rules may need to be adjusted so that the rule decision tree still runs accurately. The parameters in the nodes and edges of the rule structure diagram can be fine-tuned to more precise thresholds or other evaluation metrics; this application is not limited in this respect.
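One possible way to fine-tune a single threshold is sketched below. It assumes that reference decisions are available, for instance the outputs of the original rule decision tree, and picks the candidate threshold whose decisions agree with them most often. The agreement criterion and all values are illustrative assumptions, not the application's procedure.

```python
def tune_threshold(scores, reference_labels, candidates):
    """Pick the candidate threshold whose decisions best match the reference labels."""
    def agreement(threshold):
        return sum((s >= threshold) == bool(y)
                   for s, y in zip(scores, reference_labels))
    return max(candidates, key=agreement)


# Hypothetical rule scores and reference decisions (1 = flagged by the original tree).
scores = [0.1, 0.4, 0.6, 0.9]
reference = [0, 0, 1, 1]
best = tune_threshold(scores, reference, candidates=[0.3, 0.5, 0.7])
```

With these values, only the threshold 0.5 reproduces all four reference decisions.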
In S212, the target data of the target to be identified is identified through the optimized rule decision tree, and security grading is performed on the target to be identified according to the identification result. In practice, a device can be taken as the target to be identified: the device data of the device is obtained and fed into the optimized rule decision tree, which evaluates the device data against its multiple internal rules and generates an identification result. The identification result may be a high, medium, or low level, or it may take the form of a score; this application is not limited in this respect. The security level of the device is determined from the identification result, and the device can access different data resources according to its security level.
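The grading flow can be sketched in a few lines. The rules, the scoring scheme (fraction of rules satisfied), the cut-offs, and the resource names are all hypothetical and serve only to show how a score could be mapped to a level and a level to accessible resources.

```python
def score_device(device, rules):
    """Fraction of rules the device data satisfies (hypothetical scoring scheme)."""
    passed = sum(1 for rule in rules if rule(device))
    return passed / len(rules)


def grade(score, high=0.8, medium=0.5):
    """Map a score to a security level; the cut-offs are illustrative only."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


# Hypothetical rules over device data fields.
rules = [
    lambda d: d["login_failures_24h"] < 5,
    lambda d: d["device_age_days"] >= 30,
    lambda d: d["region"] in {"CN", "SG"},
]
# Hypothetical mapping from security level to accessible resources.
ACCESS = {
    "high": {"public", "internal", "sensitive"},
    "medium": {"public", "internal"},
    "low": {"public"},
}

device = {"login_failures_24h": 1, "device_age_days": 10, "region": "CN"}
level = grade(score_device(device, rules))
allowed = ACCESS[level]
```

The example device satisfies two of the three rules, so it falls in the middle band under these cut-offs.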
According to the target security identification method based on an optimized rule decision tree of the present application, corresponding logical strings are generated from the underlying logic data of each node of the rule decision tree; the relationships between the logical strings are determined according to the tree structure of the rule decision tree; a rule structure diagram is generated based on the relationships between the logical strings; the relationship importance between the logical strings is determined according to the rule structure diagram; the relationship importance between the logical strings is analyzed to optimize the rule decision tree; the target data of the target to be identified is identified through the optimized rule decision tree; and security grading is performed on the target to be identified according to the identification result. In this way, a complex rule decision tree can be simplified, business decision-making efficiency can be improved, and business data security can be ensured; in addition, when errors occur in business data, the degree of impact can be calculated quickly to keep business operations safe. The method of this application can optimize and prune the tree as a whole, which makes the structure clearer and free of redundancy and makes maintenance easier. After a data source fails, the impact can be assessed faster and more accurately before going back online, and more precise thresholds can be set on the scores of newly launched models to filter and grade terminal devices.
It should be clearly understood that this application describes how to form and use particular examples, but the principles of this application are not limited to any details of these examples. Rather, based on the teaching of the content disclosed in this application, these principles can be applied to many other embodiments.
Figure 3 is a flow chart of a target security identification method based on an optimized rule decision tree according to another exemplary embodiment. Process 30 shown in Figure 3 is a detailed description of S208, "determining the relationship importance between the logical strings according to the rule structure diagram", in the process shown in Figure 2.
As shown in Figure 3, in S302, a trained machine learning model and its corresponding sample set are obtained. The sample set includes multiple pieces of sample data, and each piece of sample data includes multiple features.
In a specific application, the sample set may be a set of terminal device feature samples, and multiple pieces of feature information may be generated from terminal device information and a feature strategy. Data cleaning and data fusion can be applied to the terminal device information to convert it into multiple feature data. More specifically, variable missing-rate analysis and handling and outlier handling can be applied to the terminal device information; discretized continuous variables can be given a WOE transformation, as can discrete variables; text variables can be processed, for example with word2vec; and so on.
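As an illustration of the WOE (weight of evidence) transformation mentioned above, the following sketch computes one WOE value per bin of an already discretized variable. The application does not define the formula; the sketch assumes the common convention WOE = ln(share of good samples in the bin / share of bad samples in the bin), with a small smoothing constant `eps` added to avoid taking the logarithm of zero. The sign convention and the smoothing are assumptions.

```python
import math
from collections import Counter


def woe_by_bin(bins, labels, eps=0.5):
    """Return {bin: WOE} for a discretized variable; labels use 1 = bad, 0 = good."""
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    total_good, total_bad = sum(good.values()), sum(bad.values())
    woe = {}
    for b in set(bins):
        # Smoothed share of good and bad samples that fall in this bin.
        good_share = (good[b] + eps) / (total_good + eps)
        bad_share = (bad[b] + eps) / (total_bad + eps)
        woe[b] = math.log(good_share / bad_share)
    return woe


# Hypothetical binned variable and labels.
bins = ["a", "a", "a", "b", "b", "b"]
labels = [0, 0, 1, 1, 1, 0]
woe = woe_by_bin(bins, labels)
```

Under this convention a positive value marks a bin with relatively more good samples, and a negative value a bin with relatively more bad samples.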
In the embodiments of this application, the terminal device may be a personal user terminal device or an enterprise user terminal device. The target data may be terminal device information. Terminal device information may include basic information authorized by the user, such as business account information, terminal device identification information, and the region where the terminal device is located. It may also include behavior information, such as page operation data of the terminal device, the service access duration of the terminal device, and the service access frequency of the terminal device. The specific content of the terminal device information can be determined according to the actual application scenario and is not limited here.
The machine learning model is trained on the multiple samples and features in the sample data; when training is complete, a machine learning model that can run stably in the business is obtained. The machine learning model may be, for example, a convolutional neural network model. Its sample set may include multiple terminal device samples, and each terminal device sample may include features such as terminal device identification information, terminal device operation data, and terminal device service access information.
In S304, the feature importance corresponding to each of the multiple features is generated. For example, an initial performance score of the machine learning model on the sample set is generated; feature performance scores corresponding to the multiple features are generated; and the multiple feature importances are generated according to the initial performance score and the multiple feature performance scores.
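The procedure in S304 resembles what is commonly called permutation feature importance. The sketch below is written under that reading and makes several assumptions: the performance score is plain accuracy, the model is any callable that maps a feature tuple to a label, and the importance of a feature is the initial score minus the average score over `k` shuffles of that feature. The application only states that the importance is generated from the two kinds of score, so the subtraction is an assumption.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model predicts the given label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, k=5, seed=0):
    """Return one importance per feature: initial score minus mean shuffled score."""
    rng = random.Random(seed)
    initial = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        scores = []
        for _ in range(k):
            column = [r[j] for r in rows]
            rng.shuffle(column)            # randomly rearrange feature j only
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            scores.append(accuracy(model, shuffled, labels))
        importances.append(initial - sum(scores) / k)
    return importances


# Hypothetical data: the toy model uses feature 0 and ignores feature 1.
rows = [(0.9, 1.0), (0.1, 1.0), (0.8, 0.0), (0.2, 0.0), (0.7, 1.0), (0.3, 0.0)]
labels = [1, 0, 1, 0, 1, 0]
model = lambda r: int(r[0] > 0.5)
importances = permutation_importance(model, rows, labels, n_features=2)
```

Because the toy model never reads feature 1, shuffling that feature leaves every prediction unchanged and its importance is exactly zero.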
In S306, the relationship importance between the logical strings is determined according to the graph structure of the rule structure diagram and the feature importance corresponding to the multiple features. For example, the structural importance of the relationships between the logical strings is determined according to the graph structure of the rule structure diagram; the feature importance of the relationships between the logical strings is determined according to the feature importance corresponding to the multiple features; and the relationship importance between the logical strings is generated according to the structural importance and the feature importance.
In a specific embodiment, the importance of the nodes in the rule structure diagram can be calculated with a node importance algorithm from graph algorithms, and the importance of the edges in the rule structure diagram can be calculated with a related graph algorithm.
In one embodiment, weights can be set separately for the structural importance and the feature importance, and the importance of the nodes and edges is then calculated as a combination of the two, which corresponds to the relationship importance between the logical strings.
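A minimal sketch of the weighted combination follows. The application does not name a specific node importance algorithm, so degree centrality is used here purely as a stand-in, and the weights 0.4 and 0.6 are arbitrary illustrative values. The diagram is assumed to be a dictionary of successor sets in which every node appears as a key.

```python
def degree_centrality(graph):
    """Structural importance: (in-degree + out-degree) / (n - 1) for each node."""
    n = len(graph)
    degree = {node: len(succ) for node, succ in graph.items()}
    for succ in graph.values():
        for node in succ:
            degree[node] += 1
    return {node: (d / (n - 1) if n > 1 else 0.0) for node, d in degree.items()}


def combined_importance(structural, feature, w_struct=0.4, w_feat=0.6):
    """Weighted sum of structural and feature importance per node."""
    return {node: w_struct * structural[node] + w_feat * feature.get(node, 0.0)
            for node in structural}


# Hypothetical diagram and per-node feature importance.
graph = {"r1": {"r2", "r3"}, "r2": set(), "r3": set()}
structural = degree_centrality(graph)
feature = {"r1": 0.5, "r2": 0.1, "r3": 0.0}
overall = combined_importance(structural, feature)
```

The same weighted sum could be applied to edges once an edge-level structural measure has been chosen.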
The target security identification method based on an optimized rule decision tree of this application can parse an existing rule flow and optimize it by pruning, removing nodes or rules that have little effect on the result. It can also help strategy staff set the thresholds of newly launched models more precisely. After optimization the structure is clearer: useless rule nodes and models are removed, which makes the structure easier to maintain. The effect of different thresholds on the final result can also be tried quickly, which helps strategy staff determine more precisely both the thresholds of newly launched models and the new thresholds of models affected by data sources that have gone offline.
Figure 4 is a flow chart of a target security identification method based on an optimized rule decision tree according to another exemplary embodiment. Process 40 shown in Figure 4 is a detailed description of "generating feature performance scores corresponding to the multiple features" within S304 of the process shown in Figure 3.
As shown in Figure 4, in S402, one feature is extracted in turn from the multiple features of the sample set.
In S404, the values of that feature in the sample set are randomly rearranged to generate a random sample set.
In S406, the feature performance score of the machine learning model on the random sample set corresponding to that feature is generated.
In S408, the feature importance of the relationships between the logical strings is determined according to the feature importance corresponding to the multiple features.
可假设训练后的机器学习模型为M,其对应的样本集合为D,样本集合中可包括验证集、训练集,还可包括测试集,本申请不以此为限。假设在样本集合D中,特征包括$T_1, T_2, \ldots, T_j$。更具体的,样本集合中目标A的特征可表示为“$T^A_1, T^A_2, \ldots, T^A_j$”,目标B的特征可表示为“$T^B_1, T^B_2, \ldots, T^B_j$”,以此类推。It can be assumed that the trained machine learning model is M and its corresponding sample set is D. The sample set may include a validation set, a training set, and a test set; this application is not limited in this respect. Assume that in the sample set D the features are $T_1, T_2, \ldots, T_j$. More specifically, the features of target A in the sample set can be expressed as "$T^A_1, T^A_2, \ldots, T^A_j$", the features of target B as "$T^B_1, T^B_2, \ldots, T^B_j$", and so on.
可对j个特征分别计算其特征重要度,其中,每个特征可共进行K次计算,从而生成该特征的特征性能评分。The feature importance can be calculated separately for each of the j features; each feature can be evaluated K times in total to generate the feature performance scores for that feature.
首先可假设本次进行特征计算的特征为$T_j$,在K次计算中的每一次计算中,首先随机重排列特征$T_j$,即为,在打乱特征中,用户A的特征可表示为“$T^A_1, T^A_2, \ldots, T^E_j$”,即为,用户A对应的$T_j$与用户E对应的$T_j$替换,用户B的特征可表示为“$T^B_1, T^B_2, \ldots, T^S_j$”,即为,用户B对应的$T_j$与用户S对应的$T_j$替换,以此类推,生成一组随机样本集合。First, assume that the feature being evaluated is $T_j$. In each of the K calculations, the values of feature $T_j$ are first randomly rearranged across the samples. In the shuffled data, the features of user A can be expressed as "$T^A_1, T^A_2, \ldots, T^E_j$", meaning that user A's $T_j$ has been replaced by user E's $T_j$; the features of user B can be expressed as "$T^B_1, T^B_2, \ldots, T^S_j$", meaning that user B's $T_j$ has been replaced by user S's $T_j$; and so on, which generates a random sample set.
计算机器学习模型M在原始样本集合中的性能评分为Q,计算机器学习模型M在改组随机样本集合中的性能评分,将其记录为$Q_{kj}$;Compute the performance score Q of the machine learning model M on the original sample set, then compute its performance score on the shuffled random sample set and record it as $Q_{kj}$;
经过K次计算,得到K个$Q_{kj}$,然后基于如下公式计算特征$T_j$的重要度:
After K calculations, K values of $Q_{kj}$ are obtained, and the importance of feature $T_j$ is then calculated based on the following formula:
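The formula itself is not reproduced in this text. A common permutation-importance form that is consistent with the definitions of Q and Q_kj above is I_j = Q - (1/K) * sum over k of Q_kj; this form is an assumption, not a quotation of the application. Under that assumption, steps S402 to S406 can be sketched as follows, with accuracy standing in for the unspecified performance score and a plain callable standing in for model M.

```python
import random


def accuracy(model, rows, labels):
    """Performance score: fraction of rows the model labels correctly."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)


def permutation_importance(model, rows, labels, k=5, seed=0):
    """Return (Q, [I_1, ..., I_j]) using I_j = Q - mean_k(Q_kj) (assumed formula)."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)  # Q on the original sample set
    importances = []
    for j in range(len(rows[0])):
        drop_sum = 0.0
        for _ in range(k):
            # S404: shuffle column j across samples, leaving other features intact.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]
            # S406: Q_kj is the score on the shuffled sample set.
            drop_sum += baseline - accuracy(model, shuffled, labels)
        importances.append(drop_sum / k)
    return baseline, importances


# Hypothetical data: the label depends only on feature 0; feature 1 is noise.
rows = [[0.9, 3], [0.8, 1], [0.7, 4], [0.6, 1], [0.4, 5], [0.3, 9], [0.2, 2], [0.1, 6]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model = lambda row: int(row[0] > 0.5)
baseline, importances = permutation_importance(model, rows, labels, k=5)
```

A feature the model never reads keeps the score unchanged under shuffling, so its importance is exactly zero; that is the signal used later to prune rules that depend only on such features.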
根据本申请的基于优化规则决策树的目标安全识别方法,通过计算输入项中每个特征的重要性,从而反推使用相应输入项的节点或规则的重要性,来达到对规则流进行优化剪枝的方法。通过用python对整个drools+jBPM架构进行重写,将现有的风控规则流构建成有向无环图,实现python语言的规则流,能够将输入项经规则流计算后得到输出项。通过对图中节点以及节点下的规则进行重要性的求解,可以将整个风控规则流中,不重要的决策节点或规则剔除,或者对于某个决策点的阈值进行调整,使其更精准的对目标进行筛选分级。According to the target security identification method based on an optimized rule decision tree of this application, the importance of each feature in the input items is calculated, and the importance of the nodes or rules that use those input items is inferred from it, so that the rule flow can be optimized and pruned. By rewriting the entire drools+jBPM architecture in Python, the existing risk control rule flow is built into a directed acyclic graph, yielding a Python implementation of the rule flow that computes output items from input items. By solving for the importance of the nodes in the graph and of the rules under each node, unimportant decision nodes or rules can be removed from the risk control rule flow, or the threshold of a given decision point can be adjusted, so that targets are screened and graded more precisely.
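As a rough illustration of a rule flow expressed as a directed acyclic graph in Python, the sketch below evaluates an input record node by node in topological order. The rule names, thresholds, and record fields are hypothetical, and the code does not reproduce any drools or jBPM behavior; it only shows input items passing through a DAG of rules to produce an output item.

```python
from graphlib import TopologicalSorter

# Each node holds one rule: a function from the input record to True (pass) or False.
rules = {
    "age_check": lambda record: record["age"] >= 18,
    "score_check": lambda record: record["model_score"] >= 0.6,
    "device_check": lambda record: not record["device_flagged"],
}

# The DAG is stored as node -> set of predecessor nodes.
flow = {
    "age_check": set(),
    "score_check": {"age_check"},
    "device_check": {"age_check"},
}


def run_rule_flow(record):
    """Evaluate every node after its predecessors; a node fails if any predecessor failed."""
    results = {}
    for node in TopologicalSorter(flow).static_order():
        if all(results[parent] for parent in flow[node]):
            results[node] = rules[node](record)
        else:
            results[node] = False
    grade = "pass" if all(results.values()) else "reject"
    return grade, results


grade_ok, _ = run_rule_flow({"age": 30, "model_score": 0.7, "device_flagged": False})
grade_bad, results_bad = run_rule_flow({"age": 16, "model_score": 0.9, "device_flagged": False})
```

Keeping thresholds such as `0.6` inside the node definitions makes it cheap to rerun the same records with a different threshold and compare the resulting grades.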
根据本申请的基于优化规则决策树的目标安全识别方法,对于规则流本身,也能达到梳理规则流的目的,使规则下的模型在遇到外部数据源出现问题的时候能够更快的得出去掉该数据源之后的效果,为重新上线做参考。通过剪枝优化,下线掉无效的模型或者输入项,使规则流达到结构不冗余,更有益于后期维护,更快的评估数据源的影响,更精准的对终端设备进行筛选分级的目的。According to the target security identification method based on an optimized rule decision tree of this application, the rule flow itself is also sorted out, so that when an external data source fails, the effect of removing that data source on the models under the rules can be obtained more quickly and used as a reference for relaunching. Through pruning optimization, invalid models or input items are taken offline, so that the rule flow has a non-redundant structure that is easier to maintain, the impact of data sources can be assessed faster, and terminal devices can be screened and graded more precisely.
本领域技术人员可以理解实现上述实施例的全部或部分步骤被实现为由CPU执行的计算机程序。在该计算机程序被CPU执行时,执行本申请提供的上述方法所限定的上述功能。所述的程序可以存储于一种计算机可读存储介质中,该存储介质可以是只读存储器,磁盘或光盘等。Those skilled in the art can understand that all or part of the steps for implementing the above-described embodiments are implemented as computer programs executed by a CPU. When the computer program is executed by the CPU, the above-mentioned functions defined by the above-mentioned method provided by this application are executed. The program can be stored in a computer-readable storage medium, which can be a read-only memory, a magnetic disk or an optical disk.
此外,需要注意的是,上述附图仅是根据本申请示例性实施例的方法所包括的处理的示意性说明,而不是限制目的。易于理解,上述附图所示的处理并不表明或限制这些处理的时间顺序。另外,也易于理解,这些处理可以是例如在多个模块中同步或异步执行的。In addition, it should be noted that the above-mentioned drawings are only schematic illustrations of processes included in the methods according to the exemplary embodiments of the present application, and are not intended to be limiting. It is readily understood that the processes shown in the above figures do not indicate or limit the temporal sequence of these processes. In addition, it is also easy to understand that these processes may be executed synchronously or asynchronously in multiple modules, for example.
下述为本申请装置实施例,可以用于执行本申请方法实施例。对于本申请装置实施例中未披露的细节,请参照本申请方法实施例。The following are device embodiments of the present application, which can be used to execute method embodiments of the present application. For details not disclosed in the device embodiments of this application, please refer to the method embodiments of this application.
图5是根据一示例性实施例示出的一种基于优化规则决策树的目标安全识别装置的框图。如图5所示,基于优化规则决策树的目标安全识别装置50包括:字符模块502,关系模块504,结构模块506,重要度模块508,优化模块510,识别模块512。Figure 5 is a block diagram of a target safety identification device based on an optimized rule decision tree according to an exemplary embodiment. As shown in Figure 5, the target safety identification device 50 based on the optimization rule decision tree includes: a character module 502, a relationship module 504, a structure module 506, an importance module 508, an optimization module 510, and an identification module 512.
字符模块502,用于通过规则决策树各节点的底层逻辑数据分别生成对应的逻辑字符串。The character module 502 is used to generate corresponding logical strings through the underlying logical data of each node of the rule decision tree.
关系模块504,用于根据规则决策树的树状结构确定所述逻辑字符串之间的关系。The relationship module 504 is used to determine the relationship between the logical strings according to the tree structure of the rule decision tree.
结构模块506用于基于所述逻辑字符串之间的关系生成规则结构图;结构模块506还用于将逻辑字符串作为规则结构图中的节点;将逻辑字符串之间的关系作为多个节点之间的边;通过节点和边生成所述规则结构图。The structure module 506 is used to generate a rule structure graph based on the relationships between the logical strings; the structure module 506 is also used to take the logical strings as nodes in the rule structure graph, take the relationships between the logical strings as edges between the nodes, and generate the rule structure graph from the nodes and edges.
重要度模块508用于依据所述规则结构图分别确定所述逻辑字符串之间的关系重要度;重要度模块508还用于获取训练后的机器学习模型和其对应的样本集,所述样本集中包括多个样本数据,每个样本数据包括多个特征;生成多个特征对应的特征重要度;根据所述规则结构图的图结构和多个特征对应的特征重要度确定所述逻辑字符串之间的关系重要度。The importance module 508 is used to determine the relationship importance between the logical strings according to the rule structure graph; the importance module 508 is also used to obtain the trained machine learning model and its corresponding sample set, where the sample set includes multiple sample data and each sample data includes multiple features; to generate the feature importance corresponding to the multiple features; and to determine the relationship importance between the logical strings according to the graph structure of the rule structure graph and the feature importance corresponding to the multiple features.
优化模块510用于分析所述逻辑字符串之间的关系重要度对所述规则决策树进行优化。优化模块510还用于根据所述逻辑字符串之间的关系重要度对所述规则结构图中的节点和边进行化简;根据化简之后的所述规则结构图生成优化规则决策树。The optimization module 510 is used to analyze the relationship importance between the logical strings and optimize the rule decision tree. The optimization module 510 is also used to simplify the nodes and edges in the rule structure graph according to the relationship importance between the logical strings, and to generate an optimized rule decision tree from the simplified rule structure graph.
识别模块512用于通过优化后的所述规则决策树对待识别目标的目标数据进行识别,根据识别结果对所述待识别目标进行安全分级。The identification module 512 is configured to identify the target data of the target to be identified through the optimized rule decision tree, and perform security classification on the target to be identified based on the identification results.
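The simplification performed by the optimization module can be sketched as a threshold-based prune of the rule structure graph. The reconnection strategy (linking the predecessors of a removed node directly to its successors) and the function name `prune_graph` are assumptions for illustration; the text does not specify how the graph is repaired after a node is removed.

```python
def prune_graph(edges, importance, threshold, protected=()):
    """Drop nodes scoring below threshold and bridge their predecessors to their successors."""
    edges = set(edges)
    low_nodes = sorted(
        node for node, score in importance.items()
        if score < threshold and node not in protected
    )
    for node in low_nodes:
        preds = {src for src, dst in edges if dst == node}
        succs = {dst for src, dst in edges if src == node}
        # Remove every edge touching the node, then reconnect around it.
        edges = {(src, dst) for src, dst in edges if node not in (src, dst)}
        edges |= {(pred, succ) for pred in preds for succ in succs}
    return edges


# Hypothetical chain of rule nodes with precomputed relationship importance.
original_edges = [("start", "a"), ("a", "b"), ("b", "end")]
importance = {"start": 1.0, "a": 0.05, "b": 0.6, "end": 1.0}
pruned_edges = prune_graph(original_edges, importance, threshold=0.1, protected=("start", "end"))
```

The `protected` argument keeps entry and exit nodes in place regardless of score, so the pruned graph can still be turned back into a decision tree with the same inputs and outputs.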
根据本申请的基于优化规则决策树的目标安全识别装置,通过规则决策树各节点的底层逻辑数据分别生成对应的逻辑字符串;根据规则决策树的树状结构确定所述逻辑字符串之间的关系;基于所述逻辑字符串之间的关系生成规则结构图;依据所述规则结构图分别确定所述逻辑字符串之间的关系重要度;分析所述逻辑字符串之间的关系重要度对所述规则决策树进行优化,通过优化后的所述规则决策树对待识别目标的目标数据进行识别,根据识别结果对所述待识别目标进行安全分级的方式,能够对复杂的规则决策树进行简化,提高业务决策效率,保证业务数据安全;还能够在业务数据出现错误时,快速计算影响程度,保证业务运行安全。According to the target security identification device based on an optimized rule decision tree of this application, corresponding logical strings are generated from the underlying logical data of each node of the rule decision tree; the relationships between the logical strings are determined according to the tree structure of the rule decision tree; a rule structure graph is generated based on the relationships between the logical strings; the relationship importance between the logical strings is determined according to the rule structure graph; the rule decision tree is optimized by analyzing the relationship importance between the logical strings; the target data of the target to be identified is identified through the optimized rule decision tree; and the target to be identified is security-graded according to the identification result. In this way a complex rule decision tree can be simplified, business decision-making efficiency is improved, and business data security is ensured; when errors occur in business data, the degree of impact can also be calculated quickly to keep business operations safe.
如图6所示,本发明实施例提供了一种电子设备,包括处理器1110、通信接口1120、存储器1130和通信总线1140,其中,处理器1110,通信接口1120,存储器1130通过通信总线1140完成相互间的通信;As shown in Figure 6, the embodiment of the present invention provides an electronic device, including a processor 1110, a communication interface 1120, a memory 1130, and a communication bus 1140. The processor 1110, the communication interface 1120, and the memory 1130 are completed through the communication bus 1140. communication between each other;
存储器1130,用于存放计算机程序;Memory 1130, used to store computer programs;
处理器1110,用于执行存储器1130上所存放的程序时,实现上述任一实施例的基于优化规则决策树的目标安全识别方法。The processor 1110 is configured to implement the target safety identification method based on the optimization rule decision tree of any of the above embodiments when executing the program stored on the memory 1130.
本发明实施例提供的电子设备,处理器1110通过执行存储器1130上所存放的程序通过规则决策树的底层逻辑数据生成所述逻辑字符串之间的关系;基于所述逻辑字符串之间的关系生成规则结构图;依据所述规则结构图分别确定所述逻辑字符串之间的关系重要度;分析所述逻辑字符串之间的关系重要度对所述规则决策树进行优化,通过优化后的所述规则决策树对待识别目标的目标数据进行识别,根据识别结果对所述待识别目标进行安全分级。In the electronic device provided by the embodiment of the present invention, the processor 1110 generates the relationship between the logical strings through the underlying logical data of the rule decision tree by executing the program stored on the memory 1130; based on the relationship between the logical strings Generate a rule structure diagram; determine the relationship importance between the logical strings according to the rule structure diagram; analyze the relationship importance between the logical strings to optimize the rule decision tree, and through the optimized The rule decision tree identifies the target data of the target to be identified, and performs security classification on the target to be identified based on the identification results.
上述电子设备提到的通信总线1140可以是外设部件互连标准(Peripheral Component Interconnect,简称PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,简称EISA)总线等。该通信总线1140可以分为地址总线、数据总线、控制总线等。为便于表示,图中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。The communication bus 1140 of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 1140 can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is drawn in the figure, but this does not mean that there is only one bus or only one type of bus.
通信接口1120用于上述电子设备与其他设备之间的通信。The communication interface 1120 is used for communication between the above-mentioned electronic device and other devices.
存储器1130可以包括随机存取存储器(Random Access Memory,简称RAM),也可以包括非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。可选的,存储器1130还可以是至少一个位于远离前述处理器1110的存储装置。The memory 1130 may include random access memory (RAM) and may also include non-volatile memory, for example at least one disk memory. Optionally, the memory 1130 may also be at least one storage device located remotely from the aforementioned processor 1110.
上述的处理器1110可以是通用处理器,包括中央处理器(Central Processing Unit,简称CPU)、网络处理器(Network Processor,简称NP)等;还可以是数字信号处理器(Digital Signal Processor,简称DSP)、专用集成电路(Application Specific Integrated Circuit,简称ASIC)、现场可编程门阵列(Field-Programmable Gate Array,简称FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。The above processor 1110 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
本发明实施例提供了一种计算机可读存储介质,计算机可读存储介质存储有一个或者多个程序,一个或者多个程序可被一个或者多个处理器1110执行,以实现上述任一实施例的基于优化规则决策树的目标安全识别方法。Embodiments of the present invention provide a computer-readable storage medium. The computer-readable storage medium stores one or more programs. The one or more programs can be executed by one or more processors 1110 to implement any of the above embodiments. A target security identification method based on optimized rule decision trees.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本发明实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘SolidStateDisk(SSD))等。In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, it may be implemented in whole or in part in the form of a computer program product. A computer program product includes one or more computer instructions. When computer program instructions are loaded and executed on a computer, processes or functions according to embodiments of the present invention are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device. Computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, e.g., computer instructions may be transmitted from a website, computer, server or data center via a wired link (e.g. Coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) means to transmit to another website site, computer, server or data center. Computer-readable storage media can be any available media that can be accessed by a computer or a data storage device such as a server, data center, or other integrated media that contains one or more available media. Available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), etc.
以上具体地示出和描述了本申请的示例性实施例。应可理解的是,本申请不限于这里描述的详细结构、设置方式或实现方法;相反,本申请意图涵盖包含在所附权利要求的精神和范围内的各种修改和等效设置。 Exemplary embodiments of the present application have been specifically shown and described above. It is to be understood that the present application is not limited to the detailed structures, arrangements, or implementation methods described herein; on the contrary, the present application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (13)

  1. 一种基于优化规则决策树的目标安全识别方法,其特征在于,包括:A target safety identification method based on an optimized rule decision tree, which is characterized by including:
    通过规则决策树各节点的底层逻辑数据分别生成对应的逻辑字符串;The corresponding logical strings are generated through the underlying logical data of each node of the rule decision tree;
    根据规则决策树的树状结构确定所述逻辑字符串之间的关系;Determine the relationship between the logical strings according to the tree structure of the rule decision tree;
    基于所述逻辑字符串之间的关系生成规则结构图;Generate a rule structure diagram based on the relationship between the logical strings;
    依据所述规则结构图分别确定所述逻辑字符串之间的关系重要度;Determine the relationship importance between the logical strings according to the rule structure diagram;
    分析所述逻辑字符串之间的关系重要度对所述规则决策树进行优化;Analyze the relationship importance between the logical strings to optimize the rule decision tree;
    通过优化后的所述规则决策树对待识别目标的目标数据进行识别,根据识别结果对所述待识别目标进行安全分级。The target data of the target to be identified is identified through the optimized rule decision tree, and the security classification of the target to be identified is performed according to the identification results.
  2. 如权利要求1所述的方法,其特征在于,通过规则决策树各节点的底层逻辑数据分别生成对应的逻辑字符串,包括:The method according to claim 1, characterized in that corresponding logical strings are respectively generated through the underlying logical data of each node of the rule decision tree, including:
    通过python语言对所述规则决策树的底层逻辑数据进行重写解析;Rewrite and analyze the underlying logical data of the rule decision tree through Python language;
    在重写解析的过程中,由规则决策树的各节点中提取非结构化规则数据;In the process of rewriting and parsing, unstructured rule data is extracted from each node of the rule decision tree;
    通过非结构化规则数据生成所述逻辑字符串。The logical string is generated from unstructured rule data.
  3. 如权利要求2所述的方法,其特征在于,根据规则决策树的树状结构确定所述逻辑字符串之间的关系,包括:The method of claim 2, wherein determining the relationship between the logical strings according to the tree structure of a rule decision tree includes:
    根据规则决策树的树状结构提取非结构化规则数据之间的关系以作为所述逻辑字符串之间的关系。The relationship between unstructured rule data is extracted according to the tree structure of the rule decision tree as the relationship between the logical strings.
  4. 如权利要求1所述的方法,其特征在于,基于所述逻辑字符串之间的关系生成规则结构图,包括: The method of claim 1, wherein generating a rule structure graph based on the relationship between the logical strings includes:
    将逻辑字符串作为规则结构图中的节点;Use logical strings as nodes in the rule structure graph;
    将逻辑字符串之间的关系作为多个节点之间的边;Treat relationships between logical strings as edges between multiple nodes;
    通过节点和边生成所述规则结构图。The regular structure graph is generated through nodes and edges.
  5. 如权利要求1所述的方法,其特征在于,依据所述规则结构图分别确定所述逻辑字符串之间的关系重要度,包括:The method of claim 1, wherein determining the relationship importance between the logical strings according to the rule structure diagram includes:
    获取训练后的机器学习模型和其对应的样本集,所述样本集中包括多个样本数据,每个样本数据包括多个特征;Obtain the trained machine learning model and its corresponding sample set, where the sample set includes multiple sample data, and each sample data includes multiple features;
    生成多个特征对应的特征重要度;Generate feature importance corresponding to multiple features;
    根据所述规则结构图的图结构和多个特征对应的特征重要度确定所述逻辑字符串之间的关系重要度。The relationship importance between the logical strings is determined according to the graph structure of the rule structure graph and the feature importance corresponding to multiple features.
  6. 如权利要求5所述的方法,其特征在于,生成多个特征对应的特征重要度,包括:The method of claim 5, characterized in that generating feature importance corresponding to multiple features includes:
    生成所述机器学习模型在所述样本集上的初始性能分;Generate an initial performance score of the machine learning model on the sample set;
    生成多个特征对应的特征性能评分;Generate feature performance scores corresponding to multiple features;
    根据所述初始性能评分和多个特征性能评分生成多个特征重要度。A plurality of feature importances are generated according to the initial performance score and the plurality of feature performance scores.
  7. 如权利要求6所述的方法,其特征在于,生成多个特征对应的特征性能评分,包括:The method of claim 6, wherein generating feature performance scores corresponding to multiple features includes:
    依次提取所述样本集合的多个特征中的一个特征;Extract one feature among multiple features of the sample set in sequence;
    将所述样本集中的所述特征进行随机重排生成随机样本集;Randomly rearrange the features in the sample set to generate a random sample set;
    生成所述机器学习模型在所述随机样本集上的对应于所述特征的特征性能评分。Generating a feature performance score of the machine learning model corresponding to the feature on the random sample set.
  8. 如权利要求5所述的方法,其特征在于,根据所述规则结构图的图结构和多个特征对应的特征重要度确定所述逻辑字符串之间的关系重要度,包括:The method of claim 5, wherein determining the relationship importance between the logical strings according to the graph structure of the rule structure graph and the feature importance corresponding to multiple features includes:
    根据所述规则结构图的图结构确定所述逻辑字符串之间关系的结构重要度;Determine the structural importance of the relationship between the logical strings according to the graph structure of the rule structure graph;
    根据多个特征对应的特征重要度确定所述逻辑字符串之间关系的特征重要度;Determine the feature importance of the relationship between the logical strings according to the feature importance corresponding to the multiple features;
    根据所述结构重要度和所述特征重要度生成所述逻辑字符串之间的关系重要度。The relationship importance between the logical strings is generated according to the structural importance and the feature importance.
  9. 如权利要求1所述的方法,其特征在于,分析所述逻辑字符串之间的关系重要度对所述规则决策树进行优化,包括:The method of claim 1, wherein analyzing the relationship importance between the logical strings to optimize the rule decision tree includes:
    根据所述逻辑字符串之间的关系重要度对所述规则结构图中的节点和边进行化简;Simplify the nodes and edges in the rule structure graph according to the relationship importance between the logical strings;
    根据化简之后的所述规则结构图生成优化规则决策树。An optimized rule decision tree is generated based on the simplified rule structure graph.
  10. 如权利要求9所述的方法,其特征在于,根据化简之后的所述规则结构图生成优化规则决策树,包括:The method of claim 9, wherein generating an optimized rule decision tree based on the simplified rule structure diagram includes:
    根据化简之后的所述规则结构图生成化简规则决策树;Generate a simplified rule decision tree according to the simplified rule structure diagram;
    对所述化简规则决策树中的参数进行更新;Update the parameters in the simplified rule decision tree;
    通过更新后的参数和化简规则决策树生成所述优化规则决策树。The optimized rule decision tree is generated through the updated parameters and the simplified rule decision tree.
  11. 一种基于优化规则决策树的目标安全识别装置,其特征在于,包括:A target safety identification device based on an optimized rule decision tree, which is characterized by including:
    字符模块,用于通过规则决策树各节点的底层逻辑数据分别生成对应的逻辑字符串;The character module is used to generate corresponding logical strings through the underlying logical data of each node of the rule decision tree;
    关系模块,用于根据规则决策树的树状结构确定所述逻辑字符串之间的关系;A relationship module, used to determine the relationship between the logical strings according to the tree structure of the rule decision tree;
    结构模块,用于基于所述逻辑字符串之间的关系生成规则结构图;A structure module, used to generate a rule structure diagram based on the relationship between the logical strings;
    重要度模块,用于依据所述规则结构图分别确定所述逻辑字符串之间的关系重要度;An importance module, used to respectively determine the importance of relationships between the logical strings according to the rule structure diagram;
    优化模块,用于分析所述逻辑字符串之间的关系重要度对所述规则决策树进行优化;An optimization module, used to analyze the relationship importance between the logical strings and optimize the rule decision tree;
    识别模块,用于通过优化后的所述规则决策树对待识别目标的目标数据进行识别,根据识别结果对所述待识别目标进行安全分级。An identification module is used to identify the target data of the target to be identified through the optimized rule decision tree, and perform security classification on the target to be identified based on the identification results.
  12. 一种电子设备,其特征在于,包括: An electronic device, characterized by including:
    一个或多个处理器;one or more processors;
    存储装置,用于存储一个或多个程序;A storage device for storing one or more programs;
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-10中任一所述的方法。When the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any one of claims 1-10.
  13. 一种计算机可读介质,其上存储有计算机程序,其特征在于,所述程序被处理器执行时实现如权利要求1-10中任一所述的方法。 A computer-readable medium with a computer program stored thereon, characterized in that when the program is executed by a processor, the method according to any one of claims 1-10 is implemented.
PCT/CN2023/077880 2022-06-23 2023-02-23 Target security recognition method and apparatus based on optimization rule decision tree WO2023246146A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210726849.1A CN115310510A (en) 2022-06-23 2022-06-23 Target safety identification method and device based on optimization rule decision tree and electronic equipment
CN202210726849.1 2022-06-23

Publications (1)

Publication Number Publication Date
WO2023246146A1 true WO2023246146A1 (en) 2023-12-28

Family

ID=83855367

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/077880 WO2023246146A1 (en) 2022-06-23 2023-02-23 Target security recognition method and apparatus based on optimization rule decision tree

Country Status (2)

Country Link
CN (1) CN115310510A (en)
WO (1) WO2023246146A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115310510A (en) * 2022-06-23 2022-11-08 上海淇玥信息技术有限公司 Target safety identification method and device based on optimization rule decision tree and electronic equipment
CN116304920B (en) * 2023-02-13 2023-10-20 中国地质大学(武汉) Optimization method and device for stream data classification model
CN116579796A (en) * 2023-05-11 2023-08-11 广州一小时科技有限公司 Benefit analysis method and device for realizing intelligent store based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6868413B1 (en) * 2001-05-10 2005-03-15 Networks Associates Technology, Inc. System and method for customizing and processing business logic rules in a business process system
CN101751399A (en) * 2008-12-12 2010-06-23 中国移动通信集团河北有限公司 Decision tree optimization method and optimization system
CN107808245A (en) * 2017-10-25 2018-03-16 冶金自动化研究设计院 Based on the network scheduler system for improving traditional decision-tree
CN110705622A (en) * 2019-09-26 2020-01-17 支付宝(杭州)信息技术有限公司 Decision-making method and system and electronic equipment
CN115310510A (en) * 2022-06-23 2022-11-08 上海淇玥信息技术有限公司 Target safety identification method and device based on optimization rule decision tree and electronic equipment

Also Published As

Publication number Publication date
CN115310510A (en) 2022-11-08

Similar Documents

Publication Publication Date Title
US11714968B2 (en) Identifying data of interest using machine learning
US11374953B2 (en) Hybrid machine learning to detect anomalies
WO2023246146A1 (en) Target security recognition method and apparatus based on optimization rule decision tree
US20180322411A1 (en) Automatic evaluation and validation of text mining algorithms
US9690938B1 (en) Methods and apparatus for machine learning based malware detection
US11941491B2 (en) Methods and apparatus for identifying an impact of a portion of a file on machine learning classification of malicious content
US10972482B2 (en) Automatic inline detection based on static data
US11595415B2 (en) Root cause analysis in multivariate unsupervised anomaly detection
US11620581B2 (en) Modification of machine learning model ensembles based on user feedback
US20210092160A1 (en) Data set creation with crowd-based reinforcement
CN110516697B (en) Evidence graph aggregation and reasoning based statement verification method and system
US20200067980A1 (en) Increasing security of network resources utilizing virtual honeypots
US11546380B2 (en) System and method for creation and implementation of data processing workflows using a distributed computational graph
Halder et al. Hands-On Machine Learning for Cybersecurity: Safeguard your system by making your machines intelligent using the Python ecosystem
EP4226292A1 (en) Systems and methods for tracking data shared with third parties using artificial intelligence-machine learning
US20220067579A1 (en) Dynamic ontology classification system
US20200175406A1 (en) Apparatus and methods for using bayesian program learning for efficient and reliable knowledge reasoning
US20240086736A1 (en) Fault detection and mitigation for aggregate models using artificial intelligence
CN113515625A (en) Test result classification model training method, classification method and device
US20230396641A1 (en) Adaptive system for network and security management
US20220400121A1 (en) Performance monitoring in the anomaly detection domain for the it environment
Sula Secriskai: a machine learning-based tool for cybersecurity risk assessment
CN114358024A (en) Log analysis method, apparatus, device, medium, and program product
Khan Detecting phishing attacks using nlp
CN116775889B (en) Threat information automatic extraction method, system, equipment and storage medium based on natural language processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23825805

Country of ref document: EP

Kind code of ref document: A1