WO2021120845A1 - Homogeneous risk unit feature set generation method, apparatus and device, and medium - Google Patents

Homogeneous risk unit feature set generation method, apparatus and device, and medium

Info

Publication number
WO2021120845A1
WO2021120845A1 PCT/CN2020/123373 CN2020123373W WO2021120845A1 WO 2021120845 A1 WO2021120845 A1 WO 2021120845A1 CN 2020123373 W CN2020123373 W CN 2020123373W WO 2021120845 A1 WO2021120845 A1 WO 2021120845A1
Authority
WO
WIPO (PCT)
Prior art keywords
risk
feature
similarity
unit
leaf node
Prior art date
Application number
PCT/CN2020/123373
Other languages
French (fr)
Chinese (zh)
Inventor
张慧南
张向阳
周杭
沈磊
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司 filed Critical 支付宝(杭州)信息技术有限公司
Publication of WO2021120845A1 publication Critical patent/WO2021120845A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Definitions

  • This application relates to the field of computer technology, and in particular to a method, device, equipment and medium for generating a feature set of homogeneous risk units.
  • The embodiments of this specification provide a method, device, equipment, and medium for generating a feature set of homogeneous risk units, to solve the problem that only a single risk unit can be monitored at a time, so that related risk units cannot be identified together.
  • An embodiment of this specification provides a method for generating a homogeneous risk unit feature set, including: for each risk unit feature in a target risk unit feature set, inputting the risk unit feature into a pre-trained risk feature decision tree to determine the leaf node into which the risk unit feature is divided, wherein each leaf node of the risk feature decision tree corresponds to a risk category; for each leaf node of a target risk category in the risk feature decision tree, constructing a similarity network corresponding to the leaf node, wherein each node of the constructed similarity network corresponds to one of the risk unit features divided into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to the two nodes, and the first similarity is the similarity between those two risk unit features on the path vector corresponding to the leaf node; and, for each constructed similarity network, using a preset community discovery algorithm to generate the communities of the similarity network, and using the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
  • An embodiment of this specification also provides a device for generating a homogeneous risk unit feature set, including: a risk unit feature division module, used to input each risk unit feature in a target risk unit feature set into a pre-trained risk feature decision tree and determine the leaf node into which the risk unit feature is divided, wherein each leaf node of the risk feature decision tree corresponds to a risk category; a similarity network construction module, used to construct, for each leaf node of a target risk category in the risk feature decision tree, a similarity network corresponding to the leaf node, wherein each node of the constructed similarity network corresponds to one of the risk unit features divided into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to the two nodes, and the first similarity is the similarity between those two risk unit features on the path vector corresponding to the leaf node; and a homogeneous risk generation module, used to generate, for each constructed similarity network, the communities of the similarity network with a preset community discovery algorithm, and to generate the homogeneous risk unit feature set corresponding to each generated community from the risk unit features corresponding to the nodes in that community.
  • An embodiment of this specification also provides a device for generating a feature set of homogeneous risk units, which includes: at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: for each risk unit feature in a target risk unit feature set, input the risk unit feature into a pre-trained risk feature decision tree and determine the leaf node into which the risk unit feature is divided, wherein each leaf node of the risk feature decision tree corresponds to a risk category; and, for each leaf node of a target risk category in the risk feature decision tree, construct a similarity network corresponding to the leaf node, wherein each node of the constructed similarity network corresponds to one of the risk unit features divided into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to the two nodes, and the first similarity is the similarity between those two risk unit features on the path vector corresponding to the leaf node.
  • The embodiments of this specification also provide a computer-readable storage medium that stores computer-executable instructions, characterized in that, when the computer-executable instructions are executed by a processor, the following steps are implemented: for each risk unit feature in a target risk unit feature set, inputting the risk unit feature into a pre-trained risk feature decision tree and determining the leaf node into which the risk unit feature is divided, wherein each leaf node of the risk feature decision tree corresponds to a risk category; and, for each leaf node of a target risk category in the risk feature decision tree, constructing a similarity network corresponding to the leaf node, wherein each node of the constructed similarity network corresponds to one of the risk unit features divided into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to the two nodes, and the first similarity is the similarity between those two risk unit features on the path vector corresponding to the leaf node.
  • Each leaf node of a target risk category in the risk feature decision tree has a specific risk meaning. Compared with the prior art, which only identifies whether a risk unit belongs to the target risk category, the method can also indicate which specific kind of target risk each risk unit feature falls under, so the homogeneous risk units identified afterwards have a clear risk meaning.
  • Because each leaf node of a target risk category has a specific risk meaning, risk unit features from different batches that are divided into the same leaf node can be judged for homogeneous risk with the same combination of features, that is, with the same set of criteria. This makes it possible to track how a given type of homogeneous risk within the target risk changes over time.
  • The feature components of all risk unit features are split across different feature branch paths, so the leaf nodes of different target risk categories have different feature combinations, that is, different "scales" for measuring homogeneity. This automatically differentiates the types of homogeneous risk features.
  • A community discovery algorithm is applied automatically to the risk unit feature data that falls into a leaf node when the target risk unit feature set is processed as a batch, which improves flexibility in determining homogeneous risk features.
  • Because risk unit features are processed in batches rather than individually, the method yields the number of risk units that carry each kind of target risk. These numbers can be compared across the risk unit feature sets of different batches to obtain the trend of risk changes, so that risk control measures can be formulated in a timely manner.
  • Fig. 1 is a flowchart of a method for generating a feature set of homogeneous risk units provided by the first embodiment of this specification.
  • FIG. 2 is a flowchart of the training steps provided in the second embodiment of this specification.
  • Fig. 3 is a schematic diagram of the risk feature decision tree provided by the third embodiment of this specification.
  • Fig. 4 is a schematic diagram of a community provided by the fourth embodiment of this specification.
  • Fig. 5 is a flowchart of another method for generating a feature set of homogeneous risk units provided by the fifth embodiment of the present specification.
  • Fig. 6 is a schematic structural diagram of a device for generating a feature set of homogeneous risk units provided by the sixth embodiment of the present specification.
  • various financial systems and related industries (for example, banking, securities, insurance, and virtual currency)
  • various violations (for example, money laundering)
  • Risk monitoring usually deploys different monitoring rules along multiple dimensions in order to monitor a single risk unit (for example, a user, mobile phone number, vehicle, or real estate), and each monitoring period may generate a large amount of risk warning information.
  • the current risk monitoring may have the following problems.
  • the first embodiment of this specification proposes a method for generating a feature set of homogeneous risk units.
  • FIG. 1 shows a process 100 of an embodiment of a method for generating a feature set of homogeneous risk units.
  • In this embodiment, the method is described as applied to an electronic device with some computing capability, as an example.
  • the method for generating a feature set of homogeneous risk units includes steps 101-103.
  • Step 101 For the risk unit feature in the target risk unit feature set, input the risk unit feature into a pre-trained risk feature decision tree, and determine the leaf node to which the risk unit feature is divided.
  • the execution subject of the method for generating a homogeneous risk unit feature set may first obtain the target risk unit feature set.
  • the risk unit refers to the smallest unit that cannot be reasonably divided in terms of risk.
  • the risk unit can also be different accordingly.
  • the risk unit may be a life insurance user.
  • the risk unit can also be real estate.
  • the risk unit characteristics are the characteristics obtained after feature extraction of the risk unit information describing the risk unit.
  • the risk unit information may include the life insurance user's name, mobile phone number, policy number, insurance amount, insurance liability description, and so on.
  • the risk unit information can include the real estate certificate number, real estate area, real estate location, unit area valuation and so on.
  • the target risk unit feature set can be composed of any risk unit features with the same combination of features.
  • The target risk unit feature set is used here only as an example for illustration.
  • the target risk unit feature set may be the risk unit feature set acquired in the same historical time period. In this way, the homogeneous risk unit feature set generation method can obtain the homogeneous risk feature in the historical time period.
  • the number of risk unit features in the target risk unit feature set may be at least two.
  • the specific feature extraction method may be different according to the information of the risk unit, and the feature extraction method is an existing technology widely researched and applied in the field, and will not be repeated here.
  • the above-mentioned execution subject can input the risk unit characteristics in the target risk unit characteristic set into the pre-trained risk characteristic decision tree, and determine the leaf node to which the risk unit characteristic is divided.
  • the above-mentioned execution subject may input each risk unit feature in the target risk unit feature set into the pre-trained risk feature decision tree.
  • the above-mentioned execution subject may also input part of the risk unit characteristics in the target risk unit characteristic set into the pre-trained risk characteristic decision tree.
  • the risk unit features that meet the preset conditions in the target risk unit feature set can be input into the pre-trained risk feature decision tree.
  • The risk feature decision tree is a decision tree that characterizes the correspondence between risk unit features and risk categories, and each leaf node of the risk feature decision tree corresponds to a risk category. That is, inputting a risk unit feature into the risk feature decision tree determines which leaf node the feature is divided into, and the feature is then assigned the risk category corresponding to that leaf node.
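  • The routing described in step 101 can be sketched as follows. This is a minimal illustration and not the specification's implementation: the `Node` class, the `route` function, the thresholds, and the two-level tree over components A and B are all hypothetical. The sketch also records which feature components were tested on the way to the leaf, since step 102 later relies on the path vector corresponding to the leaf node.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Internal nodes test one feature component against a threshold;
    # leaves carry a leaf id and a risk category instead.
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None    # taken when value <= threshold
    right: Optional["Node"] = None   # taken when value > threshold
    leaf_id: Optional[int] = None
    risk_category: Optional[str] = None


def route(tree: Node, unit: Dict[str, float]) -> Tuple[int, str, List[str]]:
    """Return (leaf id, risk category, feature components tested on the path)."""
    node, path = tree, []
    while node.leaf_id is None:
        path.append(node.feature)
        node = node.left if unit[node.feature] <= node.threshold else node.right
    return node.leaf_id, node.risk_category, path


# Hypothetical two-level tree over components A and B (values are made up).
tree = Node("A", 0.5,
            left=Node(leaf_id=1, risk_category="no money laundering risk"),
            right=Node("B", 10.0,
                       left=Node(leaf_id=2, risk_category="no money laundering risk"),
                       right=Node(leaf_id=3, risk_category="money laundering risk")))

leaf_id, category, path = route(tree, {"A": 0.9, "B": 42.0})
```

  • Here the risk unit feature is divided into the hypothetical leaf 3, and `path` lists the components that make up that leaf's path vector.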
  • the risk category may be a risk category in the preset risk category set.
  • The preset risk category set may include two risk categories, "money laundering risk" and "no money laundering risk", or it may include three risk categories: "high money laundering risk", "medium money laundering risk", and "low money laundering risk".
  • the risk feature decision tree may be a decision tree made in advance by a technician based on the specific scenarios applied by the above homogeneous risk unit feature set generation method, based on a large number of historical risk unit features and corresponding risk categories.
  • the aforementioned risk feature decision tree may also be obtained through training in advance according to the training step 200 shown in FIG. 2. Please refer to FIG. 2, which shows a flowchart of the training steps provided in the second embodiment of this specification.
  • the flow 200 of the training steps may include steps 201 to 203.
  • the execution subject of the training step may be the same as or different from the execution subject of the method for generating the homogeneous risk unit feature set. If they are the same, the executor of the training step can store the relevant information of the trained risk feature decision tree locally after the risk feature decision tree is trained. If they are different, the execution subject of the training step can send relevant information of the trained risk feature decision tree to the execution subject of the method for generating the homogeneous risk unit feature set after the risk feature decision tree is trained.
  • Step 201 Obtain a set of reference samples.
  • the reference samples in the acquired reference sample set may include sample risk unit information and corresponding sample risk categories.
  • the sample risk category corresponding to the sample risk unit information can be manually labeled.
  • sample risk unit information and the corresponding sample risk category can also be obtained from an authoritative financial institution.
  • Step 202 Perform feature extraction on the sample risk unit information in each reference sample in the reference sample set to obtain the corresponding sample features.
  • The method for feature extraction of the sample risk unit information may be the same as the method for feature extraction of the risk unit information in step 101.
  • Step 203 For the reference samples in the reference sample set, train a decision tree with the sample features corresponding to the sample risk unit information in each reference sample as the input and the sample risk category in that reference sample as the expected output, to obtain the risk feature decision tree.
  • the risk feature decision tree trained by using the training step 200 shown in FIG. 2 is obtained by supervised training of the decision tree based on reference samples. Furthermore, by using the risk feature decision tree obtained through the above training steps to classify the risk unit features, the accuracy of risk classification can be improved.
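  • Training steps 201 to 203 can be sketched as below. The specification does not name a splitting criterion, so the Gini impurity used here is an assumption, as are the function names, the `max_depth` limit, and the made-up reference samples (already reduced to two numeric sample features, standing in for step 202).

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of sample risk categories."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the (score, feature index, threshold) with the lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def train(rows, labels, depth=0, max_depth=3):
    """Recursively grow a tree; leaves store the majority sample risk category."""
    can_split = depth < max_depth and len(set(labels)) > 1
    split = best_split(rows, labels) if can_split else None
    if split is None:
        return {"category": Counter(labels).most_common(1)[0][0]}
    _, f, t = split
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {"feature": f, "threshold": t,
            "left": train([r for r, _ in lo], [y for _, y in lo], depth + 1, max_depth),
            "right": train([r for r, _ in hi], [y for _, y in hi], depth + 1, max_depth)}


def predict(tree, row):
    """Walk from the root to a leaf and return its risk category."""
    while "category" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["category"]


# Steps 201/202: made-up reference samples with their labelled risk categories.
sample_features = [(0.1, 5.0), (0.2, 7.0), (0.8, 30.0), (0.9, 40.0), (0.7, 6.0)]
sample_categories = ["no risk", "no risk", "risk", "risk", "no risk"]

# Step 203: sample features as input, sample risk categories as expected output.
risk_tree = train(sample_features, sample_categories)
```

  • In practice an existing decision tree library would normally replace this hand-written trainer; the sketch only shows how the labelled reference samples drive the supervised training.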
  • Step 102 For the leaf node of the target risk category in the risk feature decision tree, construct a similarity network corresponding to the leaf node.
  • After dividing the risk unit features in the target risk unit feature set into the leaf nodes of the risk feature decision tree, the execution subject of the method for generating a homogeneous risk unit feature set can construct, for each leaf node of the target risk category in the risk feature decision tree, a similarity network corresponding to that leaf node.
  • Each node of the constructed similarity network corresponds to one of the risk unit features divided into the leaf node. The similarity between any two nodes in the constructed similarity network is positively correlated with the first similarity between the two risk unit features corresponding to the two nodes, where the first similarity is the similarity between those two risk unit features on the path vector corresponding to the leaf node.
  • the similarity between any two nodes in the constructed similarity network graph may also be linearly positively correlated with the first similarity between the features of the two risk units corresponding to the two nodes.
  • the similarity between any two nodes in the constructed similarity network graph may be the first similarity between the features of two risk units corresponding to the two nodes.
  • the similarity between any two nodes in the constructed similarity network graph may also be positively non-linearly correlated with the first similarity between the features of the two risk units corresponding to the two nodes.
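  • The three options above (equal to, linearly related to, or non-linearly but still positively related to the first similarity) can be illustrated with three increasing mappings from the first similarity to an edge weight. The function names and the constants `a`, `b`, and `k` are assumptions chosen for illustration; the specification does not prescribe any particular mapping.

```python
import math


def edge_weight_identity(s: float) -> float:
    """Edge weight is the first similarity itself."""
    return s


def edge_weight_linear(s: float, a: float = 0.5, b: float = 0.5) -> float:
    """Linear positive correlation; a > 0 keeps the mapping increasing."""
    return a * s + b


def edge_weight_nonlinear(s: float, k: float = 5.0) -> float:
    """Non-linear positive correlation via a logistic curve (k > 0)."""
    return 1.0 / (1.0 + math.exp(-k * s))


# Sample first-similarity values in increasing order.
sims = [-1.0, 0.0, 0.4, 1.0]
weights = {
    "identity": [edge_weight_identity(s) for s in sims],
    "linear": [edge_weight_linear(s) for s in sims],
    "nonlinear": [edge_weight_nonlinear(s) for s in sims],
}
```

  • Whichever mapping is chosen, a larger first similarity always yields a larger edge weight, which is the only property the text requires.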
  • the target risk category may belong to a risk category set composed of risk categories corresponding to each leaf node of the risk feature decision tree.
  • the target risk category is used as an example for explanation, and it is not limited to the specified risk category.
  • The target risk category may be a single risk category.
  • For example, the target risk category can be the risk category "money laundering risk".
  • In that case, step 102 may construct a similarity network for each leaf node whose risk category is "money laundering risk" in the risk feature decision tree.
  • The target risk category may also include more than one risk category.
  • For example, the target risk category can be the two risk categories "high money laundering risk" and "medium money laundering risk".
  • In that case, step 102 may construct a similarity network for each leaf node corresponding to either of the two risk categories "high money laundering risk" and "medium money laundering risk" in the risk feature decision tree.
  • the risk unit features in the target risk unit feature set in step 101 include seven feature components A, B, C, D, E, F, and G.
  • FIG. 3 is a schematic diagram of the risk feature decision tree provided by the third embodiment of this specification.
  • The risk feature decision tree 300 has seven split points 301, 302, 303, 304, 305, 306, and 307. These seven split points test different values of the seven feature components A, B, C, D, E, F, and G.
  • The risk feature decision tree also includes nine leaf nodes 308, 309, 310, 311, 312, 313, 314, 315, and 316, which correspond to risk category 1, risk category 2, risk category 3, risk category 2, risk category 3, risk category 1, risk category 1, risk category 3, and risk category 2, respectively.
  • each risk unit feature in the target risk unit feature set is divided into each leaf node of the risk feature decision tree.
  • Table 1 shows the division of the characteristics of risk units with serial numbers 1-100 after step 101.
  • For the leaf node 308, a similarity network corresponding to the leaf node 308 can be constructed; for the leaf node 313, a similarity network corresponding to the leaf node 313 can be constructed; and for the leaf node 314, a similarity network corresponding to the leaf node 314 can be constructed.
  • the leaf node 308 corresponds to risk category 1
  • the leaf node 313 also corresponds to risk category 1.
  • Taking the leaf node 313 as an example, the similarity network is first given 25 nodes, which correspond to the risk unit features with serial numbers 39-63.
  • Then the first similarity is calculated between every pair of the risk unit features with serial numbers 39-63, that is, the similarity between the two features of each pair on the components A, B, and F of the path vector corresponding to the leaf node 313.
  • Although the leaf node 308 and the leaf node 313 correspond to the same risk category, they can carry different risk meanings, and the feature combinations used to measure whether risk unit features with different risk meanings belong to a given risk category can differ. That is, different scales are used to judge membership in the risk category.
  • the aforementioned first degree of similarity may be any value that is known or developed in the future to characterize the degree of similarity between vectors.
  • The above-mentioned first similarity may be the cosine similarity between the two nodes' risk unit features on the path vector corresponding to the leaf node. How to calculate the cosine similarity between two vectors is a well-known technique in the art and is not repeated here.
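  • A minimal sketch of step 102 under the cosine option follows. It assumes that the path vector of a leaf node is the list of feature components tested on the path to that leaf (A, B, and F in the leaf node 313 example), and it uses the first similarity directly as the edge weight. The function names, the `min_weight` cutoff for dropping edges, and the three sample units are hypothetical.

```python
import math
from itertools import combinations


def path_vector(unit, path_components):
    """Keep only the feature components tested on the path to the leaf."""
    return [unit[c] for c in path_components]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similarity_network(units, path_components, min_weight=0.0):
    """Return {(i, j): weight} over pairs of units in one leaf node.

    Pairs whose similarity does not exceed min_weight get no edge
    (an assumption made here to keep the graph sparse).
    """
    vectors = {uid: path_vector(u, path_components) for uid, u in units.items()}
    edges = {}
    for i, j in combinations(sorted(vectors), 2):
        w = cosine(vectors[i], vectors[j])
        if w > min_weight:
            edges[(i, j)] = w
    return edges


# Made-up units in one leaf whose path tests components A, B and F.
units = {
    39: {"A": 1.0, "B": 0.0, "C": 9.0, "F": 0.0},
    40: {"A": 2.0, "B": 0.0, "C": 1.0, "F": 0.0},
    41: {"A": 0.0, "B": 1.0, "C": 9.0, "F": 1.0},
}
edges = build_similarity_network(units, ["A", "B", "F"])
```

  • Component C differs widely between units 39 and 40 but does not affect their similarity, because C is not on this leaf's path. This is the "different scales for different leaf nodes" idea in code form.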
  • Step 103 For each constructed similarity network, use a preset community discovery algorithm to generate the communities of the similarity network, and use the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
  • After constructing in step 102 the similarity network corresponding to each leaf node of the target risk category in the risk feature decision tree, the execution subject of the method for generating a feature set of homogeneous risk units may, for each constructed similarity network, first use a preset community discovery algorithm to generate the communities of the similarity network.
  • A community generated by the community discovery algorithm is a subgraph corresponding to a subset of nodes that are closely connected in the similarity network. That is, the risk unit features corresponding to the nodes in each subgraph are relatively similar to each other. In other words, the risk unit features corresponding to the nodes in each subgraph are generally similar in type, quality, performance, value, and so on, and have homogeneous risks.
  • The risk unit features corresponding to the nodes in each subgraph therefore form a group of homogeneous risk unit features. After generating the communities, the execution subject can generate the homogeneous risk unit feature set corresponding to each community from the risk unit features corresponding to the nodes in that community.
  • After step 103, for the leaf node 308, the communities of the similarity network corresponding to the leaf node 308 can be generated; for the leaf node 313, the communities of the similarity network corresponding to the leaf node 313 can be generated; and for the leaf node 314, the communities of the similarity network corresponding to the leaf node 314 can be generated.
  • FIG. 4 is a schematic diagram of a community provided in the fourth embodiment of this specification.
  • FIG. 4 shows a schematic diagram of the communities of the similarity network generated through step 103, taking the leaf node 308 as an example. It can be seen from FIG. 4 that the similarity network has 15 nodes, corresponding to the risk unit features with serial numbers 1-15, and that three communities are generated after step 103. The community 401 includes the similarity network nodes corresponding to the risk unit features with serial numbers 1-5, the community 402 includes the similarity network nodes corresponding to the risk unit features with serial numbers 6-10, and the community 403 includes the similarity network nodes corresponding to the risk unit features with serial numbers 11-15.
  • Accordingly, three homogeneous risk unit feature sets are obtained: one including the risk unit features with serial numbers 1-5, one including the risk unit features with serial numbers 6-10, and one including the risk unit features with serial numbers 11-15.
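  • The FIG. 4 example can be reproduced with a small grouping routine that turns a node-to-community assignment into homogeneous risk unit feature sets. The function name is hypothetical, and the feature values are placeholders, because the actual feature vectors are not given in the text.

```python
from collections import defaultdict


def homogeneous_feature_sets(community_of, features):
    """Group risk unit features by the community their node was assigned to."""
    groups = defaultdict(list)
    for serial in sorted(community_of):
        groups[community_of[serial]].append(features[serial])
    return dict(groups)


# Node-to-community assignment mirroring FIG. 4 (communities 401, 402, 403).
community_of = {s: 401 + (s - 1) // 5 for s in range(1, 16)}

# Placeholder features keyed by serial number (real vectors are not given).
features = {s: f"feature_{s}" for s in range(1, 16)}

sets_by_community = homogeneous_feature_sets(community_of, features)
```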
  • the aforementioned preset community discovery algorithm may be any known or future community discovery algorithm, which is not specifically limited in this embodiment.
  • The aforementioned preset community discovery algorithm may be a label propagation algorithm.
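  • A minimal sketch of weighted label propagation on a similarity network is given below. Label propagation is usually run with a random visiting order and random tie-breaking; this variant deliberately uses a fixed order and picks the smallest label on ties so that the result is reproducible. The function name, the `max_iters` limit, and the example graph are assumptions.

```python
from collections import defaultdict


def label_propagation(nodes, edges, max_iters=100):
    """Weighted label propagation with a fixed visiting order and deterministic ties."""
    neighbours = defaultdict(dict)
    for (i, j), w in edges.items():
        neighbours[i][j] = w
        neighbours[j][i] = w
    labels = {n: n for n in nodes}          # every node starts in its own community
    for _ in range(max_iters):
        changed = False
        for n in sorted(nodes):             # asynchronous update, fixed order
            if not neighbours[n]:
                continue                    # isolated nodes keep their own label
            votes = defaultdict(float)
            for m, w in neighbours[n].items():
                votes[labels[m]] += w       # each neighbour votes with its edge weight
            top = max(votes.values())
            best = min(l for l, v in votes.items() if v == top)
            if best != labels[n]:
                labels[n], changed = best, True
        if not changed:
            break                           # converged: no label changed in this pass
    return labels


# Two tight triangles joined by one weak edge (weights are made up).
edges = {(1, 2): 0.9, (1, 3): 0.9, (2, 3): 0.9,
         (4, 5): 0.9, (4, 6): 0.9, (5, 6): 0.9,
         (3, 4): 0.1}
labels = label_propagation(range(1, 7), edges)
```

  • Nodes that end with the same label form one community, and the weak edge between nodes 3 and 4 is not strong enough to merge the two triangles.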
  • The method provided by the above-mentioned embodiments of this specification processes risk unit features in batches: it first uses a pre-trained risk feature decision tree to classify and screen the risk unit features, then constructs a similarity network of the risk unit features divided into each leaf node of the target risk category in the risk feature decision tree, and finally uses a community discovery algorithm to generate communities, thereby generating sets of risk unit features that have homogeneous risks within target risk categories of different meanings.
  • the fifth embodiment of this specification proposes yet another method for generating a feature set of homogeneous risk units.
  • FIG. 5 shows a process 500 of another embodiment of a method for generating a feature set of homogeneous risk units.
  • the process 500 of the method for generating a feature set of homogeneous risk units includes steps 501 to 504.
  • Step 501 For the risk unit features in the target risk unit feature set, input the risk unit features into a pre-trained risk feature decision tree, and determine the leaf node to which the risk unit feature is divided.
  • Step 502 For the leaf node of the target risk category in the risk feature decision tree, construct a similarity network corresponding to the leaf node.
  • Step 503 For each constructed similarity network, use a preset community discovery algorithm to generate the communities of the similarity network, and use the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
  • step 501, step 502, and step 503 are basically the same as the operations of step 101, step 102, and step 103 in the embodiment shown in FIG. 1, and will not be repeated here.
  • Step 504 For each leaf node of the target risk category in the risk feature decision tree, output at least one of the following items of information: the number of nodes of the similarity network corresponding to the leaf node, and the homogeneous risk unit feature set corresponding to each community of the similarity network corresponding to the leaf node.
  • After step 501 to step 503, the execution subject of the method for generating the homogeneous risk unit feature set has divided each risk unit feature in the target risk unit feature set into a risk category, and the risk unit features classified into the target risk category have further been divided into leaf nodes of the risk feature decision tree that carry different risk meanings.
  • The risk unit features divided into the same leaf node of the target risk category have in turn been divided into different communities, each forming a homogeneous risk unit feature set.
  • Therefore, the execution subject may output at least one of the following items of information: the number of nodes in the similarity network corresponding to each leaf node of the target risk category in the risk feature decision tree, and the homogeneous risk unit feature set corresponding to each community of the similarity network corresponding to that leaf node.
  • If step 504 outputs the number of nodes in the similarity network corresponding to each leaf node of the target risk category, then for risk unit feature sets processed in different batches the scale of the risk that carries the same specific risk meaning (that is, the number of risk unit features divided into the same leaf node) is known, and these numbers can be compared across batches to determine the trend of risk changes. If step 504 outputs the homogeneous risk unit feature sets, it is clear which risk unit features in the target risk unit feature set belong to the same homogeneous risk, as well as how many risk unit features each homogeneous risk unit feature set contains. The specific risk unit features that belong to a homogeneous risk can then be monitored collectively.
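  • The cross-batch comparison of node counts can be sketched as follows. The helper names and the two batches of leaf assignments are made up; the leaf identifiers 308, 313, and 314 are borrowed from the FIG. 3 example, where they correspond to risk category 1.

```python
from collections import Counter


def leaf_counts(leaf_assignments, target_leaves):
    """Count risk units per target-category leaf (similarity network node counts)."""
    counts = Counter(l for l in leaf_assignments if l in target_leaves)
    return {leaf: counts.get(leaf, 0) for leaf in sorted(target_leaves)}


def trend(batches, target_leaves):
    """Per-leaf change in node count between consecutive batches."""
    per_batch = [leaf_counts(b, target_leaves) for b in batches]
    return [{leaf: cur[leaf] - prev[leaf] for leaf in cur}
            for prev, cur in zip(per_batch, per_batch[1:])]


# Made-up leaf assignments for two batches of risk unit features.
target = {308, 313, 314}
batch_1 = [308] * 15 + [313] * 25 + [309] * 10
batch_2 = [308] * 20 + [313] * 18 + [314] * 4 + [309] * 12
changes = trend([batch_1, batch_2], target)
```

  • A positive entry in `changes` means that the risk with that leaf's specific meaning grew between the two batches, which is the kind of signal that could prompt a change in risk control measures.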
  • the output of the information in step 504 may be various output methods.
  • The execution subject may store the information to be output locally, or may present it on the execution subject in various ways (for example, text, graphics, or voice).
  • The execution subject may also send the information to be output to another electronic device connected to it over a network, and that electronic device may store the information or present it on its display terminal in various ways (for example, text, graphics, or voice).
  • Compared with the embodiment shown in FIG. 1, the process 500 of the method for generating a feature set of homogeneous risk units in this embodiment adds an information output step. The solution described in this embodiment can therefore output information, which enables more comprehensive risk monitoring.
  • The sixth embodiment of this specification provides a device for generating a feature set of homogeneous risk units, including: a risk unit feature division module 601, used to input each risk unit feature in the target risk unit feature set into the pre-trained risk feature decision tree and determine the leaf node into which the risk unit feature is divided, wherein each leaf node of the risk feature decision tree corresponds to a risk category; a similarity network construction module 602, used to construct, for each leaf node of the target risk category in the risk feature decision tree, a similarity network corresponding to the leaf node, wherein each node of the constructed similarity network corresponds to one of the risk unit features divided into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with the first similarity between the two risk unit features corresponding to the two nodes, and the first similarity is the similarity between those two risk unit features on the path vector corresponding to the leaf node; and a homogeneous risk generation module 603, used to generate, for each constructed similarity network, the communities of the similarity network with a preset community discovery algorithm, and to generate the homogeneous risk unit feature set corresponding to each generated community from the risk unit features corresponding to the nodes in that community.
  • For details, refer to the relevant descriptions of step 101, step 102, and step 103 in the embodiment corresponding to FIG. 1, which are not repeated here.
  • the risk feature decision tree may be obtained through the following training steps: obtaining a reference sample set, where each reference sample includes sample risk unit information and a corresponding sample risk category; performing feature extraction on the sample risk unit information in each reference sample to obtain the corresponding sample feature; and, for the reference samples in the reference sample set, training a decision tree with the sample feature corresponding to the sample risk unit information as input and the sample risk category as expected output, to obtain the risk feature decision tree.
  • the apparatus may further include: an output module 604, configured to output, for each leaf node of the target risk category in the risk feature decision tree, at least one of the following: the number of nodes of the similarity network corresponding to the leaf node, and the homogeneous risk unit feature set corresponding to each community of the similarity network corresponding to the leaf node.
  • the preset community detection algorithm may be a label propagation algorithm, and/or the first similarity may be the cosine similarity between the path vectors, corresponding to the leaf node, of the risk unit features corresponding to the two nodes.
  • the seventh embodiment of this specification provides a device for generating a feature set of homogeneous risk units, including: at least one processor; and, a memory communicatively connected with the at least one processor;
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: for each risk unit feature in the target risk unit feature set, input the risk unit feature into a pre-trained risk feature decision tree and determine the leaf node into which the risk unit feature is classified, wherein the leaf nodes of the risk feature decision tree correspond to risk categories; for each leaf node of the target risk category in the risk feature decision tree, construct a similarity network corresponding to the leaf node, wherein the nodes of the constructed similarity network respectively correspond to the risk unit features classified into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with the first similarity between the two risk unit features corresponding to the two nodes, and the first similarity
  • the eighth embodiment of this specification provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the following steps: for each risk unit feature in the target risk unit feature set, inputting the risk unit feature into a pre-trained risk feature decision tree and determining the leaf node into which the risk unit feature is classified, wherein the leaf nodes of the risk feature decision tree correspond to risk categories; for each leaf node of the target risk category in the risk feature decision tree, constructing a similarity network corresponding to the leaf node, wherein the nodes of the constructed similarity network respectively correspond to the risk unit features classified into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with the first similarity between the two risk unit features corresponding to the two nodes, and the first similarity is the similarity between the path vectors, corresponding to the leaf node, of the risk unit features corresponding to the two nodes; for each similarity
  • the apparatus, equipment, non-volatile computer readable storage medium and method provided in the embodiments of this specification correspond to each other. Therefore, the apparatus, equipment, and non-volatile computer storage medium also have beneficial technical effects similar to the corresponding method.
  • the beneficial technical effects of the method have been described in detail above, therefore, the beneficial technical effects of the corresponding device, equipment, and non-volatile computer storage medium will not be repeated here.
  • a programmable logic device (PLD), for example a field-programmable gate array (FPGA)
  • hardware description languages (HDL), for example ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), HDCal, JHDL, Lava, Lola, MyHDL, PALASM, RHDL, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language), and Verilog
  • the controller can be implemented in any suitable manner.
  • the controller can take the form of, for example, a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory.
  • in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller realizes the same function in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component. The devices for realizing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a cell phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-permanent memory in computer readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer readable media.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communication network.
  • program modules can be located in local and remote computer storage media including storage devices.

Abstract

Disclosed are a homogeneous risk unit feature set generation method and apparatus. A particular embodiment of the method comprises: for a risk unit feature in a target risk unit feature set, inputting the risk unit feature into a pre-trained risk feature decision tree, and determining a leaf node into which the risk unit feature is classified; for a leaf node, in the risk feature decision tree, of a target risk category, constructing a similarity network corresponding to the leaf node; and for each constructed similarity network, generating a community of the similarity network by means of a preset community detection algorithm, and generating, by means of risk unit features corresponding to all nodes in each generated community, a homogeneous risk unit feature set corresponding to the community. By means of the embodiment, the generation of a homogeneous risk unit feature set is realized.

Description

Method, apparatus, device, and medium for generating a homogeneous risk unit feature set
Technical field
This application relates to the field of computer technology, and in particular to a method, apparatus, device, and medium for generating a homogeneous risk unit feature set.
Background
At present, to prevent money laundering, financial systems (for example, banking, securities, insurance, and virtual currency) and related industries usually deploy different monitoring rules along multiple dimensions in order to monitor individual risk units periodically, and each monitoring period may generate a large amount of risk warning information.
Summary of the invention
In view of this, the embodiments of this specification provide a method, apparatus, device, and medium for generating a homogeneous risk unit feature set, to solve the problem that only a single risk unit can be monitored and multiple risk units cannot be identified at the same time.
The embodiments of this specification adopt the following technical solutions.
An embodiment of this specification provides a method for generating a homogeneous risk unit feature set, including: for each risk unit feature in a target risk unit feature set, inputting the risk unit feature into a pre-trained risk feature decision tree and determining the leaf node into which the risk unit feature is classified, wherein the leaf nodes of the risk feature decision tree correspond to risk categories; for each leaf node of a target risk category in the risk feature decision tree, constructing a similarity network corresponding to the leaf node, wherein the nodes of the constructed similarity network respectively correspond to the risk unit features classified into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to the two nodes, and the first similarity is the similarity between the path vectors, corresponding to the leaf node, of the risk unit features corresponding to the two nodes; and for each constructed similarity network, generating communities of the similarity network using a preset community detection algorithm, and generating, from the risk unit features corresponding to the nodes in each generated community, a homogeneous risk unit feature set corresponding to that community.
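The similarity network construction step can be illustrated with a minimal sketch. It rests on several assumptions that the text above does not fix: the path vector of a risk unit feature is taken to be the list of feature components tested on the path from the root to the leaf node, the first similarity is taken to be cosine similarity, the edge weight is simply that cosine value (which is trivially positively correlated with it), and edges below a hypothetical `min_similarity` threshold are dropped. All function and variable names are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similarity_network(path_vectors, min_similarity=0.9):
    """Build the weighted edges of a similarity network for one leaf node.

    path_vectors maps a risk unit id to its path vector for that leaf node.
    The threshold is an assumption of this sketch, not part of the source.
    """
    edges = {}
    for a, b in combinations(sorted(path_vectors), 2):
        weight = cosine(path_vectors[a], path_vectors[b])
        if weight >= min_similarity:
            edges[(a, b)] = weight
    return edges


# Three hypothetical risk units classified into the same leaf node.
vectors = {"u1": [1.0, 0.0], "u2": [2.0, 0.0], "u3": [0.0, 1.0]}
network = build_similarity_network(vectors)
print(network)  # {('u1', 'u2'): 1.0}
```

Only `u1` and `u2` are linked here because their path vectors point in the same direction, while `u3` is orthogonal to both.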
An embodiment of this specification further provides an apparatus for generating a homogeneous risk unit feature set, including: a risk unit feature division module, configured to, for each risk unit feature in a target risk unit feature set, input the risk unit feature into a pre-trained risk feature decision tree and determine the leaf node into which the risk unit feature is classified, wherein the leaf nodes of the risk feature decision tree correspond to risk categories; a similarity network construction module, configured to construct, for each leaf node of a target risk category in the risk feature decision tree, a similarity network corresponding to the leaf node, wherein the nodes of the constructed similarity network respectively correspond to the risk unit features classified into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to the two nodes, and the first similarity is the similarity between the path vectors, corresponding to the leaf node, of the risk unit features corresponding to the two nodes; and a homogeneous risk generation module, configured to, for each constructed similarity network, generate communities of the similarity network using a preset community detection algorithm, and generate, from the risk unit features corresponding to the nodes in each generated community, a homogeneous risk unit feature set corresponding to that community.
An embodiment of this specification further provides a device for generating a homogeneous risk unit feature set, including: at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: for each risk unit feature in a target risk unit feature set, input the risk unit feature into a pre-trained risk feature decision tree and determine the leaf node into which the risk unit feature is classified, wherein the leaf nodes of the risk feature decision tree correspond to risk categories; for each leaf node of a target risk category in the risk feature decision tree, construct a similarity network corresponding to the leaf node, wherein the nodes of the constructed similarity network respectively correspond to the risk unit features classified into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to the two nodes, and the first similarity is the similarity between the path vectors, corresponding to the leaf node, of the risk unit features corresponding to the two nodes; and for each constructed similarity network, generate communities of the similarity network using a preset community detection algorithm, and generate, from the risk unit features corresponding to the nodes in each generated community, a homogeneous risk unit feature set corresponding to that community.
An embodiment of this specification further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the following steps: for each risk unit feature in a target risk unit feature set, inputting the risk unit feature into a pre-trained risk feature decision tree and determining the leaf node into which the risk unit feature is classified, wherein the leaf nodes of the risk feature decision tree correspond to risk categories; for each leaf node of a target risk category in the risk feature decision tree, constructing a similarity network corresponding to the leaf node, wherein the nodes of the constructed similarity network respectively correspond to the risk unit features classified into the leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to the two nodes, and the first similarity is the similarity between the path vectors, corresponding to the leaf node, of the risk unit features corresponding to the two nodes; and for each constructed similarity network, generating communities of the similarity network using a preset community detection algorithm, and generating, from the risk unit features corresponding to the nodes in each generated community, a homogeneous risk unit feature set corresponding to that community.
At least one of the above technical solutions adopted in the embodiments of this specification can achieve beneficial effects including, but not limited to, the following.
First, each leaf node of a target risk category in the risk feature decision tree carries a specific risk meaning. Whereas the prior art only identifies whether a risk unit belongs to the target risk category, this approach can also indicate which specific kind of target risk each risk unit feature represents, so that the subsequent identification of homogeneous risk units has a clear risk meaning.
Second, because each leaf node of a target risk category in the risk feature decision tree carries a specific risk meaning, risk unit features from different batches that are classified into the same leaf node can be assessed for homogeneous risk using the same feature combination, that is, the same set of criteria determines the homogeneous type. The trend of that type of homogeneous risk within the target risk can therefore be tracked.
Third, the risk feature decision tree divides the feature components of all risk unit features into different feature branch paths, so that leaf nodes of different target risk categories have different feature combinations, that is, different "yardsticks" for measuring homogeneity, which allows homogeneous risk feature types to be distinguished automatically.
Fourth, the number of kinds of homogeneous risk features under a given leaf node does not need to be determined in advance. Instead, a community detection algorithm determines it automatically from the risk unit features that are classified into that leaf node within the batch-processed target risk unit feature set, which makes the determination of homogeneous risk features more flexible.
Fifth, by processing risk unit features in batches rather than one at a time, the specific number of risk units exhibiting each kind of target risk among the processed risk unit features can be obtained. These numbers can then be compared longitudinally with those of risk unit feature sets from other batches to obtain the trend of risk changes, so that risk control measures can be formulated in time.
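The fourth effect, that the number of homogeneous groups is not fixed in advance, can be illustrated with a small label propagation sketch. This is a simplified, deterministic variant written for illustration: nodes are visited in sorted order and ties are broken by the smallest label, whereas label propagation as usually described visits nodes in random order and breaks ties randomly. The edge dictionary format and all names are assumptions of this sketch.

```python
from collections import defaultdict


def label_propagation(nodes, edges, max_rounds=100):
    """Group nodes into communities by weighted label propagation.

    edges maps (node_a, node_b) to a similarity weight. Each node starts
    with its own label and repeatedly adopts the label carrying the
    largest total weight among its neighbours.
    """
    neighbours = defaultdict(dict)
    for (a, b), weight in edges.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    labels = {n: n for n in nodes}
    for _ in range(max_rounds):
        changed = False
        for n in sorted(nodes):
            if not neighbours[n]:
                continue  # isolated nodes keep their own label
            votes = defaultdict(float)
            for m, weight in neighbours[n].items():
                votes[labels[m]] += weight
            # Highest total weight wins; ties go to the smallest label.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break

    communities = defaultdict(set)
    for n, lab in labels.items():
        communities[lab].add(n)
    return sorted(sorted(c) for c in communities.values())


# Hypothetical similarity network for one leaf node.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = {("a", "b"): 1.0, ("a", "c"): 0.95, ("b", "c"): 0.9, ("d", "e"): 0.8}
groups = label_propagation(nodes, edges)
print(groups)  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
```

The algorithm is never told how many groups to find; three communities emerge from the structure of the network, and each would yield one homogeneous risk unit feature set.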
Description of the drawings
To describe the technical solutions in the embodiments of this specification more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in this specification; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for generating a homogeneous risk unit feature set provided by the first embodiment of this specification.
Fig. 2 is a flowchart of the training steps provided by the second embodiment of this specification.
Fig. 3 is a schematic diagram of the risk feature decision tree provided by the third embodiment of this specification.
Fig. 4 is a schematic diagram of communities provided by the fourth embodiment of this specification.
Fig. 5 is a flowchart of another method for generating a homogeneous risk unit feature set provided by the fifth embodiment of this specification.
Fig. 6 is a schematic structural diagram of an apparatus for generating a homogeneous risk unit feature set provided by the sixth embodiment of this specification.
Detailed description
At present, to prevent various violations (for example, money laundering), financial systems and related industries (for example, banking, securities, insurance, and virtual currency) usually deploy different monitoring rules along multiple dimensions in order to monitor individual risk units (for example, users, mobile phone numbers, vehicles, and real estate), and each monitoring period may generate a large amount of risk warning information.
However, current risk monitoring may have the following problems.
First, because risk units are monitored one at a time, monitoring efficiency is low.
Second, because risk units are monitored one at a time without any longitudinal comparison, it is easy to miss the forest for the trees, and sensitivity to risk is low. When a risk changes in a new way or a certain type of risk surges, the change cannot be captured in time; the risk is discovered only once it has become large, which makes it hard to control in advance.
Therefore, a method is needed that can identify the homogeneous risks among these warnings and handle them together. This would speed up risk response and also reveal the scale and development trend of each kind of risk, so that targeted risk control measures can be formulated in time, which is of considerable significance for risk management.
To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this specification without creative effort shall fall within the protection scope of this application.
The technical solutions provided by the embodiments of this specification are described in detail below with reference to the drawings.
The first embodiment of this specification proposes a method for generating a homogeneous risk unit feature set.
请参考图1,其示出了同质风险单位特征集合生成的方法的一个实施例的流程100。本实施例主要以该方法应用于有一定运算能力的电子设备中来举例说明。该同质风险单 位特征集合生成方法,包括步骤101~103。Please refer to FIG. 1, which shows a process 100 of an embodiment of a method for generating a feature set of homogeneous risk units. In this embodiment, the method is mainly applied to an electronic device with a certain computing capability as an example. The method for generating a feature set of homogeneous risk units includes steps 101-103.
步骤101,对于目标风险单位特征集合中的风险单位特征,将该风险单位特征输入至预先训练的风险特征决策树,确定该风险单位特征所划分到的叶子节点。Step 101: For the risk unit feature in the target risk unit feature set, input the risk unit feature into a pre-trained risk feature decision tree, and determine the leaf node to which the risk unit feature is divided.
在本实施例中,同质风险单位特征集合生成方法的执行主体可以首先获取目标风险单位特征集合。In this embodiment, the execution subject of the method for generating a homogeneous risk unit feature set may first obtain the target risk unit feature set.
这里,风险单位是指在风险上不可能再合理分割的最小单位。根据风险的不同,风险单位也可以相应不同。作为示例,当风险是指人寿保险中的洗钱风险时,风险单位可以是人寿保险用户。当风险是指财产保险中的洗钱风险时,风险单位也可以是房产。而风险单位特征则是对风险单位进行描述的风险单位信息进行特征提取后所得到的特征。例如,当风险单位为人寿保险用户时,风险单位信息可以包括人寿保险用户的姓名、手机号、保单号码、保险金额、保险责任描述等等。当风险单位是某房产时,风险单位信息可以包括该房产的房产证编号、房产面积、房产位置、单位面积估价等等。Here, the risk unit refers to the smallest unit that cannot be reasonably divided in terms of risk. Depending on the risk, the risk unit can also be different accordingly. As an example, when the risk refers to the money laundering risk in life insurance, the risk unit may be a life insurance user. When the risk refers to the money laundering risk in property insurance, the risk unit can also be real estate. The risk unit characteristics are the characteristics obtained after feature extraction of the risk unit information describing the risk unit. For example, when the risk unit is a life insurance user, the risk unit information may include the life insurance user's name, mobile phone number, policy number, insurance amount, insurance liability description, and so on. When the risk unit is a certain real estate, the risk unit information can include the real estate certificate number, real estate area, real estate location, unit area valuation and so on.
需要说明的是,目标风险单位特征集合可以是由任意具有相同特征组合的风险单位特征组成的,这里仅以目标范围特征集合为例进行说明。实践中,目标风险单位特征集合可以是同一历史时间段内所获取的风险单位特征集合,这样,通过该同质风险单位特征集合生成方法可以获得该历史时间段内的同质风险特征。另外,目标风险单位特征集合中风险单位特征的数目可以为至少两个。It should be noted that the target risk unit feature set can be composed of any risk unit features with the same combination of features. Here, only the target range feature set is taken as an example for illustration. In practice, the target risk unit feature set may be the risk unit feature set acquired in the same historical time period. In this way, the homogeneous risk unit feature set generation method can obtain the homogeneous risk feature in the historical time period. In addition, the number of risk unit features in the target risk unit feature set may be at least two.
另外需要说明的是,根据风险单位信息的不同,其具体的特征提取方法也可以不同,且特征提取方法是本领域广泛研究和应用的现有技术,在此不再赘述。In addition, it should be noted that the specific feature extraction method may be different according to the information of the risk unit, and the feature extraction method is an existing technology widely researched and applied in the field, and will not be repeated here.
然后,上述执行主体可以对于目标风险单位特征集合中的风险单位特征,将该风险单位特征输入至预先训练的风险特征决策树,确定该风险单位特征所划分到的叶子节点。Then, the above-mentioned execution subject can input the risk unit characteristics in the target risk unit characteristic set into the pre-trained risk characteristic decision tree, and determine the leaf node to which the risk unit characteristic is divided.
这里,上述执行主体可以将目标风险单位特征集合中的每个风险单位特征输入至预先训练的风险特征决策树。上述执行主体也可以将目标风险单位特征集合中的部分风险单位特征输入至预先训练的风险特征决策树。例如,可以将目标风险单位特征集合中的满足预设条件的风险单位特征输入至预先训练的风险特征决策树。Here, the above-mentioned execution subject may input each risk unit feature in the target risk unit feature set into the pre-trained risk feature decision tree. The above-mentioned execution subject may also input part of the risk unit characteristics in the target risk unit characteristic set into the pre-trained risk characteristic decision tree. For example, the risk unit features that meet the preset conditions in the target risk unit feature set can be input into the pre-trained risk feature decision tree.
In this embodiment, the risk feature decision tree is a decision tree that characterizes the correspondence between risk unit features and risk categories, and each leaf node of the risk feature decision tree corresponds to a risk category. That is, inputting a risk unit feature into the risk feature decision tree determines which leaf node the feature is assigned to, and therefore that the feature belongs to the risk category corresponding to that leaf node.
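The routing of a risk unit feature to a leaf node can be sketched as follows. This is a minimal illustration, not the specification's implementation: the `Node` class, the threshold-style splits, the node identifiers and the toy tree are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node: a split node tests one feature component."""
    node_id: int
    feature: Optional[str] = None        # component tested at a split node
    threshold: float = 0.0               # assumed threshold-style test
    left: Optional["Node"] = None        # branch taken when value <= threshold
    right: Optional["Node"] = None       # branch taken when value > threshold
    risk_category: Optional[int] = None  # set only on leaf nodes


def route_to_leaf(root: Node, unit: Dict[str, float]) -> Tuple[Node, List[str]]:
    """Return the leaf reached by `unit` and the components tested on the way."""
    node, path = root, []
    while node.risk_category is None:
        path.append(node.feature)
        node = node.left if unit[node.feature] <= node.threshold else node.right
    return node, path


# Toy tree with two split nodes and three leaf nodes (illustrative values only).
leaf_a = Node(3, risk_category=1)
leaf_b = Node(4, risk_category=2)
leaf_c = Node(5, risk_category=3)
split_b = Node(2, feature="B", threshold=0.5, left=leaf_a, right=leaf_b)
root = Node(1, feature="A", threshold=10.0, left=split_b, right=leaf_c)

leaf, path = route_to_leaf(root, {"A": 4.0, "B": 0.9})
print(leaf.node_id, leaf.risk_category, path)
```

The list of components returned alongside the leaf is what later passages call the path for that leaf node.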
In this embodiment, the risk category may be a risk category in a preset risk category set. For example, when the homogeneous risk unit feature set generation method is applied to an anti-money-laundering scenario, the preset risk category set may include the two risk categories "money laundering risk present" and "no money laundering risk", or it may include the three risk categories "high money laundering risk", "medium money laundering risk" and "low money laundering risk".
As an example, the risk feature decision tree may be a decision tree drawn up in advance by a technician, based on a large number of historical risk unit features and their corresponding risk categories, for the specific scenario in which the homogeneous risk unit feature set generation method is applied.
In some optional implementations, the risk feature decision tree may instead be obtained in advance through the training procedure 200 shown in FIG. 2. FIG. 2 shows a flowchart of the training procedure provided by the second embodiment of this specification; the flow 200 of the training procedure may include steps 201 to 203.
Here, the execution subject of the training procedure may be the same as or different from the execution subject of the homogeneous risk unit feature set generation method. If they are the same, the execution subject of the training procedure may store the relevant information of the trained risk feature decision tree locally after training. If they are different, the execution subject of the training procedure may, after training, send the relevant information of the trained risk feature decision tree to the execution subject of the homogeneous risk unit feature set generation method.
Step 201: obtain a reference sample set.
Here, each reference sample in the obtained reference sample set may include sample risk unit information and a corresponding sample risk category.
In some optional implementations, the sample risk category corresponding to the sample risk unit information may be labeled manually.
In some optional implementations, the sample risk unit information and the corresponding sample risk category may also be obtained from an authoritative financial institution.
Step 202: perform feature extraction on the sample risk unit information in each reference sample of the reference sample set to obtain the corresponding sample feature.
It should be noted that the method used here to extract features from the sample risk unit information may be the same as the method used in step 101 to extract features from the risk unit information.
Step 203: for the reference samples in the reference sample set, train a decision tree using the sample feature corresponding to the sample risk unit information in the reference sample as input and the sample risk category in the reference sample as the expected output, to obtain the risk feature decision tree.
It should be noted that how to train a decision tree is existing technology that is widely studied and applied, and is not described further here.
The risk feature decision tree obtained through the training procedure 200 shown in FIG. 2 results from supervised training of a decision tree on the reference samples. Using a risk feature decision tree trained in this way to classify risk unit features can therefore improve the accuracy of risk classification.
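Steps 201 to 203 can be illustrated with a small supervised tree learner. The specification does not say which training algorithm is used; the sketch below assumes a CART-style learner with Gini impurity, and the function names, the two-component sample features and the labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        # Exclude the largest value so the right branch is never empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def train(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree; leaves store the majority risk category."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": train([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": train([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk the tree until a leaf is reached and return its risk category."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Hypothetical sample features: (transaction count, average amount).
rows = [(1, 100.0), (2, 120.0), (30, 9000.0), (28, 8500.0), (3, 90.0), (25, 9500.0)]
labels = ["low", "low", "high", "high", "low", "high"]
tree = train(rows, labels)
print(predict(tree, (27, 8800.0)), predict(tree, (2, 110.0)))
```

In practice an established decision tree library would normally replace this hand-written learner; the sketch only shows the input and expected-output roles described in step 203.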
Step 102: for each leaf node of the target risk category in the risk feature decision tree, construct a similarity network corresponding to that leaf node.
In this embodiment, after the risk unit features in the target risk unit feature set have been assigned to the different leaf nodes of the risk feature decision tree, the execution subject of the homogeneous risk unit feature set generation method may construct, for each leaf node of the target risk category in the risk feature decision tree, a similarity network corresponding to that leaf node. Each node of the constructed similarity network corresponds to one of the risk unit features assigned to the leaf node. The similarity between any two nodes in the constructed similarity network is positively correlated with the first similarity between the two risk unit features corresponding to those nodes, where the first similarity is the similarity between the path vectors that the two corresponding risk unit features have for that leaf node.
In this embodiment, the positive correlation between the similarity of any two nodes in the constructed similarity network and the first similarity between the two corresponding risk unit features may be realized in various ways.
For example, the similarity between any two nodes in the constructed similarity network may be linearly positively correlated with the first similarity between the two corresponding risk unit features. In the simplest case, the similarity between the two nodes may be the first similarity itself.
As another example, the similarity between any two nodes in the constructed similarity network may be nonlinearly positively correlated with the first similarity between the two corresponding risk unit features.
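The two options can be illustrated with a pair of edge-weight functions. Both are assumptions chosen only because they increase monotonically with the first similarity; the specification does not prescribe a particular mapping, and the names `linear_weight`, `nonlinear_weight`, `scale`, `offset` and `sharpness` are hypothetical.

```python
import math


def linear_weight(first_similarity, scale=1.0, offset=0.0):
    """Linear, positively correlated mapping; scale must be positive."""
    if scale <= 0:
        raise ValueError("scale must be positive to keep the correlation positive")
    return scale * first_similarity + offset


def nonlinear_weight(first_similarity, sharpness=4.0):
    """Nonlinear, strictly increasing mapping (exponential, as one possible choice)."""
    return math.exp(sharpness * first_similarity)


# With the default parameters the linear mapping is the first similarity itself.
print(linear_weight(0.8), nonlinear_weight(0.9) > nonlinear_weight(0.5))
```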
In this embodiment, the target risk category may belong to the risk category set formed by the risk categories corresponding to the leaf nodes of the risk feature decision tree. The term target risk category is used here only for illustration and does not refer to any one specified risk category.
In some optional implementations, the target risk category may be a single risk category. For example, when the risk category set formed by the risk categories corresponding to the leaf nodes of the risk feature decision tree includes the two risk categories "money laundering risk present" and "no money laundering risk", the target risk category may be "money laundering risk present". Step 102 may then be: for each leaf node in the risk feature decision tree whose risk category is "money laundering risk present", construct a similarity network corresponding to that leaf node.
In some optional implementations, the target risk category may also include more than one risk category. For example, when the risk category set formed by the risk categories corresponding to the leaf nodes of the risk feature decision tree includes the three risk categories "high money laundering risk", "medium money laundering risk" and "low money laundering risk", the target risk category may be the two risk categories "high money laundering risk" and "medium money laundering risk". Step 102 may then be: for each leaf node in the risk feature decision tree whose risk category is "high money laundering risk" or "medium money laundering risk", construct a similarity network corresponding to that leaf node.
To make the similarity network easier to understand, an example follows.
Suppose that each risk unit feature in the target risk unit feature set of step 101 includes seven feature components A, B, C, D, E, F and G. The structure of the risk feature decision tree is shown in FIG. 3, which is a schematic diagram of the risk feature decision tree provided by the third embodiment of this specification.
As shown in FIG. 3, the risk feature decision tree 300 has seven split points 301, 302, 303, 304, 305, 306 and 307, which test the values of the seven feature components A, B, C, D, E, F and G respectively. The risk feature decision tree also includes nine leaf nodes 308, 309, 310, 311, 312, 313, 314, 315 and 316, which correspond respectively to risk category 1, risk category 2, risk category 3, risk category 2, risk category 3, risk category 1, risk category 1, risk category 3 and risk category 2.
Suppose the target risk unit feature set contains 100 risk unit features, identified by the serial numbers 1 to 100. After step 101, each risk unit feature in the target risk unit feature set has been assigned to a leaf node of the risk feature decision tree. For clarity, Table 1 shows how the risk unit features with serial numbers 1 to 100 are distributed after step 101.
Risk unit feature serial numbers    Leaf node    Risk category    Number of risk unit features
1-15                                308          1                15
16-20                               309          2                5
21-29                               310          3                9
30-37                               311          2                8
38                                  312          3                1
39-63                               313          1                25
64-85                               314          1                22
86-93                               315          3                8
94-100                              316          2                7
Table 1
Suppose the target risk category is risk category 1. Then, in step 102, a similarity network corresponding to leaf node 308 may be constructed for leaf node 308, a similarity network corresponding to leaf node 313 may be constructed for leaf node 313, and a similarity network corresponding to leaf node 314 may be constructed for leaf node 314.
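Selecting the leaf nodes that belong to the target risk category can be sketched directly from Table 1. The tuple layout and the helper name `target_leaves` are assumptions made for this illustration; the leaf numbers, categories and counts are taken from the table.

```python
# (leaf node, risk category, number of risk unit features), as listed in Table 1.
table_1 = [
    (308, 1, 15), (309, 2, 5), (310, 3, 9),
    (311, 2, 8), (312, 3, 1), (313, 1, 25),
    (314, 1, 22), (315, 3, 8), (316, 2, 7),
]


def target_leaves(rows, target_categories):
    """Return the leaf nodes whose risk category is one of the target categories."""
    return [leaf for leaf, category, _ in rows if category in target_categories]


leaves = target_leaves(table_1, {1})
units_in_target = sum(n for _, category, n in table_1 if category == 1)
print(leaves, units_in_target)
```

One similarity network is then built per selected leaf node, so the 62 risk unit features of risk category 1 end up in three separate networks rather than one.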
As Table 1 shows, when the similarity network corresponding to leaf node 308 is constructed, the network first has 15 nodes, which correspond respectively to the risk unit features with serial numbers 1 to 15. Next, the first similarity is computed for every pair of risk unit features with serial numbers 1 to 15, that is, the similarity between their path vectors for leaf node 308, which consist of the components A, B and D. In other words, when computing the first similarity between any two of the risk unit features with serial numbers 1 to 15, only the three feature components A, B and D need to be considered.
Leaf node 308 corresponds to risk category 1, and leaf node 313 also corresponds to risk category 1. When the similarity network corresponding to leaf node 313 is constructed, the network first has 25 nodes, which correspond respectively to the risk unit features with serial numbers 39 to 63. Next, the first similarity is computed for every pair of risk unit features with serial numbers 39 to 63, that is, the similarity between their path vectors for leaf node 313, which consist of the components A, B and F. In other words, when computing the first similarity between any two of the risk unit features with serial numbers 39 to 63, only the three feature components A, B and F need to be considered.
As this example shows, even within the same risk category 1, the different leaf nodes 308 and 313 can carry different risk meanings, and the feature combination used to judge whether a risk unit feature belongs to the risk category can differ from one risk meaning to another. That is, different yardsticks are used to decide whether a feature belongs to a given risk category.
In this embodiment, the first similarity may be any value, known now or developed in the future, that characterizes the degree of similarity between vectors.
In some optional implementations, the first similarity may be the cosine similarity between the path vectors that the two corresponding risk unit features have for the leaf node. It should be noted that how to compute the cosine similarity between two vectors is well known in the art and is not described further here.
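A minimal sketch of building one leaf node's similarity network with cosine similarity is given below. It assumes that each risk unit feature is a mapping from component name to numeric value, that the path vector is the list of values for the components tested on the path to the leaf, and that the edge weight equals the first similarity; the function names and sample values are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similarity_network(units, path_components):
    """Return {(i, j): weight} for every pair i < j of units assigned to one leaf.

    Only the components on the leaf's path enter the comparison; here the edge
    weight is taken to be the first similarity itself.
    """
    vectors = {uid: [f[c] for c in path_components] for uid, f in units.items()}
    return {
        (i, j): cosine(vectors[i], vectors[j])
        for i, j in combinations(sorted(vectors), 2)
    }


# Three units assigned to a leaf whose path tests components A, B and D.
units = {
    1: {"A": 1.0, "B": 0.0, "D": 0.0, "G": 5.0},
    2: {"A": 2.0, "B": 0.0, "D": 0.0, "G": -3.0},
    3: {"A": 0.0, "B": 1.0, "D": 0.0, "G": 5.0},
}
network = build_similarity_network(units, ["A", "B", "D"])
print(network)
```

Component G differs strongly between units 1 and 2, but it is not on the path, so it does not affect their similarity.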
Step 103: for each constructed similarity network, generate the communities of that similarity network using a preset community discovery algorithm, and use the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
In this embodiment, after the similarity networks corresponding to the leaf nodes of the target risk category in the risk feature decision tree have been constructed in step 102, the execution subject of the homogeneous risk unit feature set generation method may first, for each constructed similarity network, generate the communities of that network using the preset community discovery algorithm. A community generated by a community discovery algorithm is the subgraph corresponding to a subset of nodes that are relatively tightly connected in the similarity network. That is, the risk unit features corresponding to the nodes in each subgraph are highly similar to one another. In other words, these risk unit features are broadly alike in type, quality, performance, value and so on, and therefore carry homogeneous risk, so the risk unit features corresponding to the nodes in each subgraph form one group of homogeneous risk unit features. After generating the communities, the execution subject may therefore use the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
For ease of understanding, an example follows, continuing the example above with reference to FIG. 3 and Table 1. Suppose the target risk category is risk category 1. Then, in step 103, the communities of the similarity network constructed for leaf node 308 may be generated for leaf node 308, the communities of the similarity network constructed for leaf node 313 may be generated for leaf node 313, and the communities of the similarity network constructed for leaf node 314 may be generated for leaf node 314.
FIG. 4 is a schematic diagram of the communities provided by the fourth embodiment of this specification. Taking leaf node 308 as an example, FIG. 4 shows the communities of the similarity network generated by step 103. As FIG. 4 shows, the similarity network has 15 nodes, which correspond respectively to the risk unit features with serial numbers 1 to 15, and step 103 generates three communities: community 401 includes the similarity network nodes corresponding to the risk unit features with serial numbers 1 to 5, community 402 includes those corresponding to serial numbers 6 to 10, and community 403 includes those corresponding to serial numbers 11 to 15. Step 103 also generates three homogeneous risk unit feature sets: one containing the risk unit features with serial numbers 1 to 5, another containing those with serial numbers 6 to 10, and a third containing those with serial numbers 11 to 15.
In this embodiment, the preset community discovery algorithm may be any community discovery algorithm that is known now or developed in the future; this embodiment places no specific limit on it.
In some optional implementations, the preset community discovery algorithm may be a label propagation algorithm.
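A deterministic variant of weighted label propagation can be sketched as follows. The update order, the tie-breaking rule (keep the current label, else take the smallest candidate) and the round limit are assumptions made so that the example is reproducible; common formulations randomize the order and the tie-breaking, and the specification does not fix these details.

```python
from collections import defaultdict


def label_propagation(weights, max_rounds=100):
    """weights: {(u, v): w} for an undirected graph; returns {node: label}."""
    neighbours = defaultdict(dict)
    for (u, v), w in weights.items():
        if w > 0:  # edges with non-positive weight are ignored
            neighbours[u][v] = w
            neighbours[v][u] = w
    nodes = sorted({n for edge in weights for n in edge})
    labels = {n: n for n in nodes}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for n in nodes:
            if not neighbours[n]:
                continue
            totals = defaultdict(float)
            for m, w in neighbours[n].items():
                totals[labels[m]] += w
            best = max(totals.values())
            candidates = sorted(l for l, t in totals.items() if t == best)
            new = labels[n] if labels[n] in candidates else candidates[0]
            if new != labels[n]:
                labels[n] = new
                changed = True
        if not changed:
            break
    return labels


def communities_from(labels):
    """Group node ids by label and return the groups as sorted lists."""
    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return sorted(sorted(g) for g in groups.values())


# Two tightly connected triangles joined by one weak edge.
weights = {
    (1, 2): 0.9, (1, 3): 0.9, (2, 3): 0.9,
    (4, 5): 0.9, (4, 6): 0.9, (5, 6): 0.9,
    (3, 4): 0.1,
}
communities = communities_from(label_propagation(weights))
print(communities)
```

Each returned group corresponds to one community, and the risk unit features behind its nodes form one homogeneous risk unit feature set.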
The method provided by the above embodiments of this specification processes risk unit features in batches. It first uses the pre-trained risk feature decision tree to classify the risk unit features by risk and to screen the features, then constructs a similarity network from the risk unit features assigned to each leaf node of the target risk category in the risk feature decision tree, and finally uses a community discovery algorithm to generate communities, thereby generating sets of risk unit features that carry homogeneous risk within target risk categories of different meanings.
The fifth embodiment of this specification provides another homogeneous risk unit feature set generation method.
With further reference to FIG. 5, a flow 500 of another embodiment of the homogeneous risk unit feature set generation method is shown. The flow 500 of this homogeneous risk unit feature set generation method includes steps 501 to 504.
Step 501: for a risk unit feature in the target risk unit feature set, input the risk unit feature into the pre-trained risk feature decision tree and determine the leaf node to which the risk unit feature is assigned.
Step 502: for each leaf node of the target risk category in the risk feature decision tree, construct a similarity network corresponding to that leaf node.
Step 503: for each constructed similarity network, generate the communities of that similarity network using a preset community discovery algorithm, and use the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
In this embodiment, the specific operations of steps 501, 502 and 503 are substantially the same as those of steps 101, 102 and 103 in the embodiment shown in FIG. 1, and are not described again here.
Step 504: for each leaf node of the target risk category in the risk feature decision tree, output at least one of the following items of information: the number of nodes in the similarity network corresponding to that leaf node, and the homogeneous risk unit feature set corresponding to each community of the similarity network corresponding to that leaf node.
After steps 501 to 503, the execution subject of the homogeneous risk unit feature set generation method has assigned the risk unit features in the risk unit feature set to different risk categories, and the risk unit features assigned to the target risk category have further been assigned to different leaf nodes of the risk feature decision tree, which gives them different meanings. In addition, the risk unit features assigned to the same leaf node of the target risk category have been divided into different communities, forming homogeneous risk unit feature sets.
Then, in step 504, the execution subject may output at least one of the following items of information: the number of nodes in the similarity network corresponding to a leaf node of the target risk category in the risk feature decision tree, and the homogeneous risk unit feature set corresponding to each community of the similarity network corresponding to that leaf node.
In other words, if step 504 outputs the number of nodes in the similarity network corresponding to a leaf node of the target risk category in the risk feature decision tree, then for different batches of risk unit feature sets to be processed, the risk scale of the risk unit features that share the same specific risk meaning (that is, that are assigned to the same leaf node) can be determined, compared over time, and used to identify the trend of the risk. If step 504 outputs the homogeneous risk unit feature sets, it becomes clear which risk unit features in the target risk unit feature set carry homogeneous risk, and also how many risk unit features each homogeneous risk unit feature set contains, so that the specific risk unit features carrying homogeneous risk can be monitored as a group.
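The information output in step 504 and the batch-to-batch comparison can be sketched as follows. The summary structure, the function name `summarise_leaf` and the two sample batches are hypothetical and serve only to illustrate the comparison of node counts over time.

```python
def summarise_leaf(leaf_id, community_of):
    """community_of: {unit_id: community_label} for one leaf's similarity network."""
    groups = {}
    for unit_id, label in community_of.items():
        groups.setdefault(label, []).append(unit_id)
    return {
        "leaf": leaf_id,
        "node_count": len(community_of),  # risk scale for this risk meaning
        "homogeneous_sets": sorted(sorted(g) for g in groups.values()),
    }


# Two hypothetical batches processed at different times for the same leaf node.
batch_1 = summarise_leaf(308, {1: "a", 2: "a", 3: "b"})
batch_2 = summarise_leaf(308, {7: "a", 8: "a", 9: "a", 10: "b", 11: "b"})

# A positive difference suggests that the risk scale for this leaf is growing.
growth = batch_2["node_count"] - batch_1["node_count"]
print(batch_1["homogeneous_sets"], growth)
```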
Here, the information in step 504 may be output in various ways. For example, the execution subject may store the information to be output locally, or it may present the information on its display terminal in various forms (for example, text, charts or speech). As another example, the execution subject may send the information to be output to another electronic device connected to it over a network, and that electronic device may store the information or present it on its own display terminal in various forms (for example, text, charts or speech).
As FIG. 5 shows, compared with the embodiment corresponding to FIG. 1, the flow 500 of the homogeneous risk unit feature set generation method in this embodiment adds an information output step. The solution described in this embodiment can therefore output information and thereby support more comprehensive risk monitoring.
Based on the same idea, as shown in FIG. 6, the sixth embodiment of this specification provides a homogeneous risk unit feature set generation apparatus, including: a risk unit feature division module 601, configured to, for a risk unit feature in the target risk unit feature set, input the risk unit feature into a pre-trained risk feature decision tree and determine the leaf node to which the risk unit feature is assigned, where the leaf nodes of the risk feature decision tree correspond to risk categories; a similarity network construction module 602, configured to, for each leaf node of the target risk category in the risk feature decision tree, construct a similarity network corresponding to that leaf node, where each node of the constructed similarity network corresponds to one of the risk unit features assigned to that leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with the first similarity between the two risk unit features corresponding to those nodes, and the first similarity is the similarity between the path vectors that the two corresponding risk unit features have for that leaf node; and a homogeneous risk generation module 603, configured to, for each constructed similarity network, generate the communities of that similarity network using a preset community discovery algorithm, and use the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
在本实施例中,同质风险单位特征集合生成装置600的风险单位特征划分模块601、相似度网络构建模块602和同质风险生成模块603的具体处理及其所带来的技术效果可分别参考图1对应实施例中步骤101、步骤102和步骤103的相关说明,在此不再赘述。In this embodiment, the specific processing of the risk unit feature division module 601, the similarity network construction module 602, and the homogeneous risk generation module 603 of the homogeneous risk unit feature set generation device 600 and the technical effects brought by them can be referred to respectively. Figure 1 corresponds to the relevant descriptions of step 101, step 102, and step 103 in the embodiment, and will not be repeated here.
可选的,所述风险特征决策树可以是通过如下训练步骤训练得到的:获取参考样本集合,其中,参考样本包括样本风险单位信息和对应的样本风险类别;对所述参考样本信息集合中的每个参考样本中样本风险单位信息进行特征提取,得到对应的样本特征;对于所述参考样本集合中的参考样本,以该参考样本中的样本风险信息对应的样本特征作为输入,以该参考样本中的样本风险类别作为期望输出,训练决策树,得到所述风险 特征决策树。Optionally, the risk feature decision tree may be obtained by training through the following training steps: obtaining a reference sample set, where the reference sample includes sample risk unit information and a corresponding sample risk category; The sample risk unit information in each reference sample is feature extracted to obtain the corresponding sample feature; for the reference sample in the reference sample set, the sample feature corresponding to the sample risk information in the reference sample is used as input, and the reference sample The sample risk category in is used as the expected output, and the decision tree is trained to obtain the risk feature decision tree.
可选的,所述装置还可以包括:输出模块604,用于对于所述风险特征决策树中目标风险类别的叶子节点,输出至少一项以下信息:与该叶子节点对应的相似度网络的节点数,与该叶子节点对应的相似度网络的每个社区对应的同质风险单位特征集合。Optionally, the device may further include: an output module 604, configured to output at least one item of the following information for the leaf node of the target risk category in the risk feature decision tree: the node of the similarity network corresponding to the leaf node Number, the homogenous risk unit feature set corresponding to each community of the similarity network corresponding to the leaf node.
可选的,所述预设社区发现算法可以为标签传播算法和/或所述第一相似度可以为该两节点各自对应的风险特征单位在该叶子节点对应的路径向量间的余弦相似度。Optionally, the preset community discovery algorithm may be a label propagation algorithm and/or the first similarity may be the cosine similarity between the corresponding risk feature units of the two nodes and the path vectors corresponding to the leaf nodes.
It should be noted that, for the implementation details and technical effects of each module in the homogeneous risk unit feature set generation apparatus provided in the embodiments of this specification, reference may be made to the descriptions of the other embodiments in this specification; they are not repeated here.
Based on the same idea, the seventh embodiment of this specification provides a homogeneous risk unit feature set generation device, including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to: for each risk unit feature in a target risk unit feature set, input the risk unit feature into a pre-trained risk feature decision tree and determine the leaf node to which the risk unit feature is assigned, where the leaf nodes of the risk feature decision tree correspond to risk categories; for each leaf node of a target risk category in the risk feature decision tree, construct a similarity network corresponding to that leaf node, where the nodes of the constructed similarity network correspond respectively to the risk unit features assigned to that leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to those two nodes, and the first similarity is the similarity between the path vectors, at that leaf node, of the risk unit features corresponding to the two nodes; and, for each constructed similarity network, generate the communities of the similarity network using a preset community discovery algorithm, and use the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
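The first of these operations, routing each risk unit feature to a leaf node and collecting the features that land on leaves of the target risk category, can be sketched as follows. The hand-written tree, the leaf identifiers, and the feature values are hypothetical. The sketch also assumes that a path vector consists of the values of the features tested along the root-to-leaf path; this is one plausible reading and is not a definition taken from this passage.

```python
from collections import defaultdict

# Hypothetical pre-trained risk feature decision tree; leaves carry a risk category.
TREE = {
    "feature": 0, "threshold": 500,
    "left": {"leaf_id": "leaf_low", "risk_category": "low"},
    "right": {
        "feature": 1, "threshold": 2,
        "left": {"leaf_id": "leaf_mid", "risk_category": "medium"},
        "right": {"leaf_id": "leaf_high", "risk_category": "high"},
    },
}

def route(tree, x):
    """Return (leaf, path_vector) for feature vector x.

    Assumption: the path vector holds the values of the features
    tested on the way from the root to the leaf.
    """
    node, path = tree, []
    while "leaf_id" not in node:
        value = x[node["feature"]]
        path.append(value)
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node, path

def group_by_target_leaf(tree, features, target_category):
    """Map each leaf of the target risk category to its (unit id, path vector) pairs."""
    groups = defaultdict(list)
    for unit_id, x in features.items():
        leaf, path = route(tree, x)
        if leaf["risk_category"] == target_category:
            groups[leaf["leaf_id"]].append((unit_id, path))
    return dict(groups)

# Hypothetical target risk unit feature set.
features = {"u1": [900, 3], "u2": [950, 4], "u3": [100, 0], "u4": [800, 1]}
groups = group_by_target_leaf(TREE, features, "high")
```

Each entry in `groups` supplies the nodes of one similarity network; the path vectors are what the first similarity is computed over.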
Based on the same idea, the eighth embodiment of this specification provides a computer-readable storage medium storing computer-executable instructions, characterized in that the computer-executable instructions, when executed by a processor, implement the following steps: for each risk unit feature in a target risk unit feature set, inputting the risk unit feature into a pre-trained risk feature decision tree and determining the leaf node to which the risk unit feature is assigned, where the leaf nodes of the risk feature decision tree correspond to risk categories; for each leaf node of a target risk category in the risk feature decision tree, constructing a similarity network corresponding to that leaf node, where the nodes of the constructed similarity network correspond respectively to the risk unit features assigned to that leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to those two nodes, and the first similarity is the similarity between the path vectors, at that leaf node, of the risk unit features corresponding to the two nodes; and, for each constructed similarity network, generating the communities of the similarity network using a preset community discovery algorithm, and using the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The apparatus, device, and non-volatile computer-readable storage medium provided in the embodiments of this specification correspond to the method, and therefore also have beneficial technical effects similar to those of the corresponding method. Because the beneficial technical effects of the method have been described in detail above, those of the corresponding apparatus, device, and non-volatile computer storage medium are not repeated here.
In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with hardware entity modules. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program the device themselves to "integrate" a digital system onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is now mostly done with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can be readily obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by that (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the means for implementing various functions can be regarded as both software modules that implement the method and structures within the hardware component.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as divided by function into various units. Of course, when implementing the embodiments of this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, product, or device that includes the element.
This specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar among the embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are basically similar to the method embodiments, they are described relatively briefly; for the relevant parts, refer to the corresponding description of the method embodiments.
The above are merely embodiments of this specification and are not intended to limit this application. Various modifications and changes to this application are possible for those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the scope of the claims of this application.

Claims (10)

  1. A method for generating a homogeneous risk unit feature set, comprising:
    for each risk unit feature in a target risk unit feature set, inputting the risk unit feature into a pre-trained risk feature decision tree and determining the leaf node to which the risk unit feature is assigned, wherein the leaf nodes of the risk feature decision tree correspond to risk categories;
    for each leaf node of a target risk category in the risk feature decision tree, constructing a similarity network corresponding to that leaf node, wherein the nodes of the constructed similarity network correspond respectively to the risk unit features assigned to that leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the risk unit features corresponding to those two nodes, and the first similarity is the similarity between the path vectors, at that leaf node, of the risk unit features corresponding to the two nodes;
    for each constructed similarity network, generating the communities of the similarity network using a preset community discovery algorithm, and using the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
  2. The method according to claim 1, wherein the risk feature decision tree is obtained through the following training steps:
    obtaining a reference sample set, wherein each reference sample includes sample risk unit information and a corresponding sample risk category;
    performing feature extraction on the sample risk unit information in each reference sample in the reference sample set to obtain the corresponding sample features;
    for each reference sample in the reference sample set, training a decision tree with the sample features corresponding to the sample risk unit information in that reference sample as input and the sample risk category in that reference sample as the expected output, to obtain the risk feature decision tree.
  3. The method according to claim 2, further comprising:
    for each leaf node of the target risk category in the risk feature decision tree, outputting at least one of the following items of information: the number of nodes in the similarity network corresponding to that leaf node, and the homogeneous risk unit feature set corresponding to each community of the similarity network corresponding to that leaf node.
  4. The method according to any one of claims 1-3, wherein the preset community discovery algorithm is a label propagation algorithm and/or the first similarity is the cosine similarity between the path vectors, at that leaf node, of the risk unit features corresponding to the two nodes.
  5. An apparatus for generating a homogeneous risk unit feature set, comprising:
    a risk unit feature division module, configured to, for each risk unit feature in a target risk unit feature set, input the risk unit feature into a pre-trained risk feature decision tree and determine the leaf node to which the risk unit feature is assigned, wherein the leaf nodes of the risk feature decision tree correspond to risk categories;
    a similarity network construction module, configured to, for each leaf node of a target risk category in the risk feature decision tree, construct a similarity network corresponding to that leaf node, wherein the nodes of the constructed similarity network correspond respectively to the risk unit features assigned to that leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to those two nodes, and the first similarity is the similarity between the path vectors, at that leaf node, of the risk unit features corresponding to the two nodes;
    a homogeneous risk generation module, configured to, for each constructed similarity network, generate the communities of the similarity network using a preset community discovery algorithm, and use the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
  6. The apparatus according to claim 5, wherein the risk feature decision tree is obtained through the following training steps:
    obtaining a reference sample set, wherein each reference sample includes sample risk unit information and a corresponding sample risk category;
    performing feature extraction on the sample risk unit information in each reference sample in the reference sample set to obtain the corresponding sample features;
    for each reference sample in the reference sample set, training a decision tree with the sample features corresponding to the sample risk unit information in that reference sample as input and the sample risk category in that reference sample as the expected output, to obtain the risk feature decision tree.
  7. The apparatus according to claim 6, further comprising:
    an output module, configured to output, for each leaf node of the target risk category in the risk feature decision tree, at least one of the following items of information: the number of nodes in the similarity network corresponding to that leaf node, and the homogeneous risk unit feature set corresponding to each community of the similarity network corresponding to that leaf node.
  8. The apparatus according to any one of claims 6-7, wherein the preset community discovery algorithm is a label propagation algorithm and/or the first similarity is the cosine similarity between the path vectors, at that leaf node, of the risk unit features corresponding to the two nodes.
  9. A device for generating a homogeneous risk unit feature set, comprising:
    at least one processor;
    and
    a memory communicatively connected to the at least one processor;
    wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
    for each risk unit feature in a target risk unit feature set, input the risk unit feature into a pre-trained risk feature decision tree and determine the leaf node to which the risk unit feature is assigned, wherein the leaf nodes of the risk feature decision tree correspond to risk categories;
    for each leaf node of a target risk category in the risk feature decision tree, construct a similarity network corresponding to that leaf node, wherein the nodes of the constructed similarity network correspond respectively to the risk unit features assigned to that leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to those two nodes, and the first similarity is the similarity between the path vectors, at that leaf node, of the risk unit features corresponding to the two nodes;
    for each constructed similarity network, generate the communities of the similarity network using a preset community discovery algorithm, and use the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
  10. A computer-readable storage medium storing computer-executable instructions, characterized in that the computer-executable instructions, when executed by a processor, implement the following steps:
    for each risk unit feature in a target risk unit feature set, inputting the risk unit feature into a pre-trained risk feature decision tree and determining the leaf node to which the risk unit feature is assigned, wherein the leaf nodes of the risk feature decision tree correspond to risk categories;
    for each leaf node of a target risk category in the risk feature decision tree, constructing a similarity network corresponding to that leaf node, wherein the nodes of the constructed similarity network correspond respectively to the risk unit features assigned to that leaf node, the similarity between any two nodes in the constructed similarity network is positively correlated with a first similarity between the two risk unit features corresponding to those two nodes, and the first similarity is the similarity between the path vectors, at that leaf node, of the risk unit features corresponding to the two nodes;
    for each constructed similarity network, generating the communities of the similarity network using a preset community discovery algorithm, and using the risk unit features corresponding to the nodes in each generated community to generate the homogeneous risk unit feature set corresponding to that community.
PCT/CN2020/123373 2019-12-19 2020-10-23 Homogeneous risk unit feature set generation method, apparatus and device, and medium WO2021120845A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911314914.4A CN111126476A (en) 2019-12-19 2019-12-19 Homogeneous risk unit feature set generation method, device, equipment and medium
CN201911314914.4 2019-12-19

Publications (1)

Publication Number Publication Date
WO2021120845A1 true WO2021120845A1 (en) 2021-06-24

Family

ID=70500085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123373 WO2021120845A1 (en) 2019-12-19 2020-10-23 Homogeneous risk unit feature set generation method, apparatus and device, and medium

Country Status (2)

Country Link
CN (1) CN111126476A (en)
WO (1) WO2021120845A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126476A (en) * 2019-12-19 2020-05-08 支付宝(杭州)信息技术有限公司 Homogeneous risk unit feature set generation method, device, equipment and medium
CN112348659B (en) * 2020-10-21 2024-03-19 上海淇玥信息技术有限公司 User identification policy distribution method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150294249A1 (en) * 2014-04-11 2015-10-15 International Business Machines Corporation Risk prediction for service contracts vased on co-occurence clusters
CN109544166A (en) * 2018-11-05 2019-03-29 阿里巴巴集团控股有限公司 A kind of Risk Identification Method and device
CN110232525A (en) * 2019-06-14 2019-09-13 腾讯科技(深圳)有限公司 A kind of business risk monitoring method, device, server and storage medium
CN110503565A (en) * 2019-07-05 2019-11-26 中国平安人寿保险股份有限公司 Behaviorist risk recognition methods, system, equipment and readable storage medium storing program for executing
CN110516910A (en) * 2019-07-23 2019-11-29 平安科技(深圳)有限公司 Declaration form core based on big data protects model training method and core protects methods of risk assessment
CN111126476A (en) * 2019-12-19 2020-05-08 支付宝(杭州)信息技术有限公司 Homogeneous risk unit feature set generation method, device, equipment and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109922032B (en) * 2017-12-13 2022-04-19 百度在线网络技术(北京)有限公司 Method, device, equipment and storage medium for determining risk of logging in account
CN109828995B (en) * 2018-12-14 2020-12-11 中国科学院计算技术研究所 Visual feature-based graph data detection method and system
CN110570111A (en) * 2019-08-30 2019-12-13 阿里巴巴集团控股有限公司 Enterprise risk prediction method, model training method, device and equipment

Also Published As

Publication number Publication date
CN111126476A (en) 2020-05-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20902786

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20902786

Country of ref document: EP

Kind code of ref document: A1