WO2021174693A1 - Data analysis method and apparatus, computer system, and readable storage medium

Info

Publication number
WO2021174693A1
Authority
WO
WIPO (PCT)
Prior art keywords
risk
node
path
model
rate
Prior art date
Application number
PCT/CN2020/093201
Other languages
English (en)
Chinese (zh)
Inventor
段洪云
汪伟
彭琛
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021174693A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Definitions

  • This application relates to the field of computer technology, which relates to data mining technology for big data, and in particular to a data analysis method, device, computer system, and readable storage medium.
  • Current knowledge graphs describe only node data and the relationships between nodes. Such information provides only basic facts about each node and how nodes are related; without deeper mining of the graph, it yields only simple information and cannot give users valuable information that can be used directly.
  • The purpose of this application is to provide a data analysis method, device, computer system, and readable storage medium that solve the prior-art problem that the information in the knowledge graph is not mined in depth, so that only simple information can be provided.
  • this application provides a data analysis method based on big data, including:
  • the creation server creates a directed graph used to describe the association relationship and asset relationship between nodes, and calculates the risk transmission coefficient of each path in the directed graph through the infectious disease model to obtain a scale-free model, and sends it to the risk server;
  • the node refers to the information owner
  • the association relationship is used to reflect the involvement and influence between the information owners
  • the asset relationship is used to reflect the asset association ratio between the information owners;
  • the risk server identifies the infected nodes in the scale-free model, calculates the risk transmission rate of each path in the scale-free model according to the infected nodes and the risk transmission coefficients to obtain a risk infection model, and sends the risk infection model to the calculation server;
  • the calculation server extracts the node named in a node request sent by the user terminal from the risk infection model and uses it as the target node, and calculates the risk transmission rates in the incoming and outgoing directions of the target node to obtain an incoming risk rate and an outgoing risk rate; wherein the node request includes the node name of the corresponding node in the risk infection model, which is used to extract that node.
  • this application also provides a data analysis device based on big data, including:
  • a creation server, used to create a directed graph describing the association relationships and asset relationships between nodes, calculate the risk transmission coefficient of each path in the directed graph through an infectious disease model to obtain a scale-free model, and send the scale-free model to the risk server; wherein a node refers to an information owner, the association relationship reflects the involvement and influence between information owners, and the asset relationship reflects the asset association ratio between information owners;
  • a risk server, used to identify the infected nodes in the scale-free model, calculate the risk transmission rate of each path in the scale-free model according to the infected nodes and the risk transmission coefficients to obtain a risk infection model, and send the risk infection model to the calculation server;
  • a calculation server, used to extract the node named in a node request sent by the user terminal from the risk infection model and use it as the target node, and to calculate the risk transmission rates in the incoming and outgoing directions of the target node to obtain an incoming risk rate and an outgoing risk rate; wherein the node request includes the node name of the corresponding node in the risk infection model, which is used to extract that node.
  • The present application also provides a computer system, which includes a plurality of computer devices, each of which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processors of the plurality of computer devices execute the computer programs, the steps of the above data analysis method are jointly implemented.
  • The present application also provides a computer-readable storage medium, which includes multiple storage media, each of which stores a computer program; when the computer programs stored in the multiple storage media are executed by a processor, the steps of the above data analysis method are jointly implemented.
  • The data analysis method, device, computer system, and readable storage medium provided by this application create a directed graph describing the association relationships and asset relationships between nodes, and calculate the risk transmission coefficient of each path in the directed graph through an infectious disease model to obtain a scale-free model that describes the degree of association between the nodes.
  • By identifying the infected nodes in the scale-free model, the risk transmission rate of each path in the scale-free model is calculated according to the infected nodes and the risk transmission coefficients to obtain the risk infection model. The risk transmission rate expresses how risk is transmitted from an infected node to other nodes; because it is derived from the average transmission probability and the risk transmission coefficient, it reflects the probability of risk transmission between two associated nodes as faithfully as possible.
  • The node in the risk infection model is extracted and used as the target node, and the risk transmission rates in the incoming and outgoing directions of the target node are calculated to obtain the incoming risk rate and the outgoing risk rate. Knowing both rates allows the risk characteristics of the target node to be evaluated comprehensively and its risk environment to be conveyed to users, which helps users make judgments.
  • The incoming risk rate and the outgoing risk rate are evaluated with the four-quadrant model to obtain a judgment result, which is output to the user terminal. The four-quadrant model evaluates the risk characteristics of the target node and outputs the name or icon of those characteristics to the user terminal, so that the user can quickly learn the risk environment, namely the impact of surrounding nodes on the target node and the impact of the target node on surrounding nodes. This realizes in-depth mining of the information in the knowledge graph and thereby solves the technical problem that, for lack of such mining, only simple information can be provided and users cannot obtain valuable information that can be used directly.
  • FIG. 1 is a flowchart of Embodiment 1 of the data analysis method of this application;
  • FIG. 3 is a tree diagram of the data association mode of Euler Atlas in Embodiment 1 of the data analysis method of this application;
  • FIG. 4 is a directed graph with primary conduction coefficients in Embodiment 1 of the data analysis method of this application;
  • FIG. 5 is a directed graph with a primary conduction coefficient in which the risk conduction coefficient is loaded on each path in the first embodiment of the data analysis method of this application;
  • Fig. 6 is a directed graph of the risk infection model in the first embodiment of the data analysis method of this application.
  • FIG. 7 is a schematic diagram of the four-quadrant model in Embodiment 1 of the data analysis method of this application;
  • FIG. 8 is a schematic diagram of program modules of Embodiment 2 of the data analysis device of this application.
  • FIG. 9 is a schematic diagram of the hardware structure of the computer equipment in the third embodiment of the computer system of this application.
  • The data analysis method, device, computer system, and readable storage medium provided in this application are suitable for the computer field and provide a data analysis method based on a big data creation module, an infection model creation module, a risk calculation module, and a risk judgment module.
  • This application creates a directed graph describing the association relationships and asset relationships between nodes, and calculates the risk transmission coefficient of each path in the directed graph through an infectious disease model to obtain a scale-free model; identifies the infected nodes in the scale-free model and calculates the risk transmission rate of each path according to the infected nodes and the risk transmission coefficients to obtain the risk infection model; extracts the node named in the node request sent by the user terminal from the risk infection model and uses it as the target node, then calculates the risk transmission rates in the incoming and outgoing directions of the target node to obtain the incoming risk rate and the outgoing risk rate; and evaluates the incoming risk rate and the outgoing risk rate through the four-quadrant model to obtain the judgment result, which is output to the user terminal.
  • a data analysis method based on big data in this embodiment includes:
  • S1 The creation server creates a directed graph describing the association relationships and asset relationships between nodes, calculates the risk transmission coefficient of each path in the directed graph through the infectious disease model to obtain a scale-free model, and sends the scale-free model to the risk server; wherein a node refers to an information owner, the association relationship reflects the involvement and influence between information owners, and the asset relationship reflects the asset association ratio between information owners;
  • S2 The risk server identifies the infected nodes in the scale-free model, calculates the risk transmission rate of each path in the scale-free model according to the infected nodes and the risk transmission coefficients to obtain the risk infection model, and sends the risk infection model to the calculation server;
  • S3 The calculation server extracts the node named in the node request sent by the user terminal from the risk infection model and uses it as the target node, and calculates the risk transmission rates in the incoming and outgoing directions of the target node to obtain the incoming risk rate and the outgoing risk rate; wherein the node request includes the node name of the corresponding node in the risk infection model, which is used to extract that node.
  • After the risk transmission rates in the incoming and outgoing directions of the target node have been calculated to obtain the incoming risk rate and the outgoing risk rate, the method further includes:
  • S4 Enter the incoming risk rate and the outgoing risk rate into a preset four-quadrant model to obtain a risk point, identify the area in which the risk point lies, use the name of that area as the judgment result, and output the judgment result to the user terminal.
  • A directed graph is created to express the association relationships and asset relationships between nodes, where a node can be an enterprise or a natural person; the name of the enterprise or natural person can therefore be used as the node name.
  • The average transmission probability of the directed graph in equilibrium is calculated (equilibrium meaning that the number of nodes affected by infected nodes equals the number of nodes freed from their impact), and the risk transmission probability of each path in the graph is loaded on that path to obtain a scale-free model. The scale-free model is thus a data model built on the directed graph that reflects the association relationships and asset relationships between nodes, in which the risk transmission probability of each path has been calculated with the infectious disease model and loaded on the path.
  • A directed graph is a mathematical way of representing relationships between objects. It consists of dots (called vertices or nodes) and straight lines or curves connecting them (called edges). In this embodiment, the dots correspond to the nodes of this application; the arrowed lines or curves connecting the dots correspond to the association relationships, and the information loaded on those lines or curves corresponds to the asset relationships.
  • The infectious disease model is a standard measurement model used to calculate the influence of nodes. Nodes in the infectious disease model have three states: susceptible, infected, and recovered.
  • A susceptible node can be infected by an infected node; an infected node has been infected and can infect susceptible nodes; a recovered node has recovered from infection, can no longer infect others, and will not be infected again.
  • The infectious disease model parameters include the recovery rate, the infection probability, the number of repeated simulations T, and the simulation time timespace.
  • The simulation process is as follows: select a node i in the network as the infected node; it spreads the virus to each susceptible neighbor connected to i with the infection probability, and newly infected nodes continue to infect their own susceptible neighbors with the same probability.
  • At each stage, each infected node transforms into a recovered node with a probability equal to the recovery rate.
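  • The simulation process described above can be sketched as follows. This is a minimal illustration rather than the application's implementation: the function names, the adjacency-dictionary representation, and the use of the average number of ever-infected nodes as the influence measure are all assumptions; `repeats` and `steps` stand in for the parameters T and timespace.

```python
import random

def simulate_sir(neighbors, seed_node, infection_prob, recovery_prob, steps, rng):
    """Run one susceptible/infected/recovered simulation.

    neighbors maps every node to the list of nodes it points to (assumed layout).
    Returns the set of nodes that were infected at any time.
    """
    state = {node: "S" for node in neighbors}
    state[seed_node] = "I"
    ever_infected = {seed_node}
    for _ in range(steps):
        infected_now = [n for n, s in state.items() if s == "I"]
        if not infected_now:
            break
        # Each infected node tries to infect its susceptible neighbors.
        newly_infected = set()
        for node in infected_now:
            for nb in neighbors[node]:
                if state[nb] == "S" and rng.random() < infection_prob:
                    newly_infected.add(nb)
        # Each infected node recovers with the recovery probability.
        for node in infected_now:
            if rng.random() < recovery_prob:
                state[node] = "R"
        for node in newly_infected:
            state[node] = "I"
        ever_infected |= newly_infected
    return ever_infected

def influence(neighbors, seed_node, infection_prob, recovery_prob, steps, repeats, seed=0):
    """Average number of nodes ever infected over `repeats` simulations."""
    rng = random.Random(seed)
    total = 0
    for _ in range(repeats):
        total += len(simulate_sir(neighbors, seed_node, infection_prob,
                                  recovery_prob, steps, rng))
    return total / repeats

# Hypothetical three-node chain A -> B -> C.
chain = {"A": ["B"], "B": ["C"], "C": []}
always_spreads = influence(chain, "A", 1.0, 0.0, steps=5, repeats=3)
never_spreads = influence(chain, "A", 0.0, 0.0, steps=5, repeats=3)
```

  • With an infection probability of 1 the whole chain is reached, and with 0 only the seed node is counted; intermediate probabilities give a value between the two that depends on the random draws.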
  • A blacklist system can be used to identify the infected nodes in the scale-free model: if a node belongs to the blacklist, it is determined to be an infected node. The risk of an infected node is set to 100%. Nodes connected to the infected node by a path are regarded as directly connected nodes, and the risk transmission rate from the infected node to each directly connected node is calculated from the risk transmission coefficient on that path. Nodes connected to a directly connected node are then identified and set as indirectly connected nodes, and the risk transmission rate from the infected node through the directly connected node to each indirectly connected node is calculated from the risk transmission coefficients on the connecting paths. This continues by analogy until all nodes directly or indirectly connected to the infected node in the scale-free model have been calculated, which yields the risk infection model.
  • The node request includes a node name corresponding to a node in the risk infection model; the node with that name in the risk infection model is used as the target node. The paths in the risk infection model are then identified: a path pointing to the target node is taken as an incoming path, and a path pointing away from the target node is taken as an outgoing path. The incoming risk rate is obtained from the risk transmission rates of the incoming paths, and the outgoing risk rate is obtained from the risk transmission rates of the outgoing paths.
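  • The split into incoming and outgoing paths can be sketched as below. The text here does not state how the rates of several paths are combined into a single incoming or outgoing risk rate, so taking the maximum is purely an assumption made for illustration; the function names and the sample rates are likewise hypothetical.

```python
def split_paths(risk_rates, target):
    """Separate the paths touching `target` into incoming and outgoing ones.

    risk_rates maps (source, destination) pairs to risk transmission rates.
    """
    incoming = {edge: rate for edge, rate in risk_rates.items() if edge[1] == target}
    outgoing = {edge: rate for edge, rate in risk_rates.items() if edge[0] == target}
    return incoming, outgoing

def aggregate(rates):
    """Combine path rates into one risk rate (assumption: take the maximum)."""
    return max(rates.values(), default=0.0)

# Illustrative rates on paths B->D, D->F and D->G.
rates = {("B", "D"): 0.2, ("D", "F"): 0.08, ("D", "G"): 0.1}
incoming, outgoing = split_paths(rates, "D")
incoming_rate = aggregate(incoming)
outgoing_rate = aggregate(outgoing)
```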
  • The user can define areas in the coordinate system of the four-quadrant model and assign a name to each area. The incoming risk rate and the outgoing risk rate determine a risk point in the four-quadrant model; the name of the area in which the risk point lies is used as the judgment result, which is output to the user terminal.
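  • A four-quadrant lookup of this kind might look like the sketch below. The single threshold of 0.5 on both axes and the area names are hypothetical placeholders, since the application leaves the areas and their names to the user.

```python
# Hypothetical area names keyed by (incoming is high, outgoing is high).
DEFAULT_AREA_NAMES = {
    (True, True): "high incoming, high outgoing",
    (True, False): "high incoming, low outgoing",
    (False, True): "low incoming, high outgoing",
    (False, False): "low incoming, low outgoing",
}

def four_quadrant_judgment(incoming_rate, outgoing_rate, threshold=0.5, area_names=None):
    """Return the name of the area containing the point (incoming, outgoing)."""
    names = area_names or DEFAULT_AREA_NAMES
    return names[(incoming_rate >= threshold, outgoing_rate >= threshold)]

judgment = four_quadrant_judgment(0.7, 0.1)
```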
  • the creation of the directed graph used to describe the association relationship and asset relationship between nodes includes:
  • S101 Obtain node data and the association relationship between the nodes from the service system, and construct a directed graph for describing the association relationship between the nodes according to the association relationship.
  • In this step, the knowledge graph is obtained from the service system, and the node data and the association relationships between nodes are obtained from the knowledge graph. If an association relationship exists between two pieces of node data, a path is drawn between the corresponding nodes. For example, if node A invests in node B, the path between them runs from A to B. Because paths in the knowledge graph carry arrows, the direction of the relationship between node A and node B can be read directly from the knowledge graph, so it is not described further in this application.
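  • As a rough sketch of step S101, a directed graph can be held as a nested dictionary built from relation records. The record layout `(source, target, relation)` and the function name are assumptions for illustration; they are not taken from the service system.

```python
def build_directed_graph(relations):
    """Build an adjacency mapping: graph[source][target] = path attributes."""
    graph = {}
    for source, target, relation in relations:
        graph.setdefault(source, {})[target] = {"relation": relation}
        # Make sure nodes that only receive paths also appear in the graph.
        graph.setdefault(target, {})
    return graph

# Node A invests in node B, so the path runs from A to B.
graph = build_directed_graph([("A", "B", "investment"), ("B", "D", "investment")])
```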
  • The service system stores a large amount of enterprise and personal data, and establishes a large amount of enterprise relationship data, equity relationship data, litigation relationship data, and so on based on that data. The service system can cover industrial and commercial records, financial announcements, legal documents, social media, and overseas public opinion.
  • The service system described in this application uses Euler Atlas, which stores corporate and personal data and, based on that data, builds a knowledge graph of the corporate network of related persons and related companies from six major data relationships: shareholder equity, outward investment, supply chain, equity pledge, financing guarantee, and corporate executives.
  • The technical problem to be solved by this application is how to mine the knowledge relationships of an existing knowledge graph (Euler Atlas in this application) in depth to obtain information of potential application value. Obtaining the node data and the association relationships between nodes from the knowledge graph can be readily done by a person skilled in the art using the extracted Euler Atlas; since that system is existing technology, it is not described further here.
  • S102 Calculate the asset association ratio between the interconnected nodes in the directed graph, and load it on the path between the interconnected nodes to describe the asset relationship between the nodes in the directed graph.
  • In this step, the node data of two associated nodes are extracted, and the invested data and investment data in the node data are obtained. The node from which the arrow (path) originates is regarded as the investment node, and the node to which the arrow (path) points is regarded as the invested node. The investment data of the investment node and the invested data in the invested node that is associated with that investment data are extracted; the invested data is divided by the investment data to obtain the asset association ratio, which is loaded on the path.
  • Because the node data of two associated nodes describes one node investing in the other, one set of node data must belong to the investment node and the other to the invested node.
  • The asset association ratio is the ratio of the amount received by the invested node to the total outward investment of the investment node; the invested data associated with the investment data is the invested data generated by the investment node's investment in the invested node.
  • For example, suppose the investment data of the investment node is 1 million and the invested data of the invested node is 500,000, of which 200,000 comes from this investment node and the other 300,000 comes from other investment nodes. The asset association ratio between the investment node and the invested node is then 200,000 / 1,000,000 = 20%.
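  • The worked example can be checked with a few lines of code. The function and argument names are illustrative only; the calculation is the division described in step S102.

```python
def asset_association_ratio(amount_from_investor, investor_total_investment):
    """Share of the investment node's total investment received by the invested node."""
    if investor_total_investment <= 0:
        raise ValueError("total investment must be positive")
    return amount_from_investor / investor_total_investment

# 200,000 of the invested node's 500,000 comes from an investor with 1,000,000 invested.
ratio = asset_association_ratio(200_000, 1_000_000)
```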
  • the calculation of the risk transmission coefficients of the paths in the directed graph through the infectious disease model to obtain the scale-free model includes:
  • S111 Calculate the model index of the directed graph through the infectious disease model to obtain the average transmission probability.
  • In this step, the infectious disease model has an equilibrium condition, and the average transmission probability of the directed graph under that equilibrium condition is calculated through the infectious disease model.
  • The infectious disease model is a standard measurement model used to calculate the influence of nodes. Under its principle, risk transmission in the directed graph proceeds as follows: a susceptible individual is infected by a risk node and becomes infected; an infected individual recovers; and a recovered individual can become infected again through renewed exposure to a risk node. The average transmission probability is the risk transmission probability at which the directed graph maintains the equilibrium condition.
  • the objective function is as follows:
  • the method first defines the following:
  • K refers to the total number of individuals in the infectious disease model, corresponding to the total number of nodes; risk individuals are recorded as I, healthy individuals are recorded as S; S(t) is the number of healthy individuals at time t; ⁇ (t) is at time t K ⁇ is the ratio of the number of individuals at risk that can be infected per unit time to the total number of healthy individuals at the time, corresponding to the product of the node degree and the average degree; ⁇ refers to the average density of the infectious disease model, corresponding to The average density; risk individuals pass the risk ⁇ through probability ⁇ , and the risk individuals will recover health but will not be immune;
  • Solving for the equilibrium state (that is, the state in which the number of newly infected individuals in the current period equals the number of newly recovered individuals in the current period) gives the overall infection probability, which is set as the average transmission probability.
  • Unlike the medical infectious disease model, which dynamically simulates changes in the number of infected patients to obtain the infected proportion of the population at equilibrium, the objective function of the commercial-environment infectious disease model considers only the equilibrium state.
  • S112 Multiply the average transmission probability by the asset association ratio on each path of the directed graph to obtain the primary conduction coefficient; identify the correlation degree between the interconnected nodes in the directed graph through a random forest model, and load it on the path between those nodes.
  • In this step, if the random forest model judges that the node data of two nodes are related, the path between the two nodes is assigned a correlation degree of 1; if it judges that the node data of the two nodes are not related, the path is assigned a correlation degree of 0. The correlation degree is then loaded on the path of the directed graph.
  • A random forest is a classifier containing multiple decision trees whose output category is the mode of the categories output by the individual trees. The decision trees in a random forest are independent of one another, so when a new input sample arrives, each decision tree classifies it separately, and the category chosen by the most trees is taken as the predicted category of the sample.
  • the random forest is obtained by training in the following way:
  • N samples are prepared, and N samples are drawn from them at random with replacement (one sample is drawn each time and returned before the next draw).
  • The N drawn samples are used to train one decision tree, serving as the samples at the root node of that tree.
  • While the decision tree is being formed, each node must be split according to step 2 until it can no longer be split.
  • By setting the m attributes and the classification strategy, the user trains an initial random forest in the above way to obtain a random forest model that identifies whether the node data of two associated nodes are related. Because training a random forest is common knowledge for those skilled in the art, the process of training it with the m attributes and classification strategy is not described further here.
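  • Two of the random forest ingredients mentioned above, drawing N samples with replacement and taking the mode of the trees' outputs, can be sketched with the standard library alone. This does not train any decision tree; the helper names are hypothetical and the per-tree predictions are supplied by hand.

```python
import random
from collections import Counter

def bootstrap_sample(samples, rng):
    """Draw len(samples) items at random with replacement."""
    return [rng.choice(samples) for _ in samples]

def majority_vote(predictions):
    """Return the category predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
drawn = bootstrap_sample(list(range(10)), rng)

# Three hypothetical trees judge whether two nodes are related (1) or not (0).
correlation_degree = majority_vote([1, 0, 1])
```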
  • S113 Multiply the primary conduction coefficient and the correlation degree on the path of the directed graph to obtain a risk conduction coefficient, and load the risk conduction coefficient on the path of the directed graph to obtain a scale-free model.
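  • Steps S112 and S113 amount to two multiplications per path, which the following sketch spells out. The dictionary layout, function name, and sample numbers are assumptions for illustration; the correlation degree is taken as already decided (1 or 0).

```python
def risk_conduction_coefficients(paths, average_probability):
    """Compute the risk conduction coefficient of every path.

    paths maps (source, destination) to (asset association ratio, correlation degree).
    """
    coefficients = {}
    for edge, (asset_ratio, correlation_degree) in paths.items():
        primary = average_probability * asset_ratio      # primary conduction coefficient (S112)
        coefficients[edge] = primary * correlation_degree  # risk conduction coefficient (S113)
    return coefficients

# Hypothetical inputs: average transmission probability 0.5.
coefficients = risk_conduction_coefficients(
    {("B", "D"): (0.4, 1), ("A", "C"): (0.6, 0)},
    average_probability=0.5,
)
```

  • A correlation degree of 0 removes the path from risk transmission entirely, since its coefficient becomes 0.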
  • the directed graph with primary conductivity includes nodes labeled A-G, and its layout is shown in FIG. 4.
  • In this embodiment, the overall average transmission probability of the scale-free model is calculated to simulate the risk transmission probability of the whole model in dynamic equilibrium, giving users an idealized, quantifiable average risk transmission probability for the full model.
  • The random forest model identifies the correlation degree between interconnected nodes in the directed graph; based on this correlation degree, it is judged whether risk will be transmitted between two associated nodes, so that the way risk is transmitted is described accurately.
  • the identifying the infected node in the scale-free model includes:
  • S201 Obtain a blacklist containing unit names, and compare the name of each node in the scale-free model with the unit names in the blacklist; if the name of a node matches a unit name in the blacklist, the node is determined to belong to the blacklist.
  • S202 Set nodes belonging to the blacklist as infected nodes.
  • In this step, the nodes in the scale-free model that belong to the blacklist are set as infected nodes.
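  • The blacklist comparison is a simple name match, sketched below under the assumption that node names and blacklist entries are plain strings compared exactly; the function name is illustrative.

```python
def identify_infected_nodes(node_names, blacklist):
    """Return the nodes whose names appear in the blacklist."""
    blacklisted = set(blacklist)
    return {name for name in node_names if name in blacklisted}

infected = identify_infected_nodes(["A", "B", "C", "D"], ["B", "Z"])
```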
  • the calculation of the risk conduction rate of each path in the scale-free model according to the infected node and the risk conduction coefficient includes:
  • S211 Identify each continuous path that is continuously associated with the infected node in the scale-free model, and number the sub-paths of each continuous path in sequence starting from the infected node; wherein a continuous path is the entire path of associated nodes connected in series with the infected node in the scale-free model, and a sub-path is the path between two adjacent nodes on the continuous path.
  • For example, the continuous paths in the aforementioned scale-free model are BDF and BDG; the sub-paths of BDF are BD and DF, and the sub-paths of BDG are BD and DG. Because the infected node of both continuous paths is node B, node B is taken as the starting position: in continuous path BDF, sub-path BD is numbered 01 and sub-path DF is numbered 02; in continuous path BDG, sub-path BD is numbered 11 and sub-path DG is numbered 12.
  • S212 Set any sub-path on the continuous path as the path to be calculated; identify the sub-paths whose numbers are smaller than that of the path to be calculated, extract their risk transmission coefficients, and collect them into a coefficient set; multiply the risk transmission coefficients in the coefficient set by the risk transmission coefficient of the path to be calculated to obtain the true conduction coefficient (the risk transmission rate), and load it on the path to be calculated.
  • For example, if the path to be calculated is sub-path BD, whose risk transmission coefficient is 0.2, then because no sub-path in continuous paths BDF and BDG has a number smaller than that of BD, the true conduction coefficient of BD is set to 0.2 and loaded on path BD.
  • If the path to be calculated is DF, whose risk transmission coefficient is 0.4, the risk transmission coefficient of sub-path BD is extracted to form the coefficient set (0.2); multiplying the 0.2 in the coefficient set by the 0.4 of the path to be calculated gives a risk transmission rate of 0.08, which is loaded on path DF.
  • If the path to be calculated is DG, whose risk transmission coefficient is 0.5, the risk transmission coefficient of sub-path BD is extracted to form the coefficient set (0.2); multiplying the 0.2 in the coefficient set by the 0.5 of the path to be calculated gives a risk transmission rate of 0.1, which is loaded on path DG.
  • In this way, the risk transmission rate of each path in the scale-free model is calculated; it expresses the rate at which risk is transmitted from the infected node to other nodes. Because the risk transmission rate is obtained from the average conductivity and the risk transmission coefficient, it realistically reflects the probability of risk transmission between two associated nodes.
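The calculation in S212 amounts to multiplying a sub-path's own coefficient by the coefficients of all lower-numbered sub-paths on the same continuous path. The following Python sketch reproduces the worked example under that reading; the dictionary layout and the function name `transmission_rates` are illustrative assumptions, while the coefficients 0.2, 0.4 and 0.5 come from the text.

```python
from math import prod
from typing import Dict, List

# Risk transmission coefficients from the worked example.
COEFFICIENTS: Dict[str, float] = {"BD": 0.2, "DF": 0.4, "DG": 0.5}

# Sub-paths of each continuous path, listed in numbering order,
# starting from the infected node B.
CONTINUOUS_PATHS: Dict[str, List[str]] = {
    "BDF": ["BD", "DF"],
    "BDG": ["BD", "DG"],
}


def transmission_rates(
    paths: Dict[str, List[str]], coefficients: Dict[str, float]
) -> Dict[str, float]:
    """Compute the rate loaded onto each sub-path.

    For each sub-path, the coefficient set holds the coefficients of the
    sub-paths that precede it on the same continuous path; the rate is the
    product of that set times the sub-path's own coefficient.
    """
    rates: Dict[str, float] = {}
    for sub_paths in paths.values():
        for i, sub_path in enumerate(sub_paths):
            coefficient_set = [coefficients[p] for p in sub_paths[:i]]
            # prod([]) is 1, so the first sub-path keeps its own coefficient.
            rates[sub_path] = prod(coefficient_set) * coefficients[sub_path]
    return rates


rates = transmission_rates(CONTINUOUS_PATHS, COEFFICIENTS)
```

Up to floating-point rounding, this yields 0.2 for BD, 0.08 for DF and 0.1 for DG. A sub-path shared by several continuous paths (BD here) receives the same value from each of them, so overwriting the entry is harmless in this example.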
  • A blacklist system is used to identify infected nodes, and the risk transmission rates are loaded onto the corresponding paths of the scale-free model to obtain the risk infection model, so that the incoming risk and outgoing risk of any node in the risk infection model can be identified quickly.
  • Extracting a node in the risk infection model and using it as the target node according to the node request sent by the user terminal includes:
  • S301 Receive a node request with a node name sent by the user terminal.
  • the node name in this step can be the company name or the company number.
  • S302 Compare the node names of all nodes in the risk infection model with the node request, and extract a node that matches the node request, and use the node as a target node.
  • For example, if the node request sent by the user terminal carries the node name D, the name is compared in turn with nodes A to G in the scale-free model, and node D is obtained as the node matching the node name D.
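Steps S301 and S302 describe a sequential comparison of the requested name against the model's node names. A minimal Python sketch follows; the function name `find_target_node`, the list `MODEL_NODES`, and the use of exact string equality as the matching rule are assumptions made for illustration.

```python
from typing import Iterable, Optional


def find_target_node(node_names: Iterable[str], requested_name: str) -> Optional[str]:
    """Return the first node whose name equals the requested name, else None."""
    for name in node_names:
        if name == requested_name:
            return name
    return None


# Nodes A to G from the example; exact string equality is an assumption.
MODEL_NODES = ["A", "B", "C", "D", "E", "F", "G"]
target_node = find_target_node(MODEL_NODES, "D")
```

Here `target_node` is `"D"`, and a name that matches no node returns `None`. Because the text allows the node name to be a company name or a company number, a real lookup would probably normalize both forms before comparing.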
  • Calculating the risk transmission rate of the target node in the incoming direction and in the outgoing direction to obtain the incoming risk rate and the outgoing risk rate includes:
  • S311 Set a node matching the node request as a target node, and extract a path connected to the target node in the scale-free model;
  • S312 Set the path pointing to the target node as an incoming path, and set the path pointed out from the target node as an outgoing path;
  • The paths in this step are directed.
  • The path pointing to the target node is set as the incoming path, which describes risk being transmitted from outside to the target node; the path pointing away from the target node is set as the outgoing path, which describes risk being transmitted outward from the target node.
  • S313 Calculate the incoming risk rate by applying a weighted adjustment formula to the risk transmission rate of the incoming path of the target node, and calculate the outgoing risk rate by applying the weighted adjustment formula to the risk transmission rate of the outgoing path of the target node.
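The direction test in S311 and S312 can be illustrated with a short Python sketch. The edge list below contains only the three paths that the text connects to node D; representing a path as a `(source, destination)` tuple and the function name `split_paths` are assumptions.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (source node, destination node)

# Directed paths connected to node D in the example.
EDGES: List[Edge] = [("B", "D"), ("D", "F"), ("D", "G")]


def split_paths(edges: List[Edge], target: str) -> Tuple[List[Edge], List[Edge]]:
    """Split the edges touching `target` into incoming and outgoing paths."""
    incoming = [edge for edge in edges if edge[1] == target]  # point to target
    outgoing = [edge for edge in edges if edge[0] == target]  # leave target
    return incoming, outgoing


incoming_paths, outgoing_paths = split_paths(EDGES, "D")
```

For target node D this gives BD as the only incoming path and DF and DG as the outgoing paths.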
  • the weighting adjustment formula is
  • T is the incoming risk rate or the outgoing risk rate; formula (1) is used to calculate the incoming risk rate, and formula (2) is used to calculate the outgoing risk rate;
  • x is the risk transmission rate of the incoming path, a is the incoming coefficient, and m is the incoming adjustment value; the incoming coefficient and the incoming adjustment value can be adjusted according to the user's needs;
  • y is the risk transmission rate of the outgoing path, b is the outgoing coefficient, and n is the outgoing adjustment value; the outgoing coefficient and the outgoing adjustment value can be adjusted according to the user's needs.
  • For example, the paths connected to node D in the scale-free model are BD, DF, and DG;
  • the layout of the risk infection model shows that BD is the incoming path, and that DF and DG are outgoing paths.
  • In this way, the risk characteristics of the target node can be evaluated comprehensively and the risk environment of the target node can be conveyed fully to the user, which helps the user make judgments on that basis.
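The weighted adjustment formula itself is not reproduced in this text, so the following Python sketch rests on an explicit assumption: that each risk rate is a linear function of the summed transmission rates, T = a·Σx + m for the incoming direction and T = b·Σy + n for the outgoing direction. That form is a guess based on the names "coefficient" and "adjustment value"; with a = b = 1 and m = n = 0 it reproduces the incoming risk rate 0.2 and outgoing risk rate 0.18 used later in the example. The rates 0.2 (BD) and 0.08 (DF) come from the text, and 0.1 (DG) follows from 0.2 × 0.5.

```python
from typing import Iterable


def weighted_rate(
    rates: Iterable[float], coefficient: float = 1.0, adjustment: float = 0.0
) -> float:
    """ASSUMED linear form: coefficient * sum(rates) + adjustment.

    The real formulas (1) and (2) are not available in the source text;
    this only illustrates how a coefficient and an adjustment value
    could be applied to the path rates.
    """
    return coefficient * sum(rates) + adjustment


# Incoming path BD; outgoing paths DF and DG (rates from the worked example).
incoming_risk_rate = weighted_rate([0.2])
outgoing_risk_rate = weighted_rate([0.08, 0.1])
```

Under these assumptions the incoming risk rate is 0.2 and the outgoing risk rate is approximately 0.18. Other coefficients or adjustment values would shift both results, which is the tuning the text leaves to the user.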
  • inputting the incoming risk rate and outgoing risk rate into a preset four-quadrant model to obtain a risk point includes:
  • S401 Enter the incoming risk rate and the outgoing risk rate into a preset four-quadrant model
  • the four-quadrant model is a coordinate system that describes the risk characteristics of the node based on the incoming risk rate and the outgoing risk rate of the node.
  • The origin of the coordinate system is the common starting point of the incoming risk rate axis and the outgoing risk rate axis.
  • the incoming risk rate increases along the extending direction of the vertical axis of the above-mentioned coordinate system
  • the outgoing risk rate increases along the extending direction of the horizontal axis of the above-mentioned coordinate system, as specifically shown in FIG. 7.
  • The coordinate system is divided into four areas by the intersecting abscissa dividing line and ordinate dividing line: the "silent glacier" area, where both the incoming risk rate and the outgoing risk rate are low; the "expanse ocean" area, where the incoming risk rate is high but the outgoing risk rate is low; the "active volcano" area, where the incoming risk rate is low but the outgoing risk rate is high; and the "storm center" area, where both the incoming risk rate and the outgoing risk rate are high.
  • the abscissa dividing line and the ordinate dividing line of the above four parts can be adjusted according to user requirements.
  • For example, the incoming risk rate of 0.2 and the outgoing risk rate of 0.18 are entered as the ordinate and abscissa of the four-quadrant model, respectively, to obtain the risk point.
  • The area in which the risk point is located describes the risk characteristics of the node named in the node request sent by the user terminal, and the name of that area is used as the judgment result. For example, continuing the above example, if the abscissa dividing line and the ordinate dividing line intersect at (0.5, 0.5), the risk point is identified as lying in the "silent glacier" area, so "silent glacier" is used as the judgment result.
  • The four-quadrant model is used to evaluate the risk characteristics of the target node, and the result is output to the user terminal in the form of a name or icon, so that the user can quickly learn how surrounding nodes affect the target node and how the target node affects its surrounding nodes.
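The area lookup in the four-quadrant model can be sketched as two threshold comparisons. In the Python sketch below, the function name `classify_risk_point` and the treatment of points lying exactly on a dividing line (counted as low) are assumptions; the area names and the default dividing lines at 0.5 come from the example.

```python
def classify_risk_point(
    incoming: float,
    outgoing: float,
    abscissa_divider: float = 0.5,
    ordinate_divider: float = 0.5,
) -> str:
    """Name the area containing the risk point (outgoing, incoming).

    The outgoing rate is the abscissa and the incoming rate the ordinate.
    Values equal to a dividing line are treated as low (an assumption).
    """
    high_incoming = incoming > ordinate_divider
    high_outgoing = outgoing > abscissa_divider
    if high_incoming and high_outgoing:
        return "storm center"
    if high_incoming:
        return "expanse ocean"
    if high_outgoing:
        return "active volcano"
    return "silent glacier"


judgment = classify_risk_point(incoming=0.2, outgoing=0.18)
```

For the example point, `judgment` is `"silent glacier"`. Passing different divider values models the user-adjustable dividing lines mentioned in the text.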
  • a data analysis device 1 based on big data in this embodiment includes:
  • the creation server 11, configured to create a directed graph describing the association relationships and asset relationships between nodes, calculate the risk transmission coefficient of each path in the directed graph through an infectious disease model to obtain a scale-free model, and send it to the risk server; wherein a node refers to an information owner, the association relationship reflects the involvement and influence between information owners, and the asset relationship reflects the asset association ratio between information owners;
  • the risk server 12, configured to identify infected nodes in the scale-free model, calculate the risk transmission rate of each path in the scale-free model according to the infected nodes in combination with the risk transmission coefficients to obtain the risk infection model, and send it to the calculation server;
  • the calculation server 13, configured to extract a node in the risk infection model as the target node according to the node request sent by the user terminal, and calculate the risk transmission rate of the target node in the incoming direction and in the outgoing direction to obtain the incoming risk rate and the outgoing risk rate; wherein the node request includes a node name corresponding to a node in the risk infection model, which is used to extract that node from the risk infection model.
  • the data analysis device 1 further includes:
  • the judgment server 14, configured to input the incoming risk rate and the outgoing risk rate into a preset four-quadrant model to obtain a risk point, identify the area in which the risk point is located, use the name of that area as the judgment result, and output the judgment result to the user terminal.
  • this application also provides a computer system, which includes a plurality of computer devices 5.
  • The components of the data analysis apparatus 1 of the second embodiment can be distributed across different computer devices, and each computer device can be a device that executes programs.
  • the computer equipment of this embodiment at least includes but is not limited to: a memory 21 and a processor 22 that can be communicatively connected to each other through a system bus, as shown in FIG. 9.
  • FIG. 9 only shows a computer device with components, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • the memory 21 (ie, readable storage medium) includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), Read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 21 may be an internal storage unit of a computer device, such as a hard disk or memory of the computer device.
  • the memory 21 may also be an external storage device of the computer device, such as a plug-in hard disk equipped on the computer device, a smart memory card (Smart Media Card, SMC), and Secure Digital (SD).
  • the memory 21 may also include both an internal storage unit of the computer device and an external storage device thereof.
  • the memory 21 is generally used to store an operating system and various application software installed in a computer device, such as the program code of the data analysis device in the first embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 22 is generally used to control the overall operation of the computer equipment.
  • the processor 22 is used to run the program code or process data stored in the memory 21, for example, to run a data analysis device, so as to implement the data analysis method of the first embodiment.
  • this application also provides a computer-readable storage system, which includes multiple storage media.
  • The storage media may be non-volatile or volatile, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, App application store, etc., which have computer programs stored thereon; the programs are executed by the processor 22 to realize the corresponding functions.
  • The computer-readable storage medium of this embodiment is used to store the data analysis device; when the stored program is executed by the processor 22, the data analysis method of the first embodiment is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a data analysis method and apparatus, a computer system, and a readable storage medium, and pertains to the field of big data. The method comprises: creating a directed graph used to describe the association relationship and asset relationship between nodes, and calculating the risk transmission coefficient of each path in the directed graph by means of an infectious disease model to obtain a scale-free model; identifying an infected node in the scale-free model, and calculating the risk transmission rate of each path in the scale-free model according to the infected node in combination with the risk transmission coefficient to obtain a risk infection model; and extracting a node in the risk infection model according to a node request sent by a user terminal, taking the node as a target node, and calculating the risk transmission rate of the target node in the incoming direction and its risk transmission rate in the outgoing direction to obtain an incoming risk rate and an outgoing risk rate. According to the present invention, the technical effects of deeply mining the information in a knowledge graph and providing deep, valuable information are achieved.
PCT/CN2020/093201 2020-03-05 2020-05-29 Data analysis method and apparatus, computer system, and readable storage medium WO2021174693A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010146003.1 2020-03-05
CN202010146003.1A CN111401700B (zh) 2020-03-05 2020-03-05 Data analysis method, apparatus, computer system and readable storage medium

Publications (1)

Publication Number Publication Date
WO2021174693A1 WO2021174693A1 (fr) 2021-09-10

Family

ID=71430500

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093201 WO2021174693A1 (fr) 2020-03-05 2020-05-29 Data analysis method and apparatus, computer system, and readable storage medium

Country Status (2)

Country Link
CN (1) CN111401700B (fr)
WO (1) WO2021174693A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
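CN114048330A (zh) * 2021-11-29 2022-02-15 平安银行股份有限公司 Risk transmission probability knowledge graph generation method, apparatus, device and storage medium
CN115086013A (zh) * 2022-06-13 2022-09-20 北京奇艺世纪科技有限公司 Risk identification method, apparatus, electronic device, storage medium and computer program product
CN115795055A (zh) * 2022-12-19 2023-03-14 广州城市规划技术开发服务部有限公司 Knowledge graph construction method and apparatus for land use data
CN115964507A (zh) * 2022-11-28 2023-04-14 北京海致星图科技有限公司 Knowledge-platform-based graph management system and computer-readable storage medium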

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
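CN112084343A (zh) * 2020-09-10 2020-12-15 杭州安恒信息安全技术有限公司 Quantification method, apparatus and medium for a social relationship graph
CN112800242B (zh) * 2021-01-28 2023-07-28 平安科技(深圳)有限公司 Lineage mining method, apparatus, electronic device and computer-readable storage medium
CN112948381B (zh) * 2021-02-25 2022-10-28 平安科技(深圳)有限公司 Data processing method, system, computer device and readable storage medium
CN112883278A (zh) * 2021-03-23 2021-06-01 西安电子科技大学昆山创新研究院 Method for suppressing the spread of harmful public opinion based on a smart-community big data knowledge graph
CN113077267B (zh) * 2021-03-31 2024-03-12 商运(江苏)科创发展有限公司 Supply chain relationship management system for enterprise cluster coordination
CN115292424B (zh) * 2022-10-08 2022-12-20 凯美瑞德(苏州)信息科技股份有限公司 Risk transmission analysis method, electronic device and storage medium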

Citations (6)

Publication number Priority date Publication date Assignee Title
US20100198631A1 (en) * 2009-01-30 2010-08-05 Bank Of America Corporation Supplier stratification
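CN106204264A (zh) * 2016-07-05 2016-12-07 天云融创数据科技(北京)有限公司 Method for constructing a credit guarantee network risk propagation model
CN107563645A (zh) * 2017-09-04 2018-01-09 杭州云算信达数据技术有限公司 Big-data-based financial risk analysis method
CN108090709A (zh) * 2018-02-09 2018-05-29 重庆誉存大数据科技有限公司 Enterprise evaluation method and system based on a risk transmission model
CN109472485A (zh) * 2018-11-01 2019-03-15 成都数联铭品科技有限公司 Enterprise dishonesty risk propagation query system and method
CN109949164A (zh) * 2019-03-28 2019-06-28 中山大学 Method and apparatus for mining important nodes based on an investment relationship network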

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US20170024531A1 (en) * 2015-07-22 2017-01-26 Radicalogic Technologies, Inc. Dba Rl Solutions Systems and methods for near-real or real-time contact tracing
US10747876B2 (en) * 2017-05-17 2020-08-18 Threatmodeler Software Inc. Systems and methods for assisted model generation
CN108335120A (zh) * 2018-03-07 2018-07-27 物数(上海)信息科技有限公司 Blockchain-based asset tracing method, apparatus, electronic device and storage medium
US20190311428A1 (en) * 2018-04-07 2019-10-10 Brighterion, Inc. Credit risk and default prediction by smart agents
CN110245165B (zh) * 2019-05-20 2023-04-11 平安科技(深圳)有限公司 Risk transmission association graph optimization method, apparatus and computer device

Cited By (6)

Publication number Priority date Publication date Assignee Title
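CN114048330A (zh) * 2021-11-29 2022-02-15 平安银行股份有限公司 Risk transmission probability knowledge graph generation method, apparatus, device and storage medium
CN115086013A (zh) * 2022-06-13 2022-09-20 北京奇艺世纪科技有限公司 Risk identification method, apparatus, electronic device, storage medium and computer program product
CN115964507A (zh) * 2022-11-28 2023-04-14 北京海致星图科技有限公司 Knowledge-platform-based graph management system and computer-readable storage medium
CN115964507B (zh) * 2022-11-28 2023-10-27 北京海致星图科技有限公司 Knowledge-platform-based graph management system and computer-readable storage medium
CN115795055A (zh) * 2022-12-19 2023-03-14 广州城市规划技术开发服务部有限公司 Knowledge graph construction method and apparatus for land use data
CN115795055B (zh) * 2022-12-19 2023-09-12 广州城市规划技术开发服务部有限公司 Knowledge graph construction method and apparatus for land use data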

Also Published As

Publication number Publication date
CN111401700A (zh) 2020-07-10
CN111401700B (zh) 2023-09-19

Similar Documents

Publication Publication Date Title
WO2021174693A1 (fr) Data analysis method and apparatus, computer system, and readable storage medium
WO2021174944A1 (fr) Selective message distribution method based on target activity, and related device
US11270375B1 (en) Method and system for aggregating personal financial data to predict consumer financial health
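CN110148053B (zh) User credit limit evaluation method, apparatus, electronic device and readable medium
CN110795568A (zh) Risk assessment method, apparatus and electronic device based on a user information knowledge graph
CN112925914B (zh) Data security classification method, system, device and storage medium
JP2019512128A (ja) System and method for calculating the trade-off between data privacy and utility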
Fronzetti Colladon et al. Forecasting financial markets with semantic network analysis in the COVID‐19 crisis
Nam et al. City size distribution as a function of socioeconomic conditions: an eclectic approach to downscaling global population
WO2023123933A1 (fr) Method and device for determining user type information, and storage medium
CN115375177A (zh) User value evaluation method, apparatus, electronic device and storage medium
Wang et al. An unsupervised strategy for defending against multifarious reputation attacks
Renigier-Biłozor et al. Optimization of the variables selection in the process of real estate markets rating
Navdeep et al. Role of big data analytics in analyzing e-Governance projects
Park et al. Using total sample size weights in meta-analysis of log-odds ratios
Youssef Digital Transformation in Tunisia: Under Which Conditions Could the Digital Economy Benefit Everyone?
Verma et al. Variance measures with ordered weighted aggregation operators
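TW201810158A (zh) Evaluating the trustworthiness, competence and/or compatibility of any entity for activities including recruiting or hiring decisions, skip tracing, insurance underwriting, credit decisions, or shortening or improving sales cycles
CN116402625A (zh) Customer evaluation method, apparatus, computer device and storage medium
WO2019218517A1 (fr) Server, text data processing method and storage medium
CN115713248A (zh) Method for scoring and evaluating data for an exchange
JP5156692B2 (ja) Pseudo data generation device, pseudo data generation method, and computer program
CN115713424A (zh) Risk assessment method, risk assessment apparatus, device and storage medium
CN113781247A (zh) Protocol data recommendation method, apparatus, computer device and storage medium
CN114066502A (zh) AI-big-data-based target customer analysis method, system, device and computer-readable medium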

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20923309; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20923309; Country of ref document: EP; Kind code of ref document: A1)