CN117788174A

CN117788174A - Financial user data security protection method based on blockchain

Info

Publication number: CN117788174A
Application number: CN202410205153.3A
Authority: CN
Inventors: 王超; 宋振盼
Original assignee: Shandong Huachuang Yuanzhi Information Technology Co ltd
Current assignee: Shandong Huachuang Yuanzhi Information Technology Co ltd
Priority date: 2024-02-26
Filing date: 2024-02-26
Publication date: 2024-03-29
Anticipated expiration: 2044-02-26
Also published as: CN117788174B

Abstract

The invention relates to the field of data security protection, in particular to a blockchain-based method for protecting the security of financial user data. First, multi-dimensional financial data of each user are acquired and each user is taken as a node of a blockchain. For any two nodes, the differences between their data in the same dimension are analyzed to obtain a dimension similarity for that dimension, and the dimension similarities of all dimensions are combined into a real similarity between the two nodes. An initial connected graph is constructed from the real similarities and then optimized based on the real similarities between nodes along constructed connected paths. The nodes of the resulting optimized connected graph are clustered, and the financial data of each node in each cluster are encrypted. By optimizing the connected graph, the invention improves the clustering of the nodes and makes the encrypted financial data of the nodes in each cluster more secure.

Description

Financial user data security protection method based on blockchain

Technical Field

The invention relates to the field of data security protection, in particular to a financial user data security protection method based on a blockchain.

Background

Blockchain is a distributed ledger technology characterized by decentralization, security, reliability, and tamper resistance, so more and more financial data are stored on blockchains. If a user's financial data are leaked or abused, however, both the user and the financial industry can suffer great losses, so protecting the security of financial user data is very important.

In the related art, a connected graph splitting clustering algorithm is generally used to cluster the user nodes in a blockchain, and the financial data of the users in each cluster are encrypted to improve their security. However, the connected graph constructed by the existing algorithm contains a large number of redundant edges, so the user nodes are clustered poorly and the encryption of the users' financial data within each cluster offers low security.

Disclosure of Invention

The technical problem addressed is that the connected graph constructed by the traditional connected graph splitting clustering algorithm contains a large number of redundant edges, which degrades the clustering of user nodes in a blockchain and in turn lowers the security of the encrypted financial data of users in each cluster. To solve this problem, the invention provides a blockchain-based financial user data security protection method, adopting the following technical scheme:

The invention provides a financial user data security protection method based on a blockchain, which comprises the following steps:

acquiring financial data of each user in different preset statistical periods within a preset historical time period, wherein the financial data comprise data of at least two dimensions;

taking each user as a node in a blockchain, taking any two selected nodes as a node group to be tested, and taking any dimension of the financial data as a dimension to be tested; obtaining the dimension similarity of the two nodes in the node group to be tested in the dimension to be tested according to the differences between their data in that dimension across all of their financial data; and obtaining the real similarity between the two nodes according to their dimension similarities in all dimensions;

constructing an initial connected graph for all nodes based on the real similarity between any two nodes; randomly selecting a node from the initial connected graph as a target node and constructing connected paths of the target node with the target node as the starting point; screening out redundant edges from all edges of the initial connected graph according to the real similarity between the target node and the other nodes in each connected path; and deleting all redundant edges from the initial connected graph to obtain an optimized connected graph;

and clustering the nodes in the optimized connected graph to obtain different clusters, and encrypting the financial data of all nodes in each cluster.

Further, obtaining the dimension similarity of the two nodes in the node group to be tested in the dimension to be tested according to the differences between their data in that dimension across all of their financial data includes:

establishing a two-dimensional coordinate system with time as the abscissa and the data of the dimension to be tested as the ordinate;

mapping the data of the dimension to be tested in all financial data of the two nodes in the node group to be tested into the coordinate system, and performing polynomial fitting on each node's data of the dimension to be tested to obtain a fitted curve of that dimension for each node in the node group to be tested;

dividing each fitted curve by a preset time length to obtain the periods of each fitted curve;

taking the mean of the data in each period as the overall data value of that period on the fitted curve; and applying a negative correlation mapping to the absolute difference between the overall data values of the two fitted curves in the same period to obtain the data similarity of the two fitted curves in that period;

obtaining the extreme points on each fitted curve; on each fitted curve, taking the curve segment between adjacent extreme points as a sub-curve segment, and taking the mean of the slopes at all data points in each sub-curve segment as the slope parameter of that sub-curve segment;

taking the sum of the absolute values of the slope parameters of all sub-curve segments in each period as the curve fluctuation value of that period on the fitted curve; and applying a negative correlation mapping to the absolute difference between the curve fluctuation values of the two fitted curves in the same period to obtain the fluctuation similarity of the two fitted curves in that period;

taking the product of the data similarity and the fluctuation similarity as the initial similarity of the two fitted curves in the same period;

and obtaining the dimension similarity of the two nodes in the node group to be tested in the dimension to be tested according to the initial similarities of the two fitted curves in all corresponding periods.

Further, obtaining the dimension similarity of the two nodes in the node group to be tested in the dimension to be tested according to the initial similarities of the two fitted curves in all corresponding periods includes:

taking the mean of the initial similarities of the two fitted curves over all corresponding periods as the dimension similarity of the two nodes in the node group to be tested in the dimension to be tested.

Further, obtaining the real similarity between the two nodes in the node group to be tested according to their dimension similarities in all dimensions includes:

normalizing the sum of the dimension similarities of the two nodes over all dimensions to obtain the real similarity between the two nodes in the node group to be tested.

Further, constructing the initial connected graph for all nodes based on the real similarity between any two nodes includes:

sorting the real similarities of all node pairs in descending order to obtain a similarity sequence;

taking the lower quartile of the similarity sequence as a similarity threshold;

and connecting every two nodes whose real similarity is greater than the similarity threshold with a straight edge to obtain the initial connected graph.

Further, screening out redundant edges from all edges of the initial connected graph according to the real similarity between the target node and the other nodes in each connected path includes:

in the initial connected graph, taking the other nodes directly connected to the target node as primary connection nodes of the target node, and the other nodes not directly connected to the target node as secondary connection nodes of the target node;

taking the mean real similarity between the target node and all primary connection nodes in each connected path as the first overall similarity of that connected path, and the mean real similarity between the target node and all secondary connection nodes in the path as the second overall similarity of that connected path;

taking the difference between the first overall similarity and the second overall similarity as the invalidity degree of each connected path of the target node;

taking each connected path whose invalidity degree is greater than the similarity threshold as an invalid path; in each invalid path, taking every edge that does not have the target node as an endpoint as an edge to be deleted and marking it once; and repeating this with every other node in the initial connected graph as the target node, so as to obtain all edges to be deleted in the initial connected graph and the number of times each of them was marked;

and obtaining the redundant edges of the initial connected graph according to the number of times each edge to be deleted was marked.
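The screening steps above can be sketched in Python roughly as follows. This is a minimal illustration under assumptions, not the patented implementation: the graph is a hypothetical adjacency dictionary, real similarities are stored in a dictionary keyed by node pairs, each connected path is a list of edges, and a path lacking either primary or secondary connection nodes is skipped because the text does not define its invalidity degree. The helper names `real_sim` and `mark_invalid_paths` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def real_sim(sims: Dict[Edge, float], u: int, v: int) -> float:
    """Look up the real similarity of an unordered node pair."""
    return sims[(u, v)] if (u, v) in sims else sims[(v, u)]


def mark_invalid_paths(target: int, paths: List[List[Edge]],
                       adjacency: Dict[int, Set[int]], sims: Dict[Edge, float],
                       threshold: float, marks: Counter) -> None:
    """Mark the non-target edges of every invalid connected path of `target`."""
    for path in paths:
        nodes = {n for edge in path for n in edge} - {target}
        primary = [n for n in nodes if n in adjacency[target]]       # directly connected
        secondary = [n for n in nodes if n not in adjacency[target]]  # not directly connected
        if not primary or not secondary:
            continue  # assumption: invalidity degree undefined, so skip this path
        first = sum(real_sim(sims, target, n) for n in primary) / len(primary)
        second = sum(real_sim(sims, target, n) for n in secondary) / len(secondary)
        if first - second > threshold:  # invalidity degree above the similarity threshold
            for u, v in path:
                if target not in (u, v):  # edges touching the target are kept
                    marks[(min(u, v), max(u, v))] += 1
```

Calling `mark_invalid_paths` once for each node as the target, with a shared `Counter`, accumulates the number of times each edge to be deleted has been marked.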

Further, obtaining the redundant edges of the initial connected graph according to the number of times each edge to be deleted was marked includes:

taking the mean number of marks over all edges to be deleted as a marking count threshold;

and taking the edges to be deleted whose number of marks is greater than the marking count threshold as the redundant edges of the initial connected graph.
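A minimal Python sketch of the marking count threshold and the deletion of redundant edges, assuming the marks were accumulated in a `collections.Counter` keyed by undirected edges `(u, v)` with `u < v` and that the graph is an adjacency dictionary; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def redundant_edges(marks: Counter) -> Set[Edge]:
    """Edges marked more often than the mean marking count are redundant."""
    if not marks:
        return set()
    threshold = sum(marks.values()) / len(marks)  # marking count threshold
    return {edge for edge, count in marks.items() if count > threshold}


def delete_edges(adjacency: Dict[int, Set[int]], edges: Set[Edge]) -> Dict[int, Set[int]]:
    """Return a copy of the graph with the given undirected edges removed."""
    optimized = {node: set(nbrs) for node, nbrs in adjacency.items()}
    for u, v in edges:
        optimized[u].discard(v)
        optimized[v].discard(u)
    return optimized
```

Copying the adjacency sets leaves the initial connected graph untouched, which makes it easy to compare the graph before and after optimization.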

Further, clustering the nodes in the optimized connected graph to obtain different clusters includes:

clustering the nodes in the optimized connected graph with a connected graph splitting clustering algorithm to obtain different clusters.
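The connected graph splitting clustering algorithm is not detailed further in the text. As a deliberately simplified sketch, the Python snippet below shows only the outcome in which each connected subgraph of the optimized graph is read off as one cluster, using an iterative depth-first traversal; the iterative splitting until convergence is not implemented, and the adjacency-dictionary representation is an assumption.

```python
from typing import Dict, List, Set


def connected_components(adjacency: Dict[int, Set[int]]) -> List[List[int]]:
    """Return each connected subgraph (as a sorted node list) as one cluster."""
    seen: Set[int] = set()
    clusters: List[List[int]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.append(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        clusters.append(sorted(component))
    return clusters
```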

Further, encrypting the financial data of all nodes in each cluster includes:

encrypting the financial data of all nodes in each cluster with the AES encryption algorithm.

Further, constructing the connected path of the target node with the target node as the starting point includes:

selecting the target node as the base node and marking it;

if the number of marked nodes is smaller than a preset number, randomly selecting an unmarked node directly connected to the base node as the new base node, marking it, and repeating the selection;

if the number of marked nodes is not smaller than the preset number, terminating the selection;

and arranging the marked nodes in the order of selection to obtain a node sequence, and taking the set of edges between every two adjacent nodes in the node sequence as a connected path of the target node in the initial connected graph.
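A minimal Python sketch of this path construction follows. It assumes an adjacency-dictionary graph, an illustrative `preset_number`, and a seeded `random.Random` for reproducibility; it also stops early when the current base node has no unmarked neighbour, a case the text does not address. The function name is hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def build_connected_path(adjacency: Dict[int, Set[int]], target: int,
                         preset_number: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Random walk from `target` over unmarked neighbours until `preset_number`
    nodes are marked; returns the edges between consecutive marked nodes."""
    marked = [target]  # marked nodes in order of selection
    base = target
    while len(marked) < preset_number:
        candidates = sorted(n for n in adjacency[base] if n not in marked)
        if not candidates:  # assumption: stop early at a dead end
            break
        base = rng.choice(candidates)  # new base node
        marked.append(base)
    return [(marked[i], marked[i + 1]) for i in range(len(marked) - 1)]
```

Sorting the candidates before `rng.choice` keeps the walk reproducible for a given seed, since set iteration order is not something to rely on.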

The invention has the following beneficial effects:

Because the connected graph constructed by the conventional connected graph splitting clustering algorithm contains a large number of redundant edges, user nodes in a blockchain are clustered poorly and the encrypted financial data of users in each cluster are less secure. The method therefore first acquires the financial data of multiple users in different preset statistical periods within a preset historical time period. Since each user has a large amount of financial data, the method analyzes the differences between the data of each dimension to be tested across all financial data of any two nodes; the resulting dimension similarity reflects how similar the two nodes' data are in that dimension. Combining the dimension similarities of all dimensions yields a real similarity that measures how similar the two nodes are, which helps construct a higher-quality initial connected graph and improves the subsequent clustering. Because the initial connected graph is built only from pairwise real similarities, it still contains many redundant edges. The method therefore constructs connected paths from each target node and compares the real similarity between the target node and the nodes directly connected to it with that between the target node and the nodes not directly connected to it; paths where this gap exceeds the similarity threshold are treated as invalid, their edges are marked, and frequently marked edges are deleted as redundant. The resulting optimized connected graph yields better clustering, so the financial data of each node in a cluster are more secure after encryption.

Drawings

To describe the technical solutions and advantages of the embodiments of the invention or of the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.

FIG. 1 is a flowchart of a method for protecting financial user data security based on blockchain according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a connected path of a target node in an initial connected graph according to an embodiment of the present invention.

Detailed Description

To further explain the technical means adopted by the present invention to achieve its intended purpose and their effects, the specific implementation, structure, features, and effects of the blockchain-based financial user data security protection method according to the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, "one embodiment" and "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

An embodiment of a financial user data security protection method based on a blockchain is provided:

The specific scheme of the blockchain-based financial user data security protection method provided by the invention is described below with reference to the accompanying drawings.

Referring to fig. 1, a flowchart of a blockchain-based financial user data security protection method according to an embodiment of the invention is shown, where the method includes:

Step S1: Acquire the financial data of different users in different preset statistical periods within a preset historical time period, wherein the financial data comprise data of at least two dimensions.

Blockchain is a distributed ledger technology characterized by decentralization, security, reliability, and tamper resistance, so more and more financial data are stored on blockchains; if a user's financial data are leaked or abused, however, the user and the financial industry can suffer great losses, so protecting financial user data is very important. In the related art, the user nodes in a blockchain are clustered with a connected graph splitting clustering algorithm so that users in the same cluster have similar financial data; encrypting the similar financial data of users in the same cluster makes the encrypted data harder to crack and thus more secure. However, the connected graph constructed by the traditional connected graph splitting clustering algorithm contains a large number of redundant edges, so its quality is poor. This degrades the clustering of the user nodes: highly similar users are split into different clusters, the encrypted financial data become easier to crack, and their security drops. The invention provides a blockchain-based financial user data security protection method to solve this problem.

In this embodiment, the financial data of multiple users within a preset historical time period are acquired by connecting to the national comprehensive credit service platform for small and medium-sized enterprise financing. In one embodiment, the number of users is set to 50 and the preset historical time period to 5 years; both can be set by the implementer according to the specific scenario. The financial data comprise multiple dimensions, such as business income, asset-liability ratio, profit margin, and enterprise cash flow. The financial data of each user are compiled once per preset statistical period, set here to 1 month and likewise adjustable by the implementer, so each user has multiple financial data records (60 under these settings) and all users have the same number of records.

Once the financial data of each user are obtained, the similarity between users can be measured from them, a connected graph can be constructed, and the users can then be clustered and their financial data encrypted.

Step S2: Take each user as a node in a blockchain, take any two selected nodes as a node group to be tested, and take any dimension of the financial data as a dimension to be tested; obtain the dimension similarity of the two nodes in the dimension to be tested according to the differences between their data in that dimension across all of their financial data; and obtain the real similarity between the two nodes according to their dimension similarities in all dimensions.

Because financial data can be stored more securely on a blockchain, each user is first regarded as a node in the blockchain, and the nodes are later clustered with the connected graph splitting clustering algorithm so that the financial data of users in the same cluster can be encrypted. This algorithm constructs a connected graph from the nodes and gradually splits it into multiple connected subgraphs until convergence; each final connected subgraph is a cluster. The connected graph must therefore be constructed first, which requires evaluating the similarity between every two nodes; for convenience, any two selected nodes are taken as a node group to be tested. In this scenario the similarity between nodes is evaluated through the similarity of their financial data. Such similarity is usually measured with the Euclidean distance or cosine similarity between data, but these measures ignore that the data of the same dimension vary over time differently for different nodes, so they cannot accurately reflect the similarity between two nodes. Because the financial data have multiple dimensions, any dimension is instead taken as the dimension to be tested, and the similarity of the two nodes' data in that dimension, the dimension similarity, is measured first. Combining the dimension similarities of all dimensions of the two nodes' financial data then measures how similar the two nodes are, so that a better connected graph can be constructed.

Preferably, in an embodiment of the present invention, the dimension similarity of the two nodes in the node group to be tested in the dimension to be tested is obtained as follows:

In one embodiment, each node has multiple financial data records compiled once a month, so the data of the dimension to be tested change over time. A two-dimensional coordinate system is therefore established with time as the abscissa and the data of the dimension to be tested as the ordinate.

The data of the dimension to be tested in all financial data of the two nodes in the node group to be tested are mapped into this coordinate system. For example, the business income data of one node are plotted in the order of their statistical times, and likewise the business income data of the other node. Polynomial fitting is then performed on each node's data of the dimension to be tested to obtain a fitted curve of that dimension for each node, so that each node in the node group to be tested corresponds to one fitted curve. Polynomial fitting is well known to those skilled in the art and is not described further here.
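As an illustration of the fitting step only, the sketch below fits a polynomial by least squares using the normal equations in plain Python, so that it runs without third-party packages; in practice a library routine such as NumPy's `polyfit` would usually be used. The polynomial degree (3 in the check below) and expressing time in years rather than months, which keeps the normal equations better conditioned, are assumptions, since the text specifies neither. Coefficients are ordered from the constant term upward, and the function names are hypothetical.

```python
from typing import List, Sequence


def polyfit(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients c[0..degree] of y ~ sum_k c[k] * x**k."""
    size = degree + 1
    # Normal equations (A^T A) c = A^T y, built as an augmented matrix.
    m = [[sum(x ** (r + c) for x in xs) for c in range(size)]
         + [sum(y * x ** r for x, y in zip(xs, ys))] for r in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * size
    for r in range(size - 1, -1, -1):
        acc = m[r][size] - sum(m[r][c] * coeffs[c] for c in range(r + 1, size))
        coeffs[r] = acc / m[r][r]
    return coeffs


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the fitted curve at time x with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result
```

For 60 monthly records, the time axis would be `[month / 12 for month in range(60)]`, and one fitted curve is obtained per node.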

Each fitted curve is divided by a preset time length to obtain its periods. The preset time length is set to 1 year here and can be set by the implementer according to the specific scenario, so each period of the fitted curve covers one year of data of the dimension to be tested. The mean of the data in each period is taken as the overall data value of that period on the fitted curve, and a negative correlation mapping is applied to the absolute difference between the overall data values of the two fitted curves in the same period to obtain their data similarity in that period.
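A minimal sketch of the period split and the data similarity, assuming the fitted curve is sampled at the 60 monthly time points of the embodiment (5 years at one statistical period per month), so each 1-year period holds 12 samples. The text says only that a negative correlation mapping is used; the reciprocal form 1 / (|a - b| + eps), with the embodiment's adjustment parameter of 0.01 as eps, is an assumption. The function names are hypothetical.

```python
from typing import List, Sequence

EPSILON = 0.01  # adjustment parameter value used in the embodiment


def overall_values(fitted: Sequence[float], period_len: int = 12) -> List[float]:
    """Overall data value of each period: mean of the fitted values in that period."""
    values = []
    for i in range(0, len(fitted), period_len):
        chunk = fitted[i:i + period_len]
        values.append(sum(chunk) / len(chunk))
    return values


def data_similarity(a: float, b: float, eps: float = EPSILON) -> float:
    """Assumed negative correlation mapping of |a - b|: 1 / (|a - b| + eps)."""
    return 1.0 / (abs(a - b) + eps)
```

The mapping decreases as the two overall data values drift apart, which is the only property the text requires of it.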

The extreme points on each fitted curve, comprising maxima and minima, are obtained with Newton's method, which is well known to those skilled in the art and is not described in detail here. On each fitted curve, the curve segment between adjacent extreme points is taken as a sub-curve segment; adjacent extreme points are of different types, i.e., a maximum is adjacent to a minimum and vice versa. Each sub-curve segment is shorter than a period, so each period of the fitted curve contains multiple sub-curve segments. The mean of the slopes at all data points in each sub-curve segment is taken as its slope parameter, and the sum of the absolute slope parameters of all sub-curve segments in each period is taken as the curve fluctuation value of that period on the fitted curve. A negative correlation mapping is applied to the absolute difference between the curve fluctuation values of the two fitted curves in the same period to obtain their fluctuation similarity in that period.
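The sketch below illustrates the curve fluctuation value under several assumptions. It represents a fitted curve by its polynomial coefficients (constant term first), measures time in years, and locates extreme points as sign changes of the derivative on a fine grid, a simple stand-in for the Newton's method named in the text. The slope parameter is the mean derivative over evenly spaced sample points of a sub-curve segment, and period boundaries are treated as extra cut points so that every piece lies in a single period; the text does not say how segments crossing a period boundary are handled. All function names are hypothetical.

```python
from typing import List, Sequence


def derivative(coeffs: Sequence[float]) -> List[float]:
    """Coefficients of p'(x) for p(x) = sum_k coeffs[k] * x**k."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial (constant term first) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def extreme_points(coeffs: Sequence[float], start: float, end: float,
                   step: float = 1e-3) -> List[float]:
    """Approximate extrema as sign changes of p' on a fine grid (not Newton's method)."""
    d = derivative(coeffs)
    n = int(round((end - start) / step))
    xs = [start + (end - start) * i / n for i in range(n + 1)]
    slopes = [polyval(d, x) for x in xs]
    return [xs[i] for i in range(1, n + 1)
            if slopes[i - 1] < 0 <= slopes[i] or slopes[i - 1] > 0 >= slopes[i]]


def fluctuation_values(coeffs: Sequence[float], start: float, end: float,
                       period_len: float, step: float = 1e-3) -> List[float]:
    """Curve fluctuation value of each period: sum of |slope parameter| of its pieces."""
    d = derivative(coeffs)
    extrema = extreme_points(coeffs, start, end, step)
    values = []
    p_start = start
    while p_start < end - 1e-12:
        p_end = min(p_start + period_len, end)
        cuts = [p_start] + [x for x in extrema if p_start < x < p_end] + [p_end]
        total = 0.0
        for a, b in zip(cuts, cuts[1:]):
            n = max(1, int(round((b - a) / step)))
            samples = [a + (b - a) * i / n for i in range(n + 1)]
            slope_param = sum(polyval(d, x) for x in samples) / len(samples)
            total += abs(slope_param)
        values.append(total)
        p_start = p_end
    return values
```

For the curve (x - 1)^2 on a single period [0, 2], the two sub-curve segments have mean slopes of about -1 and 1, giving a fluctuation value of about 2.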

The product of the data similarity and the fluctuation similarity is taken as the initial similarity of the two fitted curves in the same period, and the mean of the initial similarities over all corresponding periods is taken as the dimension similarity of the two nodes in the node group to be tested in the dimension to be tested. Taking the negative correlation mapping as a reciprocal whose denominator is kept nonzero by an adjustment parameter, the dimension similarity can be expressed, for example, as:

$$S=\frac{1}{n}\sum_{i=1}^{n}R_i,\qquad R_i=\frac{1}{\left|A_i-B_i\right|+\varepsilon}\times\frac{1}{\left|F_i-G_i\right|+\varepsilon}$$

where $S$ is the dimension similarity of the financial data of the two nodes in the node group to be tested in the dimension to be tested; $R_i$ is the initial similarity between the $i$-th period of the fitted curve of one node and the $i$-th period of the fitted curve of the other node; $n$ is the number of periods of each node's fitted curve, which is the same for both nodes; $A_i$ and $B_i$ are the overall data values of the $i$-th period of the fitted curves of the two nodes; $F_i$ and $G_i$ are the curve fluctuation values of the $i$-th period of the fitted curves of the two nodes; and $\varepsilon$ is an adjustment parameter that prevents a zero denominator, set to 0.01 here and adjustable by the implementer according to the specific scenario.

The dimension similarity reflects how similar the two nodes' data in the dimension to be tested are: the larger it is, the more similar both the values of the data and their variation over time. The smaller the absolute difference between the overall data values of the two fitted curves in the same period, the smaller the difference between the two nodes' data in that period, so a larger data similarity means more similar data in that period. The smaller the absolute difference between the curve fluctuation values of the two fitted curves in the same period, the smaller the difference in how the two nodes' data vary over time in that period, so a larger fluctuation similarity means more similar variation in that period. Combining the two, their product is taken as the initial similarity: the larger it is, the more similar the two nodes' data and their variation over time in that period. The mean of the initial similarities over all corresponding periods is then taken as the dimension similarity.
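Putting the per-period quantities together, a minimal sketch of the dimension similarity for one dimension follows. It assumes the negative correlation mapping is the reciprocal 1 / (|x| + eps) with the embodiment's adjustment parameter eps = 0.01; the inputs are the per-period overall data values and curve fluctuation values of the two fitted curves, and the function name is hypothetical.

```python
from typing import Sequence

EPSILON = 0.01  # adjustment parameter value used in the embodiment


def dimension_similarity(a_vals: Sequence[float], b_vals: Sequence[float],
                         f_vals: Sequence[float], g_vals: Sequence[float],
                         eps: float = EPSILON) -> float:
    """Mean over periods of (data similarity) * (fluctuation similarity)."""
    if not (len(a_vals) == len(b_vals) == len(f_vals) == len(g_vals)):
        raise ValueError("both fitted curves must have the same number of periods")
    total = 0.0
    for a, b, f, g in zip(a_vals, b_vals, f_vals, g_vals):
        data_sim = 1.0 / (abs(a - b) + eps)   # assumed reciprocal mapping
        fluct_sim = 1.0 / (abs(f - g) + eps)  # assumed reciprocal mapping
        total += data_sim * fluct_sim         # initial similarity of this period
    return total / len(a_vals)
```

With this mapping the value is not bounded by 1 (identical curves give 1 / eps^2), which is one reason a normalization step is applied afterwards when the dimensions are combined.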

Since the financial data have multiple dimensions, the dimension similarity of the two nodes in every other dimension can be obtained in the same way. For the subsequent analysis, let $S_j$ denote the dimension similarity of the financial data of the two nodes in the node group to be tested in the $j$-th dimension. The dimension similarities of all dimensions are then combined into the real similarity between the two nodes, which measures how similar they are and serves as the basis for constructing the connected graph.

Preferably, in an embodiment of the present invention, the real similarity between the two nodes in the node group to be tested is obtained as follows:

The sum of the dimension similarities of the two nodes over all dimensions is normalized to obtain the real similarity between the two nodes in the node group to be tested. The real similarity can be expressed, for example, as:

$$T=\mathrm{Norm}\left(\sum_{j=1}^{m}S_j\right)$$

where $T$ is the real similarity between the two nodes in the node group to be tested; $S_j$ is the dimension similarity of the financial data of the two nodes in the $j$-th dimension; $m$ is the number of dimensions of each financial data record; and $\mathrm{Norm}(\cdot)$ is the normalization function.

The larger the real similarity, the more similar the financial data of the two nodes in the node group to be tested. A larger dimension similarity means the two nodes' data in that dimension are more similar, so the sum of the dimension similarities over all dimensions evaluates the overall similarity of the two nodes' financial data, and normalization limits the real similarity to the range [0, 1], which facilitates the subsequent construction of the connected graph.
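A minimal sketch of the real similarity. The text names only a normalization function; min-max scaling of the summed dimension similarities over all node pairs is assumed here, which places every real similarity in [0, 1]. Pairs are keyed as `(u, v)` tuples, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def real_similarities(dim_sims: Dict[Pair, List[float]]) -> Dict[Pair, float]:
    """T = Norm(sum_j S_j), with Norm assumed to be min-max scaling over all pairs."""
    totals = {pair: sum(sims) for pair, sims in dim_sims.items()}
    lo, hi = min(totals.values()), max(totals.values())
    if hi == lo:  # every pair equally similar: map all of them to 1.0
        return {pair: 1.0 for pair in totals}
    return {pair: (t - lo) / (hi - lo) for pair, t in totals.items()}
```

Min-max scaling preserves the ordering of the summed dimension similarities, so the later quartile threshold selects the same pairs it would select on the unnormalized sums.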

In this way the true similarity between any two nodes can be obtained. A connected graph is then constructed from the pairwise true similarities and clustered, and the financial data of the nodes in each resulting cluster is encrypted for protection.

Step S3: based on the true similarity between any two nodes, constructing an initial connected graph for all the nodes; randomly selecting a node from the initial connected graph as a target node, and constructing connected paths of the target node with the target node as the starting point; screening out redundant edges from all edges of the initial connected graph according to the true similarity between the target node and the other nodes in each connected path; and deleting all redundant edges in the initial connected graph to obtain the optimized connected graph.

In the connected graph splitting clustering algorithm, the connected graph is constructed from the similarity between every two nodes. The preceding steps have already produced the true similarity between any two nodes, so the initial connected graph can be constructed for all nodes directly from these true similarities in the same way. The initial connected graph is then optimized, which improves the clustering effect and the security of the encryption protection of the financial data of each node in a cluster.

Preferably, in one embodiment of the present invention, the method for acquiring the initial connected graph specifically includes:

in the connected graph splitting clustering algorithm, an edge is placed between two nodes, connecting them, when the similarity between them satisfies a threshold condition, and the connected graph is built in this way. Accordingly, the true similarities between all pairs of nodes are sorted in descending order to obtain a similarity sequence, and the lower quartile of the similarity sequence is taken as the similarity threshold. In other embodiments of the invention, the true similarities may instead be sorted in ascending order and the upper quartile of the similarity sequence taken as the similarity threshold; this is not limited herein. Obtaining the lower and upper quartiles is a technique well known to those skilled in the art and is not described here. Any two nodes whose true similarity is greater than the similarity threshold are connected by a straight line, i.e., an edge is placed between them, yielding the initial connected graph.
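The thresholding rule can be sketched in Python as follows. The sketch takes the lower quartile with `statistics.quantiles(..., n=4, method="inclusive")`. That interpolation rule is an assumed convention, since the source does not fix one. The function name `initial_connected_graph` is hypothetical, and at least two node pairs are needed.

```python
import statistics
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # unordered node pair, stored once


def initial_connected_graph(sim: Dict[Pair, float]) -> Tuple[float, Set[Pair]]:
    """Return the similarity threshold and the edges of the initial graph."""
    values = sorted(sim.values())
    # Lower quartile: first of the three quartile cut points
    # ("inclusive" interpolation is an assumed convention).
    threshold = statistics.quantiles(values, n=4, method="inclusive")[0]
    # An edge joins two nodes whose true similarity exceeds the threshold.
    edges = {pair for pair, r in sim.items() if r > threshold}
    return threshold, edges
```

The quartile value does not depend on whether the sequence is written in ascending or descending order. Only the choice of quartile convention changes the threshold.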

Each edge in the initial connected graph is created only from the true similarity between its two endpoint nodes. The true similarity between those nodes and the other nodes in the graph is not considered. As a result, the initial connected graph contains a large number of unnecessary edges, namely redundant edges. These edges lower the quality of the initial connected graph and reduce both the efficiency and the effect of subsequent clustering: clustering converges more slowly, and users whose financial data is highly similar may be split into different clusters, which reduces the security of the encrypted financial data.

Preferably, in an embodiment of the present invention, the method for acquiring the connected paths of the target node specifically includes:

selecting the target node as the base node and marking it. If the number of marked nodes is smaller than a preset number, an unmarked node directly connected to the base node is randomly selected as the new base node and marked, and the selection is repeated. Once the number of marked nodes is not less than the preset number, the selection terminates. The marked nodes are ordered by selection sequence to obtain a node sequence. In the initial connected graph, the combination of the edges between every two adjacent nodes in the node sequence is taken as one connected path of the target node. After each connected path is found, the marks on the nodes are cleared, and repeating the procedure yields a plurality of connected paths of the target node. In this embodiment the preset number is set to 6, i.e., each connected path contains 6 nodes including the target node, and each connected path therefore has one edge fewer than it has nodes. The specific value of the preset number can be set by the implementer according to the specific implementation scenario and is not limited here. Fig. 2 is a schematic diagram of connected paths of the target node in the initial connected graph, in which node M is the target node. Edges M-1, 1-2, 2-3, 3-4 and 4-5 together form one connected path of the target node, and edges M-A, A-B, B-C, C-D and D-E form another. The other connected paths are not shown in the figure, and these two paths serve as examples.
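Below is a Python sketch of the path-construction walk, under stated assumptions. `adj` is assumed to be an adjacency map of the initial connected graph. The number of walks (`attempts`) and the seeded generator are hypothetical parameters, because the source does not say how many paths to collect. A walk that reaches a dead end before collecting `preset` nodes is discarded, which is also an assumption.

```python
import random
from typing import Dict, List, Set, Tuple


def connected_paths(adj: Dict[int, Set[int]], target: int, preset: int = 6,
                    attempts: int = 20, seed: int = 0) -> List[Tuple[int, ...]]:
    """Collect distinct connected paths of `preset` nodes starting at `target`."""
    rng = random.Random(seed)
    found = set()
    for _ in range(attempts):
        marked = [target]  # the target node is the first base node and is marked
        base = target
        while len(marked) < preset:
            # Unmarked nodes directly connected to the current base node.
            candidates = sorted(n for n in adj.get(base, ()) if n not in marked)
            if not candidates:
                break  # dead end: this walk is discarded (assumed handling)
            base = rng.choice(candidates)
            marked.append(base)
        if len(marked) == preset:
            found.add(tuple(marked))
        # Marks are cleared implicitly: the next walk starts a fresh list.
    return sorted(found)
```

Each returned tuple is a node sequence such as `(M, 1, 2, 3, 4, 5)`. Its consecutive pairs are the edges of one connected path.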

After the connected paths of the target node are obtained, the true similarity between the target node and the other nodes on each connected path is analyzed to screen out redundant edges from the initial connected graph. Deleting these redundant edges turns the initial connected graph into the optimized connected graph. The nodes of the optimized graph are then clustered, which improves the final clustering effect.

Preferably, in one embodiment of the present invention, the method for acquiring the redundant edges in the initial connected graph specifically includes:

in the initial connected graph, the other nodes directly connected to the target node are taken as primary connection nodes of the target node, and the other nodes not directly connected to it are taken as secondary connection nodes. For example, in the connected path M-1-2-3-4-5 in fig. 2, nodes 1 and 4 are primary connection nodes of the target node M, and nodes 2, 3 and 5 are secondary connection nodes of M. The average true similarity between the target node and all primary connection nodes in each connected path is taken as the first overall similarity of that connected path. The average true similarity between the target node and all secondary connection nodes in the path is taken as the second overall similarity. An edge exists in the initial connected graph exactly when the true similarity of its two endpoints is greater than the similarity threshold. The true similarity between the target node and any primary connection node is therefore greater than that between the target node and any secondary connection node, so the first overall similarity is greater than the second. The difference between the first overall similarity and the second overall similarity is taken as the invalidity degree of each connected path of the target node. The expression of the invalidity degree may specifically be, for example:

$$W_i = \frac{1}{A_i}\sum_{a=1}^{A_i} R^{(1)}_{i,a} - \frac{1}{B_i}\sum_{b=1}^{B_i} R^{(2)}_{i,b}$$

wherein $W_i$ represents the invalidity degree of the $i$-th connected path of the target node; $R^{(1)}_{i,a}$ represents the true similarity between the target node and the $a$-th primary connection node in the $i$-th connected path; $A_i$ represents the number of primary connection nodes in the $i$-th connected path; $R^{(2)}_{i,b}$ represents the true similarity between the target node and the $b$-th secondary connection node in the $i$-th connected path; and $B_i$ represents the number of secondary connection nodes in the $i$-th connected path.
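The invalidity degree $W_i$ of one connected path can be sketched in Python as follows. This assumes the path is a node tuple starting with the target node, `adj` is the adjacency map of the initial connected graph, and `sim` returns the true similarity of two nodes. Returning `None` when a path has no secondary connection nodes is an assumed choice, because the formula is then undefined. The function name is hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, Optional, Sequence, Set


def invalidity_degree(path: Sequence[int], adj: Dict[int, Set[int]],
                      sim: Callable[[int, int], float]) -> Optional[float]:
    """W_i = mean true similarity to primary nodes - mean to secondary nodes."""
    target, others = path[0], path[1:]
    # Primary connection nodes: directly connected to the target in the initial graph.
    primary = [n for n in others if n in adj[target]]
    # Secondary connection nodes: on the path but not directly connected to the target.
    secondary = [n for n in others if n not in adj[target]]
    if not primary or not secondary:
        return None  # W_i undefined when a group is empty (assumed handling)
    first = mean(sim(target, n) for n in primary)      # first overall similarity
    second = mean(sim(target, n) for n in secondary)   # second overall similarity
    return first - second
```

For the path M-1-2-3-4-5 of fig. 2, nodes 1 and 4 fall into `primary` and nodes 2, 3 and 5 into `secondary`.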

In the process of acquiring the invalidity degree of each connected path of the target node, a larger invalidity degree $W_i$ means a larger gap between the similarity of the path's primary connection nodes to the target node and that of its secondary connection nodes. Invalid paths can therefore be extracted from all connected paths of the target node based on the invalidity degree, and the redundant edges of the initial connected graph are then extracted from these invalid paths.

A larger invalidity degree means a larger difference in similarity to the target node between the primary and secondary connection nodes of the connected path, and hence a greater likelihood that the path contains redundant edges. In one embodiment of the invention, a connected path whose invalidity degree is greater than the similarity threshold is therefore taken as an invalid path. In each invalid path, every edge except the one having the target node as an endpoint is taken as an edge to be deleted, and each edge to be deleted is marked once. For example, if the connected path M-1-2-3-4-5 in fig. 2 is an invalid path, edges 1-2, 2-3, 3-4 and 4-5 are the edges to be deleted on that path, and each is marked once. Because different connected paths can share edges, an edge to be deleted may be marked several times. By the same method, the other nodes in the initial connected graph are traversed as target nodes, and the edges to be deleted on their connected paths are extracted and marked. This yields all edges to be deleted in the initial connected graph together with the number of times each one has been marked.
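A Python sketch of the marking step is shown below, under stated assumptions. Each connected path is assumed to be given together with its invalidity degree, as `(path, W_i)` with `path[0]` as the target node. The comparison uses the similarity threshold as the text describes, and edges are keyed by `frozenset` so that 1-2 and 2-1 count as the same edge. Sharing one `Counter` across all target nodes accumulates the marks. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple


def mark_edges_to_delete(paths: Iterable[Tuple[Sequence[int], Optional[float]]],
                         threshold: float,
                         marks: Optional[Counter] = None) -> Counter:
    """Mark once every edge to be deleted on each invalid path."""
    marks = Counter() if marks is None else marks
    for path, w in paths:
        if w is None or w <= threshold:
            continue  # not an invalid path
        target = path[0]
        for u, v in zip(path, path[1:]):
            if target not in (u, v):  # the edge touching the target node is kept
                marks[frozenset((u, v))] += 1
    return marks
```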

The more times an edge to be deleted is marked, the more likely it is to be a redundant edge. The redundant edges of the initial connected graph can therefore be obtained from the number of times each edge to be deleted has been marked. In one embodiment of the invention, the average number of marks over all edges to be deleted is used as the marking-count threshold. The edges to be deleted whose number of marks is greater than this threshold are taken as redundant edges of the initial connected graph.

Because redundant edges are unnecessary edges of the initial connected graph, they can be deleted to obtain the optimized connected graph. The nodes in the optimized connected graph are then clustered with the connected graph splitting clustering algorithm. This speeds up clustering convergence and places nodes with similar financial data in the same cluster, which improves the clustering effect and makes the encrypted financial data of the nodes in each cluster more secure.
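The redundant-edge rule and the deletion can be sketched in Python as follows. The sketch assumes edges and marks are keyed by `frozenset` node pairs, with the counts accumulated over all target nodes. The marking-count threshold is the mean count over the edges that were marked at least once, as the text describes. The function name is hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import FrozenSet, Set


def remove_redundant_edges(edges: Set[FrozenSet[int]], marks: Counter) -> Set[FrozenSet[int]]:
    """Delete every edge marked more often than the mean marking count."""
    marked = {e: c for e, c in marks.items() if c > 0}  # all edges to be deleted
    if not marked:
        return set(edges)  # nothing was marked, so nothing is redundant
    threshold = mean(marked.values())  # marking-count threshold
    redundant = {e for e, c in marked.items() if c > threshold}
    return set(edges) - redundant  # edges of the optimized connected graph
```

An edge marked exactly as often as the mean is kept, because the text requires the count to be strictly greater than the threshold.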

It should be noted that constructing the initial connected graph may yield several mutually independent initial connected graphs. Each of them is optimized in exactly the same way as described above.

Step S4: clustering the nodes in the optimized connected graph to obtain different clusters, and encrypting the financial data of all nodes in each cluster.

After the initial connected graph has been optimized through the above steps, an optimized connected graph of better quality is obtained, and its nodes can be clustered directly into different clusters. The nodes within the same cluster have highly similar financial data, so the encrypted financial data of each node in the cluster is more secure and harder to crack.

Preferably, in one embodiment of the present invention, the nodes in the optimized connected graph are clustered with the connected graph splitting clustering algorithm to obtain a plurality of clusters, and the financial data of all nodes in each cluster is encrypted based on the AES encryption algorithm. Both algorithms are technical means well known to those skilled in the art and are not described here.
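The source does not spell out the connected graph splitting clustering algorithm. As a deliberately simplified stand-in, the Python sketch below treats each connected component of the optimized connected graph as one cluster. It should not be read as the patent's clustering procedure. The function name is hypothetical. AES encryption of each cluster's financial data is not shown; it would normally be delegated to a vetted library, for example the AES-GCM primitive of the Python `cryptography` package.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def cluster_components(nodes: Iterable[int], edges: Iterable[FrozenSet[int]]) -> List[Set[int]]:
    """Group nodes into the connected components of the optimized graph."""
    adj: Dict[int, Set[int]] = {n: set() for n in nodes}
    for edge in edges:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen: Set[int] = set()
    clusters: List[Set[int]] = []
    for start in adj:
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(component)
    return clusters
```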

In summary, in the embodiment of the present invention, the financial data of each user is first acquired, and each user is taken as a node in a blockchain. Two arbitrarily selected nodes are taken as a node group to be tested and any dimension of the financial data as a dimension to be tested. The dimension similarity of the financial data of the two nodes in the dimension to be tested is obtained from the differences in the data of that dimension across all financial data of the two nodes, and the true similarity between the two nodes is obtained from their dimension similarities over all dimensions. An initial connected graph is constructed for all nodes based on the true similarity between any two nodes. A node is randomly selected from the initial connected graph as a target node, and connected paths of the target node are constructed with the target node as the starting point. Redundant edges are screened out from all edges of the initial connected graph according to the true similarity between the target node and the other nodes in each connected path, and all redundant edges are deleted to obtain an optimized connected graph. Finally, the nodes in the optimized connected graph are clustered into different clusters, and the financial data of all nodes in each cluster is encrypted.

An embodiment of a clustering method for encryption of financial user data:

in the related art, a connected graph splitting clustering algorithm is generally used to cluster user nodes in a blockchain so as to improve the security of encrypting the financial data of each user in a cluster. However, the connected graph constructed by the conventional algorithm contains a large number of redundant edges, so the clustering effect on the user nodes in the blockchain is poor.

To solve the problem, the present embodiment provides a clustering method for encrypting financial user data, including:

step S1: acquiring financial data of different users in different preset statistical periods within a preset historical time period, wherein the financial data comprises data of at least two dimensions;

step S2: taking each user as a node in a blockchain, taking two arbitrarily selected nodes as a node group to be tested, taking any dimension of the financial data as a dimension to be tested, and obtaining the dimension similarity of the financial data of the two nodes in the node group to be tested in the dimension to be tested according to the differences in the data of the dimension to be tested across all financial data of the two nodes; obtaining the true similarity between the two nodes in the node group to be tested according to the dimension similarities of their financial data over all dimensions;

Step S3: based on the true similarity between any two nodes, constructing an initial connected graph for all the nodes; randomly selecting a node from the initial connected graph as a target node, and constructing connected paths of the target node with the target node as the starting point; screening out redundant edges from all edges of the initial connected graph according to the true similarity between the target node and the other nodes in each connected path; deleting all redundant edges in the initial connected graph to obtain an optimized connected graph; and clustering the nodes in the optimized connected graph to obtain different clusters.

The details of steps S1 to S3 are given in the above embodiment of the blockchain-based financial user data security protection method and are not repeated here.

The beneficial effects of this embodiment are as follows. The connected graph constructed by the conventional connected graph splitting clustering algorithm contains a large number of redundant edges, so user nodes in the blockchain are clustered poorly and the subsequent encryption of the financial data offers low security. The method therefore first acquires the financial data of a plurality of users in different preset statistical periods within a preset historical time period. Since each user has a large amount of financial data in this scenario, the method analyzes the differences in the data of the dimension to be tested across all financial data of any two nodes. The resulting dimension similarity reflects how similar the two nodes' financial data is in that dimension. The dimension similarities over all dimensions are then combined into the true similarity, which measures how similar the two nodes are and supports the construction of a better-quality initial connected graph. The initial connected graph considers only the true similarity between the two endpoints of each edge and ignores their true similarity to the other nodes, so it contains a large number of redundant edges. The method therefore constructs connected paths starting from a target node and screens out redundant edges according to the true similarity between the target node and the other nodes on each connected path. Deleting these edges yields the optimized connected graph, which improves the effect of the subsequent clustering and the security of the encrypted financial data.

It should be noted that the order of the embodiments of the present invention is for description only and does not indicate that any embodiment is superior to another. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In this specification, the embodiments are described progressively. For identical or similar parts, the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others.

Claims

1. A blockchain-based financial user data security protection method, the method comprising:

2. The blockchain-based financial user data security protection method of claim 1, wherein obtaining the dimension similarity of the financial data of the two nodes in the node group to be tested in the dimension to be tested according to the differences in the data of the dimension to be tested across all financial data of the two nodes comprises:

3. The blockchain-based financial user data security protection method of claim 2, wherein obtaining the dimension similarity of the financial data of the two nodes in the node group to be tested in the dimension to be tested according to the initial similarity of the two fitting curves over all the same periods comprises:

4. The blockchain-based financial user data security protection method of claim 1, wherein obtaining the true similarity between the two nodes in the node group to be tested according to the dimension similarities of their financial data over all dimensions comprises:

5. The blockchain-based financial user data security protection method of claim 1, wherein the constructing an initial connected graph for all nodes based on the true similarity between any two nodes comprises:

taking the lower quartile of the similarity sequence as a similarity threshold;

6. The blockchain-based financial user data security protection method of claim 5, wherein the filtering redundant edges from all edges of the initial connected graph according to the true similarity between the target node and other nodes in each connected path comprises:

7. The blockchain-based financial user data security protection method of claim 6, wherein obtaining the redundant edges in the initial connected graph according to the number of times each edge to be deleted in the initial connected graph is marked comprises:

8. The blockchain-based financial user data security protection method of claim 1, wherein clustering the nodes in the optimized connected graph to obtain different clusters comprises:

9. The blockchain-based financial user data security protection method of claim 1, wherein the encryption protection of the financial data of all nodes in each cluster comprises:

10. The blockchain-based financial user data security protection method of claim 1, wherein constructing the connected paths of the target node with the target node as the starting point comprises:

selecting the target node as a base node, and marking the target node;