CN117390495B

CN117390495B - Multi-source data risk management system and method based on big data

Info

Publication number: CN117390495B
Application number: CN202311642090.XA
Authority: CN
Inventors: 夏山俊
Original assignee: Jiangsu Rui Mdt Infotech Ltd
Current assignee: Jiangsu Rui Mdt Infotech Ltd
Priority date: 2023-12-04
Filing date: 2023-12-04
Publication date: 2024-02-20
Anticipated expiration: 2043-12-04
Also published as: CN117390495A

Abstract

The invention relates to the technical field of data management, and in particular to a multi-source data risk management system and method based on big data. The system comprises a multi-source data acquisition module, a data management center, a user demand analysis module, a user classification management module and a data source access management module. The multi-source data acquisition module collects historical information on the data that users have called from different third-party data sources and historical information on data queried from those data sources; the data management center stores and manages all collected data; the user demand analysis module classifies the called data and analyzes each user's degree of demand for the different types of data; the user classification management module classifies the users according to the analysis result; and the data source access management module selects and accesses data sources for users of the same class. This reduces the probability that an improperly selected third-party data source aggravates the risk of abnormal data calls, and improves the efficiency and success rate of data calls.

Description

Multi-source data risk management system and method based on big data

Technical Field

The invention relates to the technical field of data management, in particular to a multi-source data risk management system and method based on big data.

Background

In network back-end data services, data often has to be obtained from third-party data sources so that users can more conveniently obtain the data they want to query. To return more complete and accurate data, several third-party data sources usually have to be accessed; once they are accessed, the data formats they provide are uniformly converted and the data content is cleaned so that users can call it.

However, many third-party data sources are available for access, and the severity of abnormal conditions such as delayed or failed data calls differs depending on which type of data is called from which third-party data source. If third-party data sources are selected and accessed at random, an improper selection increases the risk of abnormal data calls.

Therefore, there is a need for a multi-source data risk management system and method based on big data to solve the above-mentioned problems.

Disclosure of Invention

The invention aims to provide a multi-source data risk management system and method based on big data, so as to solve the problems in the background technology.

In order to solve the above technical problems, the invention provides the following technical scheme: a multi-source data risk management system based on big data, the system comprising a multi-source data acquisition module, a data management center, a user demand analysis module, a user classification management module and a data source access management module;

the output end of the multi-source data acquisition module is connected with the input end of the data management center, the output end of the data management center is connected with the input end of the user demand analysis module, the output end of the user demand analysis module is connected with the input end of the user classification management module, and the output ends of the user classification management module and the data management center are connected with the input end of the data source access management module;

the multi-source data acquisition module is used for collecting historical information on the data that users have called from different third-party data sources and historical information on data queried from the different third-party data sources, and for transmitting all collected data to the data management center;

storing and managing all received data through the data management center;

classifying the invoked data through the user demand analysis module, and analyzing the demand degrees of different users on different types of data;

classifying the users according to the analysis result by the user classification management module;

and selecting a data source for the same type of users to access through the data source access management module.
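The module connections described above can be sketched as a small dependency graph. This is only an illustrative sketch: the module identifiers are hypothetical names chosen here, and the text does not prescribe any particular implementation. It uses the standard-library `graphlib` module (Python 3.9 or later) to derive a processing order from the stated output-to-input connections.

```python
from graphlib import TopologicalSorter

# Hypothetical identifiers for the five modules named in the text.
# Each key maps to the set of modules whose output feeds its input.
UPSTREAM = {
    "multi_source_data_acquisition": set(),
    "data_management_center": {"multi_source_data_acquisition"},
    "user_demand_analysis": {"data_management_center"},
    "user_classification_management": {"user_demand_analysis"},
    # Fed by both the user classification module and the data management center.
    "data_source_access_management": {
        "user_classification_management",
        "data_management_center",
    },
}


def processing_order(upstream):
    """Return the modules in an order that respects every connection."""
    return list(TopologicalSorter(upstream).static_order())


ORDER = processing_order(UPSTREAM)
print(ORDER)
```

Because the connections form a single chain with one extra edge from the data management center to the data source access management module, only one valid order exists, and it matches the order in which the modules are described.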

Further, the multi-source data acquisition module comprises a demand information acquisition unit and a calling information acquisition unit;

the output ends of the demand information acquisition unit and the calling information acquisition unit are connected with the input end of the data management center;

the demand information acquisition unit is used for acquiring the frequency information of calling data in different time periods in the past of different users;

the call information acquisition unit is used for acquiring historical information of data queried from different third party data sources, and the historical information comprises the number of times of query of the past data and the time length information spent for obtaining the data by each query.

Further, the user demand analysis module comprises a demand data classification unit, an analysis model establishment unit and a demand degree prediction unit;

the input end of the demand data classification unit is connected with the output end of the data management center, the output end of the demand data classification unit is connected with the input end of the analysis model establishment unit, and the output end of the analysis model establishment unit is connected with the input end of the demand degree prediction unit;

the demand data classification unit is used for classifying the data which are called by the user in the past according to the user demand and confirming the frequency information of calling different types of data in different time periods by different users in the past;

the analysis model building unit is used for calling the frequency information of calling different types of data in different time periods by a random user and building a call analysis model of the user on the different types of data, and a plurality of call analysis models are built when a plurality of types of data exist;

the demand degree prediction unit is used for analyzing the demand degree of the user on different types of data according to the call analysis model.

Further, the user classification management module comprises a demand degree comparison unit and a user classification unit;

the input end of the demand degree comparison unit is connected with the output end of the demand degree prediction unit, and the output end of the demand degree comparison unit is connected with the input end of the user classification unit;

the demand level comparison unit is used for comparing the demand level of a random user on different types of data and predicting the data type with the highest demand level of the corresponding user in future time;

the user classification unit is used for classifying users with highest demands on the same type of data into the same class.

Further, the data source access management module comprises a calling information analysis unit, a data source stability evaluation unit and a data source access selection unit;

the input end of the calling information analysis unit is connected with the output ends of the user classification unit and the data management center, the output end of the calling information analysis unit is connected with the input end of the data source stability evaluation unit, and the output end of the data source stability evaluation unit is connected with the input end of the data source access selection unit;

the calling information analysis unit is used for retrieving, for each third-party data source, the number of past queries for the data type most needed by users of the same class and the time spent obtaining that data in each past query, and for passing this information to the data source stability evaluation unit;

the data source stability evaluation unit is used for evaluating the stability with which each third-party data source serves queries for the corresponding type of data;

the data source access selection unit is used for comparing the stabilities with which the different third-party data sources serve queries for the corresponding type of data, grouping the third-party data sources according to the comparison result, and selecting and accessing the most suitable group of third-party data sources for users of the same class; after the third-party data sources are accessed, the data formats provided by the several third-party data sources are uniformly converted and the data content is cleaned for the users to call.

A multi-source data risk management method based on big data comprises the following steps:

Z1: collecting historical information on the data that users have called from different third-party data sources and historical information on data queried from the different third-party data sources;

Z2: classifying the called data, and analyzing the degree of demand of different users for different types of data;

Z3: classifying the users according to the analysis result;

Z4: retrieving the historical information on data queried from the different third-party data sources, and analyzing the data query stability of the third-party data sources;

Z5: selecting data sources for users of the same class to access.

Further, in step Z1: the time from T1 to T2 is divided equally into n time periods, where T2 represents the current time; the numbers of times that different users called data in each of the n past time periods are collected; and historical information on data queried from the different third-party data sources is collected, the historical information comprising the number of past data queries and the time spent on each past query.
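The period splitting in step Z1 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `count_calls_per_period`, the use of `datetime` timestamps and the treatment of a call made exactly at T2 (counted in the last period) are choices made here for illustration and are not specified in the text.

```python
from datetime import datetime


def count_calls_per_period(call_times, t1, t2, n):
    """Split [t1, t2] into n equal periods and count the calls in each."""
    if n <= 0 or t2 <= t1:
        raise ValueError("need n > 0 and t2 later than t1")
    width = (t2 - t1) / n  # length of one period (a timedelta)
    counts = [0] * n
    for ts in call_times:
        if not (t1 <= ts <= t2):
            continue  # ignore calls outside the observation window
        # A call made exactly at t2 is assigned to the last period (assumption).
        index = min(int((ts - t1) / width), n - 1)
        counts[index] += 1
    return counts


# Hypothetical call log for one user and one data type.
T1, T2 = datetime(2023, 1, 1), datetime(2023, 1, 6)
CALLS = [
    datetime(2023, 1, 1, 12),
    datetime(2023, 1, 1, 18),
    datetime(2023, 1, 3),
    datetime(2023, 1, 5, 23),
    datetime(2023, 1, 6),
]
COUNTS = count_calls_per_period(CALLS, T1, T2, 5)
print(COUNTS)  # [2, 0, 1, 0, 2]
```

The resulting list plays the role of the per-period call counts that the later steps use as input.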

Further, in step Z2: classifying the called data according to the data service requirement of the user;

for example, if a user's data service requirements are to query basic enterprise information and to query customer pool distribution information, the data serving these different requirements is divided into different types;

in total, k types of data are obtained. The numbers of times that any one user called any one type of data in the n past time periods are denoted $S=\{S_1, S_2, \ldots, S_n\}$, and a call analysis model of the corresponding user for that type of data is established:

$$S_{n+1}=\tau S_n+(1-\tau)P_n$$

where $S_{n+1}$ is the predicted number of times the corresponding user will call the corresponding type of data in the (n+1)-th time period, $\tau$ is the smoothing coefficient with $0<\tau<1$, and $P_n$ is the exponential smoothing value of the number of calls of that type of data in the n-th time period. The exponential smoothing value for the 1st time period is calculated as $P_1=\tau S_1+(1-\tau)\left[(S_1+S_2+S_3)/3\right]$, that for the 2nd time period as $P_2=\tau S_1+(1-\tau)P_1$, that for the 3rd time period as $P_3=\tau S_2+(1-\tau)P_2$, and so on, giving $P_n=\tau S_{n-1}+(1-\tau)P_{n-1}$. Call analysis models for all k types of data are established for the corresponding user in the same way, the number of times the user will call each of the k types of data in the (n+1)-th time period is predicted, the predicted numbers are compared, and the type with the largest predicted number of calls is taken as the data type for which the corresponding user has the highest degree of demand in the (n+1)-th time period;

the number of times a user calls different types of data changes dynamically over time. The numbers of times users called the different types of data in the past are collected by big data technology, the numbers of times a user called any one type of data in the different past time periods are obtained, and the number of times the user will call that type of data in a coming period is predicted by exponential smoothing. Compared with other prediction algorithms, exponential smoothing is better suited to short-term prediction, and the prediction draws on historical data from the period immediately preceding the current time rather than on data from long before it, so that the predicted number of calls reflects the user's demand for the corresponding type of data and the accuracy of the prediction result is improved.
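The call analysis model of step Z2 can be sketched in a few lines. The function name `predict_next_calls` and the input checks are assumptions made for illustration; the recurrences follow the formulas stated for $P_1$, $P_n$ and $S_{n+1}$, including the seeding of $P_1$ with the mean of the first three observations.

```python
def predict_next_calls(history, tau):
    """Predict S_{n+1} from call counts S_1..S_n by exponential smoothing."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(history) < 3:
        raise ValueError("at least three periods are needed to seed P_1")
    # P_1 = tau*S_1 + (1 - tau) * mean(S_1, S_2, S_3)
    p = tau * history[0] + (1 - tau) * sum(history[:3]) / 3
    # P_m = tau*S_{m-1} + (1 - tau)*P_{m-1} for m = 2..n
    for s_prev in history[:-1]:
        p = tau * s_prev + (1 - tau) * p
    # S_{n+1} = tau*S_n + (1 - tau)*P_n
    return tau * history[-1] + (1 - tau) * p


# Call counts and smoothing coefficient taken from the embodiment.
PREDICTION = predict_next_calls([10, 12, 15, 11, 18], 0.6)
print(round(PREDICTION, 2))  # 15.61, which rounds to 16 calls
```

Running the same function once per data type and taking the type with the largest prediction gives the most-demanded type for that user.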

Further, in step Z3: obtaining the data types with the highest requirements of different users in the (n+1) th time period, and classifying the users with the highest requirements on the data of the same type into the same class;

the more calls are predicted, the higher the user's degree of demand for the corresponding type of data in the coming period is judged to be. The degrees of demand of the different users for the different types of data are compared, the data type each user will need most in the coming period is determined, and users whose most-needed data type is the same are placed in the same class. The purpose of planning the data in advance in this way is to select the same third-party data sources for users of the same class to access, which reduces the work of analyzing and selecting third-party data sources compared with selecting them user by user.
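The classification in step Z3 amounts to grouping users by the data type with the largest predicted number of calls. The sketch below assumes the predictions are already available as a nested dictionary; the user names, type labels and the tie-breaking rule (first type encountered wins) are hypothetical and not specified in the text.

```python
from collections import defaultdict


def classify_users(predicted_calls):
    """Group users by the data type with the largest predicted call count.

    predicted_calls maps user -> {data_type: predicted number of calls}.
    """
    classes = defaultdict(list)
    for user, per_type in predicted_calls.items():
        # On a tie, max() keeps the first type encountered (assumption).
        top_type = max(per_type, key=per_type.get)
        classes[top_type].append(user)
    return dict(classes)


# Hypothetical predictions for three users and three data types.
PREDICTED = {
    "user_a": {"type_1": 16, "type_2": 20, "type_3": 15},
    "user_b": {"type_1": 9, "type_2": 14, "type_3": 3},
    "user_c": {"type_1": 7, "type_2": 2, "type_3": 5},
}
CLASSES = classify_users(PREDICTED)
print(CLASSES)  # {'type_2': ['user_a', 'user_b'], 'type_1': ['user_c']}
```

Each resulting class can then be assigned one shared set of third-party data sources instead of one selection per user.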

Further, in step Z4: for the type of data most demanded by any one class of users, the set of numbers of times that type of data has been queried from the different third-party data sources is $N=\{N_1, N_2, \ldots, N_f\}$, the set of numbers of times the data was found is $r=\{r_1, r_2, \ldots, r_f\}$, and the set of durations spent obtaining the data in each query to any one third-party data source is $t=\{t_1, t_2, \ldots, t_c\}$, where f represents the number of third-party data sources available for access and $c=r_i$ represents the number of times the data was found in that data source. The stability $Q_i$ with which any one third-party data source serves queries for the corresponding type of data is calculated according to the following formula:

$$Q_i=\frac{r_i}{N_i}\times\frac{1}{\left(\sum_{j=1}^{c}t_j\right)/c}$$

where $N_i$ represents the number of times the most-demanded type of data has been queried from that third-party data source in the past and $t_j$ represents the time spent obtaining the data in the j-th query to that data source. The set of stabilities with which the f third-party data sources serve queries for the corresponding type of data is $Q=\{Q_1, Q_2, \ldots, Q_i, \ldots, Q_f\}$.
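The stability formula of step Z4 can be sketched as a single function. The function and parameter names are assumptions made for illustration, as is the handling of a data source that never returned the data (stability 0.0), which the text does not define.

```python
def stability(queries_made, queries_found, durations):
    """Q_i = (r_i / N_i) * 1 / (mean duration of the c = r_i successful queries)."""
    if queries_made <= 0:
        raise ValueError("queries_made (N_i) must be positive")
    if len(durations) != queries_found:
        raise ValueError("expected one duration per successful query (c = r_i)")
    if queries_found == 0:
        return 0.0  # assumption: a source that never found the data scores 0
    mean_duration = sum(durations) / len(durations)
    return (queries_found / queries_made) / mean_duration


# Values of the first third-party data source in the embodiment (seconds).
Q1 = stability(10, 7, [10, 12, 15, 14, 11, 11, 13])
print(round(Q1, 3))  # 0.057
```

A higher share of successful queries raises the value and a longer mean query time lowers it, which is the behaviour the description attributes to the stability measure.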

Further, in step Z5: the stabilities with which the f third-party data sources serve queries for the corresponding type of data are compared, and the third-party data sources are divided into g groups in descending order of stability, such that every third-party data source in an earlier group has a higher stability than every data source in a later group. For any one grouping result obtained in this way, the set of mean stabilities of the g groups is $L=\{L_1, L_2, \ldots, L_g\}$, and the dispersion W of the g group parameters in that grouping result is calculated according to the formula

$$W=\left[\frac{\sum_{v=1}^{g}\left(L_v-\frac{\sum_{v=1}^{g}L_v}{g}\right)^{2}}{g}\right]^{1/2}$$

The dispersion of the g group parameters is calculated for the different grouping results and the grouping result with the greatest dispersion is obtained; the third-party data sources in the first group of that grouping result are screened out and are selected and accessed for the users with the highest demand for the corresponding type of data;

the past query history of the different third-party data sources is collected by big data technology and the stability with which each data source serves queries is analyzed: the more often queries find the data and the shorter the query time, the higher the stability is judged to be. After the stabilities have been analyzed, the third-party data sources are grouped by stability. By analyzing the dispersion of the group parameters, that is, of the group stabilities, across the different grouping results, third-party data sources with similar stabilities are placed in the same group, and the group with the highest stability is selected for users of the same class to access. This reduces the probability that an improperly selected third-party data source aggravates the risk of abnormal data calls, and improves the efficiency and success rate of data calls.
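The dispersion measure of step Z5 is the population standard deviation of the group means. The sketch below computes it for candidate grouping results and returns the first group of the candidate with the greatest dispersion. The function names are hypothetical, and the text does not say how candidate groupings are enumerated, so the sketch compares only the candidates passed to it.

```python
from math import sqrt
from statistics import mean


def dispersion(groups):
    """W: population standard deviation of the mean stability of each group."""
    group_means = [mean(group) for group in groups]
    overall = mean(group_means)
    return sqrt(sum((m - overall) ** 2 for m in group_means) / len(group_means))


def pick_first_group(candidate_groupings):
    """Return the first group of the candidate grouping with the largest W."""
    best = max(candidate_groupings, key=dispersion)
    return best[0]


# The two grouping results quoted in the embodiment (stabilities, descending).
GROUPING_A = [[0.22, 0.20], [0.18, 0.12], [0.09, 0.06, 0.05]]
GROUPING_B = [[0.22, 0.20, 0.18], [0.12, 0.09], [0.06, 0.05]]
SELECTED = pick_first_group([GROUPING_A, GROUPING_B])
print(SELECTED)  # [0.22, 0.2, 0.18]
```

Of these two candidates the second has the larger dispersion, so its first group is selected, in line with the result stated in the embodiment.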

Compared with the prior art, the invention has the following beneficial effects:

the numbers of times users called the different types of data in the past are collected by big data technology, the numbers of times a user called any one type of data in the different past time periods are obtained, and the number of times the user will call that type of data in a coming period is predicted by exponential smoothing. Compared with other prediction algorithms, exponential smoothing is better suited to short-term prediction, and the prediction draws on historical data from the period immediately preceding the current time rather than on data from long before it, so that the predicted number of calls reflects the user's demand for the corresponding type of data and the accuracy of the prediction result is improved;

the degrees of demand of the different users for the different types of data are compared, the data type each user will need most in the coming period is determined, and users whose most-needed data type is the same are placed in the same class. Planning the data in advance in this way allows the same third-party data sources to be selected for users of the same class, which reduces the work of analyzing and selecting third-party data sources compared with selecting them user by user;

the past query history of the different third-party data sources is collected by big data technology and the stability with which each data source serves queries is analyzed. After the stabilities have been analyzed, the third-party data sources are grouped by stability. By analyzing the dispersion of the group parameters, that is, of the group stabilities, across the different grouping results, third-party data sources with similar stabilities are placed in the same group, and the group with the highest stability is selected for users of the same class to access. This reduces the probability that an improperly selected third-party data source aggravates the risk of abnormal data calls, and improves the efficiency and success rate of data calls.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:

FIG. 1 is a block diagram of a big data based multi-source data risk management system of the present invention;

FIG. 2 is a flow chart of a multi-source data risk management method based on big data according to the present invention.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.

The invention is further described below with reference to fig. 1-2 and the specific embodiments.

Example 1: as shown in FIG. 1, this embodiment provides a multi-source data risk management system based on big data. The system comprises a multi-source data acquisition module, a data management center, a user demand analysis module, a user classification management module and a data source access management module. The multi-source data acquisition module collects historical information on the data that users have called from different third-party data sources and historical information on data queried from the different third-party data sources, and transmits all collected data to the data management center. The data management center stores and manages all received data. The user demand analysis module classifies the called data and analyzes the degree of demand of different users for different types of data. The user classification management module classifies the users according to the analysis result, and the data source access management module selects data sources for users of the same class to access.

The multi-source data acquisition module comprises a demand information acquisition unit and a calling information acquisition unit. The demand information acquisition unit is used for collecting the numbers of times that different users called data in different past time periods. The calling information acquisition unit is used for collecting historical information on data queried from the different third-party data sources, comprising the number of past data queries and the time spent obtaining the data in each past query.

The user demand analysis module comprises a demand data classification unit, an analysis model establishment unit and a demand degree prediction unit, wherein the demand data classification unit is used for classifying data which are called by users in the past according to user demands, confirming the frequency information of calling different types of data by different users in different time periods, the analysis model establishment unit is used for calling the frequency information of calling different types of data by a random user in different time periods and establishing a call analysis model of the user on the different types of data, a plurality of call analysis models are established for the data of a plurality of types, and the demand degree prediction unit is used for analyzing the demand degree of the user on the different types of data according to the call analysis model.

The user classification management module comprises a demand level comparison unit and a user classification unit, wherein the demand level comparison unit is used for comparing the demand level of a random user on different types of data, predicting the data type with the highest demand level of the corresponding user in the future time, and the user classification unit is used for classifying the users with the highest demand level on the same type of data into the same type.

The data source access management module comprises a calling information analysis unit, a data source stability evaluation unit and a data source access selection unit. The calling information analysis unit is used for retrieving, for each third-party data source, the number of past queries for the data type most needed by users of the same class and the time spent obtaining that data in each past query, and for passing this information to the data source stability evaluation unit. The data source stability evaluation unit is used for evaluating the stability with which each third-party data source serves queries for the corresponding type of data. The data source access selection unit is used for comparing these stabilities, grouping the third-party data sources according to the comparison result, and selecting and accessing the most suitable group of third-party data sources for users of the same class; after the third-party data sources are accessed, the data formats provided by the several third-party data sources are uniformly converted and the data content is cleaned for the users to call.

Example 2: as shown in FIG. 2, this embodiment provides a multi-source data risk management method based on big data, which is implemented on the basis of the data management system of the foregoing embodiment and specifically comprises the following steps:

Z1: historical information on the data that users have called from different third-party data sources and historical information on data queried from the different third-party data sources is collected. The time from T1 to T2 is divided equally into 5 time periods, where T2 represents the current time; the numbers of times that different users called data in each of the 5 time periods are collected; and historical information on data queried from the different third-party data sources is collected, comprising the number of data queries and the time spent on each query;

Z2: the called data is classified according to the users' data service requirements, giving 5 types of data, and the degree of demand of different users for the different types of data is analyzed. The numbers of times that any one user called any one type of data in the 5 past time periods are retrieved as $S=\{S_1, S_2, S_3, S_4, S_5\}=\{10, 12, 15, 11, 18\}$, and the call analysis model of the corresponding user for that type of data is established as $S_{n+1}=\tau S_n+(1-\tau)P_n$, where $S_{n+1}$ is the predicted number of times the corresponding user will call the corresponding type of data in the (n+1)-th time period, $\tau$ is the smoothing coefficient, set to $\tau=0.6$, and $P_n$ is the exponential smoothing value of the number of calls of that type of data in the n-th time period. According to $P_1=\tau S_1+(1-\tau)\left[(S_1+S_2+S_3)/3\right]$, the exponential smoothing value for the 1st time period is $P_1\approx 11$; according to $P_2=\tau S_1+(1-\tau)P_1$, that for the 2nd time period is $P_2\approx 10$; according to $P_3=\tau S_2+(1-\tau)P_2$, that for the 3rd time period is $P_3\approx 11$; and so on, giving $P_5=\tau S_4+(1-\tau)P_4\approx 12$. The number of times the corresponding user will call the corresponding type of data in the 6th time period (n+1=6) is predicted as $S_6=\tau S_5+(1-\tau)P_5$. Call analysis models of the corresponding user for the 5 types of data are established in the same way, and the numbers of times the user is predicted to call the 5 types of data in the 6th time period are {16, 20, 15, 8, 6}. Comparing the predicted numbers of calls, the data type for which the corresponding user has the highest degree of demand in the 6th time period is predicted to be the second type of data;

Z3: the users are classified according to the analysis result: the data type for which each user has the highest demand in the 6th time period is obtained, and users with the highest demand for the same type of data are placed in the same class;

Z4: the historical information on data queried from the different third-party data sources is retrieved. For the type of data most demanded by any one class of users, the set of numbers of times that type of data was queried from the different third-party data sources is $N=\{N_1, N_2, N_3, N_4, N_5, N_6, N_7\}=\{10, 22, 26, 18, 15, 8, 30\}$, the set of numbers of times the data was found is $r=\{r_1, r_2, r_3, r_4, r_5, r_6, r_7\}=\{7, 20, 25, 15, 10, 8, 15\}$, and the set of durations spent obtaining the data in each query to the first third-party data source is $t=\{t_1, t_2, t_3, t_4, t_5, t_6, t_7\}=\{10, 12, 15, 14, 11, 11, 13\}$, in seconds. The stability $Q_i$ with which any one third-party data source serves queries for the corresponding type of data is calculated according to the formula

$$Q_i=\frac{r_i}{N_i}\times\frac{1}{\left(\sum_{j=1}^{c}t_j\right)/c}$$

where $N_i$ represents the number of times the most-demanded type of data has been queried from that third-party data source in the past and $t_j$ represents the time spent obtaining the data in the j-th query to that data source. The stability of the first third-party data source is $Q_1\approx 0.06$, and the set of stabilities of the 7 third-party data sources is $Q=\{Q_1, Q_2, Q_3, Q_4, Q_5, Q_6, Q_7\}=\{0.06, 0.18, 0.20, 0.09, 0.12, 0.05, 0.22\}$;

Z5: data sources are selected for users of the same class to access. The stabilities with which the 7 third-party data sources serve queries for the corresponding type of data are compared, and the third-party data sources are divided into 3 groups in descending order of stability, such that every third-party data source in an earlier group has a higher stability than every data source in a later group. One grouping result obtained in this way is: the stability sets of the 3 groups of third-party data sources are {0.22, 0.20}, {0.18, 0.12} and {0.09, 0.06, 0.05}. For this grouping result, the set of mean stabilities of the 3 groups is $L=\{L_1, L_2, L_3\}=\{0.21, 0.15, 0.07\}$, and the dispersion of the 3 group parameters is calculated according to the formula

$$W=\left[\frac{\sum_{v=1}^{g}\left(L_v-\frac{\sum_{v=1}^{g}L_v}{g}\right)^{2}}{g}\right]^{1/2}$$

with g = 3, giving $W\approx 0.06$. The dispersion of the group parameters is calculated for the different grouping results, and the grouping result with the greatest dispersion is: the stability sets of the 3 groups of third-party data sources are {0.22, 0.20, 0.18}, {0.12, 0.09} and {0.06, 0.05}. The third-party data sources in the first group of this grouping result are the seventh, third and second third-party data sources, which are selected and accessed for the users with the highest demand for the corresponding type of data. After they are accessed, the data formats provided by the seventh, third and second third-party data sources are uniformly converted and the data content is cleaned for the users to call.
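The rounded figures quoted in steps Z2, Z4 and Z5 of this embodiment can be checked by substituting the stated values into the formulas. The following worked arithmetic is only a verification aid; the intermediate values $P_4$ and the decimal expansions are computed here and are not quoted in the text.

```latex
\begin{aligned}
P_1 &= 0.6\cdot 10 + 0.4\cdot\tfrac{10+12+15}{3} \approx 10.93 \approx 11\\
P_2 &= 0.6\cdot 10 + 0.4\cdot 10.93 \approx 10.37 \approx 10\\
P_3 &= 0.6\cdot 12 + 0.4\cdot 10.37 \approx 11.35 \approx 11\\
P_4 &= 0.6\cdot 15 + 0.4\cdot 11.35 \approx 13.54\\
P_5 &= 0.6\cdot 11 + 0.4\cdot 13.54 \approx 12.02 \approx 12\\
S_6 &= 0.6\cdot 18 + 0.4\cdot 12.02 \approx 15.61 \approx 16\\[4pt]
Q_1 &= \frac{7}{10}\times\frac{1}{(10+12+15+14+11+11+13)/7}
     = \frac{0.7}{86/7} \approx 0.057 \approx 0.06\\[4pt]
\bar{L} &= \frac{0.21+0.15+0.07}{3} \approx 0.1433\\
W &= \left[\frac{(0.21-0.1433)^2+(0.15-0.1433)^2+(0.07-0.1433)^2}{3}\right]^{1/2}
   \approx 0.057 \approx 0.06
\end{aligned}
```

The value $S_6\approx 16$ agrees with the first entry of the predicted set {16, 20, 15, 8, 6}, and $Q_1$ and $W$ agree with the rounded values 0.06 given in steps Z4 and Z5.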

Finally, it should be noted that the foregoing is merely a preferred embodiment of the present invention and does not limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A big data based multi-source data risk management system, characterized in that the system comprises: a multi-source data acquisition module, a data management center, a user demand analysis module, a user classification management module and a data source access management module;

storing and managing all received data through the data management center;

selecting a data source for the same type of users to access through the data source access management module;

the multi-source data acquisition module comprises a demand information acquisition unit and a calling information acquisition unit;

the call information acquisition unit is used for acquiring historical information on querying data from different third-party data sources, including the number of past data queries and the duration spent obtaining the data on each past query;

the user demand analysis module comprises a demand data classification unit, an analysis model building unit and a demand degree prediction unit;

the analysis model building unit is used for retrieving the frequency information with which a random user calls different types of data in different time periods, and for establishing the user's call analysis model for the different types of data;

the demand degree prediction unit is used for analyzing the demand degree of the user on different types of data according to the call analysis model;

the user classification management module comprises a demand degree comparison unit and a user classification unit;

the user classification unit is used for classifying users with highest demands on the same type of data into the same class;

the data source access management module comprises a call information analysis unit, a data source stability evaluation unit and a data source access selection unit;

the data source access selection unit is used for comparing the stabilities with which different third-party data sources query the corresponding type of data, grouping the third-party data sources according to the comparison result, and selecting and accessing the most suitable group of third-party data sources for users of the same class.

2. A multi-source data risk management method based on big data is characterized in that: the method comprises the following steps:

Z3: classifying the users according to the analysis result;

Z5: selecting a data source for the same type of users to access;

in step Z1: a time period from T1 to T2 is equally divided into n time periods, wherein T2 represents the current time; the time information of data calls made by different users in the past n time periods is collected, and the historical information on querying data from different third-party data sources is collected, the historical information including the number of past data queries and the duration spent obtaining the data on each past query;

in step Z5: the stabilities with which the f third-party data sources query the corresponding type of data are compared, and the third-party data sources are divided into g groups in descending order of stability to obtain a random grouping result, in which the set of mean stabilities with which the third-party data sources of each of the g groups query the corresponding type of data is $L = \{L_1, L_2, \ldots, L_g\}$; according to the formula $W = \left[ \left( \sum_{v=1}^{g} \left( L_v - \left( \sum_{v=1}^{g} L_v \right) / g \right)^2 \right) / g \right]^{1/2}$, the discrete degree W of the g group parameters in the random grouping result is calculated; the discrete degree of the g group parameters is calculated for the different grouping results, the grouping result with the largest discrete degree is obtained, the third-party data sources in the first group are screened out from the grouping result with the largest discrete degree, and the screened third-party data sources are selected and accessed for the users with the highest demand degree for the corresponding type of data.

3. The big data based multi-source data risk management method of claim 2, wherein: in step Z2: the called data is classified according to the data service requirements of the users to obtain k types of data; the set of numbers of times a random user called a random type of data in n different time periods is retrieved as $S = \{S_1, S_2, \ldots, S_n\}$, and a call analysis model of the corresponding user for the random type of data is established:

$S_{n+1} = \tau S_n + (1 - \tau) P_n$;

the number of times the corresponding user calls the corresponding type of data in the (n+1)-th time period is predicted as $S_{n+1}$, wherein $\tau$ represents a smoothing coefficient, $0 < \tau < 1$, and $P_n$ represents the exponential smoothing value of the number of times the random type of data is called in the n-th time period; according to the formula $P_1 = \tau S_1 + (1 - \tau) \left[ (S_1 + S_2 + S_3) / 3 \right]$, the exponential smoothing value $P_1$ of the number of times the random type of data is called in the 1st time period is calculated; according to the formula $P_2 = \tau S_1 + (1 - \tau) P_1$, the exponential smoothing value $P_2$ of the number of times the random type of data is called in the 2nd time period is calculated; according to the formula $P_3 = \tau S_2 + (1 - \tau) P_2$, the exponential smoothing value $P_3$ of the number of times the random type of data is called in the 3rd time period is calculated; and so on to obtain $P_n$, $P_n = \tau S_{n-1} + (1 - \tau) P_{n-1}$; call analysis models of the corresponding user for the k types of data are established in the same way, the numbers of times the corresponding user calls the k types of data in the (n+1)-th time period are predicted, the predicted numbers of calls are compared, and the type of data with the largest predicted number of calls is taken as the data type with the highest demand degree of the corresponding user in the (n+1)-th time period.

4. The big data based multi-source data risk management method of claim 2, wherein: in step Z3: the data type with the highest demand degree of each of the different users in the (n+1)-th time period is obtained, and users whose highest-demand data type is the same are classified into the same class.

5. The big data based multi-source data risk management method of claim 2, wherein: in step Z4: the set of numbers of times the data type in highest demand by a random class of users has been queried from different third-party data sources is retrieved as $N = \{N_1, N_2, \ldots, N_f\}$, the set of numbers of times the data was found is $r = \{r_1, r_2, \ldots, r_f\}$, and the set of durations spent obtaining the data on each query from a random third-party data source is $t = \{t_1, t_2, \ldots, t_c\}$, wherein f represents the number of third-party data sources to be accessed, $c = r_i$, and c represents the number of times the data was found in a random data source; the stability $Q_i$ with which a random third-party data source queries the corresponding type of data is calculated according to the following formula:

$Q_i = (r_i / N_i) \times \left[ 1 / \left( \left( \sum_{j=1}^{c} t_j \right) / c \right) \right]$;