CN117390495B - Multi-source data risk management system and method based on big data - Google Patents

Multi-source data risk management system and method based on big data

Publication number
CN117390495B
Authority
CN
China
Prior art keywords
data
user
different
demand
calling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311642090.XA
Other languages
Chinese (zh)
Other versions
CN117390495A (en)
Inventor
夏山俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Rui Mdt Infotech Ltd
Original Assignee
Jiangsu Rui Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Rui Mdt Infotech Ltd filed Critical Jiangsu Rui Mdt Infotech Ltd
Priority to CN202311642090.XA priority Critical patent/CN117390495B/en
Publication of CN117390495A publication Critical patent/CN117390495A/en
Application granted granted Critical
Publication of CN117390495B publication Critical patent/CN117390495B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of data management, and in particular to a multi-source data risk management system and method based on big data. The system comprises a multi-source data acquisition module, a data management center, a user demand analysis module, a user classification management module and a data source access management module. The multi-source data acquisition module collects historical information on users' calls to data in different third-party data sources and historical information on data queries made to those sources; the data management center stores and manages all collected data; the user demand analysis module classifies the called data and analyzes the demand degree of each user for the different types of data; the user classification management module classifies the users according to the analysis result; and the data source access management module selects the data sources to be accessed for users of the same class. This reduces the probability that an unsuitable choice of third-party data source increases the risk of abnormal data calls, and improves the efficiency and success rate of data calls.

Description

Multi-source data risk management system and method based on big data
Technical Field
The invention relates to the technical field of data management, in particular to a multi-source data risk management system and method based on big data.
Background
Network back-end data services often need to obtain data from third-party data sources so that users can more conveniently obtain the data they want to query. To return more complete and accurate data, several third-party data sources usually have to be accessed; after they are accessed, the data formats they provide are converted into a uniform format and the data content is cleaned so that users can call it;
however, many third-party data sources are available for access, and when different types of data are called from different third-party data sources, the severity of abnormal conditions such as call delays and call failures differs. Selecting third-party data sources at random therefore increases the risk of abnormal data calls caused by an unsuitable choice of source.
Therefore, there is a need for a multi-source data risk management system and method based on big data to solve the above-mentioned problems.
Disclosure of Invention
The invention aims to provide a multi-source data risk management system and method based on big data, so as to solve the problems described in the Background.
To solve the above technical problems, the invention provides the following technical scheme: a multi-source data risk management system based on big data, the system comprising a multi-source data acquisition module, a data management center, a user demand analysis module, a user classification management module and a data source access management module;
the output end of the multi-source data acquisition module is connected with the input end of the data management center, the output end of the data management center is connected with the input end of the user demand analysis module, the output end of the user demand analysis module is connected with the input end of the user classification management module, and the output ends of the user classification management module and the data management center are connected with the input end of the data source access management module;
the multi-source data acquisition module is used for collecting historical information on users' calls to data in different third-party data sources and historical information on data queries made to the different third-party data sources, and for transmitting all collected data to the data management center;
the data management center is used for storing and managing all received data;
the user demand analysis module is used for classifying the called data and analyzing the demand degree of different users for different types of data;
the user classification management module is used for classifying the users according to the analysis result;
and the data source access management module is used for selecting the data sources to be accessed for users of the same class.
Further, the multi-source data acquisition module comprises a demand information acquisition unit and a call information acquisition unit;
the output ends of the demand information acquisition unit and the call information acquisition unit are connected with the input end of the data management center;
the demand information acquisition unit is used for collecting the number of times different users called data in different past time periods;
the call information acquisition unit is used for collecting historical information on data queries made to different third-party data sources, the historical information comprising the number of past data queries and the time spent obtaining the data in each query.
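As an illustration only, the two kinds of history collected by these units could be held in memory as shown here. This is a minimal sketch under stated assumptions: the class names `CallRecord`, `QueryRecord` and `DataManagementCenter`, all field names, and the `found` flag (used to distinguish queries that returned data from those that did not) are hypothetical and do not come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CallRecord:
    """One call of a data type by a user (hypothetical layout)."""
    user_id: str
    data_type: str
    timestamp: float  # assumed: seconds on any consistent clock


@dataclass(frozen=True)
class QueryRecord:
    """One query sent to a third-party data source (hypothetical layout)."""
    source_id: str
    data_type: str
    found: bool        # assumed flag: did the query return the data?
    duration_s: float  # time spent on the query, in seconds


@dataclass
class DataManagementCenter:
    """Stores everything the acquisition units collect."""
    calls: list = field(default_factory=list)
    queries: list = field(default_factory=list)

    def query_summary(self, data_type):
        """Per source: (queries made, queries that found data, their durations)."""
        summary = defaultdict(lambda: [0, 0, []])
        for q in self.queries:
            if q.data_type != data_type:
                continue
            entry = summary[q.source_id]
            entry[0] += 1
            if q.found:
                entry[1] += 1
                entry[2].append(q.duration_s)
        return {s: (n, r, t) for s, (n, r, t) in summary.items()}
```

The per-source summary is the shape of input that a later stability evaluation would need: a query count, a found count and the list of durations.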
Further, the user demand analysis module comprises a demand data classification unit, an analysis model establishment unit and a demand degree prediction unit;
the input end of the demand data classification unit is connected with the output end of the data management center, the output end of the demand data classification unit is connected with the input end of the analysis model establishment unit, and the output end of the analysis model establishment unit is connected with the input end of the demand degree prediction unit;
the demand data classification unit is used for classifying the data called by users in the past according to user demand, and for determining the number of times different users called each type of data in different past time periods;
the analysis model establishment unit is used for retrieving the number of times any one user called each type of data in different time periods and establishing a call analysis model of that user for each type of data, one call analysis model being established per type of data;
the demand degree prediction unit is used for analyzing the demand degree of the user for the different types of data according to the call analysis models.
Further, the user classification management module comprises a demand degree comparison unit and a user classification unit;
the input end of the demand degree comparison unit is connected with the output end of the demand degree prediction unit, and the output end of the demand degree comparison unit is connected with the input end of the user classification unit;
the demand degree comparison unit is used for comparing the demand degrees of any one user for the different types of data and predicting the data type for which that user's demand degree will be highest in the coming period;
the user classification unit is used for placing users whose highest demand is for the same type of data in the same class.
Further, the data source access management module comprises a call information analysis unit, a data source stability evaluation unit and a data source access selection unit;
the input end of the call information analysis unit is connected with the output ends of the user classification unit and the data management center, the output end of the call information analysis unit is connected with the input end of the data source stability evaluation unit, and the output end of the data source stability evaluation unit is connected with the input end of the data source access selection unit;
the call information analysis unit is used for retrieving, for each third-party data source, the past query counts for the data most needed by users of the same class and the time spent obtaining that data in each past query, and for passing this information to the data source stability evaluation unit;
the data source stability evaluation unit is used for evaluating the stability degree with which each third-party data source returns the corresponding data;
the data source access selection unit is used for comparing the stability degrees with which the different third-party data sources return data of the corresponding type, grouping the third-party data sources according to the comparison result, and selecting and accessing the most suitable group of third-party data sources for users of the same class; after the third-party data sources are accessed, the data formats they provide are converted into a uniform format and the data content is cleaned for the users to call.
A multi-source data risk management method based on big data comprises the following steps:
Z1: collecting historical information on users' calls to data in different third-party data sources and historical information on data queries made to the different third-party data sources;
Z2: classifying the called data, and analyzing the demand degree of different users for different types of data;
Z3: classifying the users according to the analysis result;
Z4: retrieving the historical information on data queries made to the different third-party data sources, and analyzing the data query stability of the third-party data sources;
Z5: selecting the data sources to be accessed for users of the same class.
Further, in step Z1: the interval from T1 to T2, where T2 represents the current time, is divided equally into n time periods; the number of times different users called data in each of the n past time periods is collected, and historical information on data queries made to different third-party data sources is collected, the historical information comprising the number of past data queries and the time spent in each past data query.
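The equal division of the interval from T1 to T2 can be sketched as a simple bucketing routine. This is an illustrative sketch only: the function name `count_calls_per_period`, the use of numeric timestamps, and the choice to count a call made exactly at T2 in the last period are assumptions not stated in the text.

```python
def count_calls_per_period(timestamps, t1, t2, n):
    """Count how many call timestamps fall in each of n equal periods of [t1, t2].

    Timestamps outside [t1, t2] are ignored. A call exactly at t2 is
    counted in the last period (assumed boundary rule).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    width = (t2 - t1) / n
    counts = [0] * n
    for ts in timestamps:
        if t1 <= ts <= t2:
            index = min(int((ts - t1) / width), n - 1)
            counts[index] += 1
    return counts
```

Calling it once per user and per data type yields the per-period call counts that the later prediction step works from.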
Further, in step Z2: the called data is classified according to the data service requirements of the users;
for example: if a user's data service requirements are to query the basic information of an enterprise and to query customer pool distribution information, the information serving these different requirements is divided into different types of data;
k types of data are obtained in total. Let the numbers of times that any one user called any one type of data in the n past time periods be $S = \{S_1, S_2, \ldots, S_n\}$. A call analysis model of the corresponding user for that type of data is established:
$S_{n+1} = \tau S_n + (1 - \tau) P_n$
where $S_{n+1}$ is the predicted number of times the corresponding user calls the corresponding type of data in the (n+1)-th time period, $\tau$ is the smoothing coefficient with $0 < \tau < 1$, and $P_n$ is the exponential smoothing value of the number of calls of that type of data in the n-th time period. The exponential smoothing value for the 1st time period is calculated as $P_1 = \tau S_1 + (1 - \tau)\,[(S_1 + S_2 + S_3)/3]$, that for the 2nd time period as $P_2 = \tau S_1 + (1 - \tau) P_1$, that for the 3rd time period as $P_3 = \tau S_2 + (1 - \tau) P_2$, and so on up to $P_n = \tau S_{n-1} + (1 - \tau) P_{n-1}$. Call analysis models of the corresponding user for the k types of data are established in the same way, the numbers of times the user will call each of the k types of data in the (n+1)-th time period are predicted and compared, and the type with the largest predicted number of calls is taken as the data type for which the user's demand degree is highest in the (n+1)-th time period;
the number of times a user calls different types of data changes dynamically across time periods. Big data technology is used to collect how many times users called each type of data in the past and in which time periods, and exponential smoothing is used to predict how many times a user will call each type of data in the coming period. Compared with other prediction algorithms, exponential smoothing is better suited to short-term prediction; the prediction is based on history from the period immediately preceding the current time rather than on history far removed from it, so the predicted number of calls reflects the user's demand for the corresponding type of data and the accuracy of the prediction result is improved.
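The call analysis model of step Z2 can be sketched in a few lines. The sketch follows the recurrences as the text states them (P_1 seeded with the mean of the first three counts, then P_m = τ·S_(m-1) + (1-τ)·P_(m-1)); the function names `smoothing_values`, `predict_next_calls` and `most_needed_type` are hypothetical, and requiring at least three periods is an assumption implied by the seed formula.

```python
def smoothing_values(calls, tau):
    """Return [P_1, ..., P_n] for call counts [S_1, ..., S_n]."""
    if len(calls) < 3:
        raise ValueError("need at least three periods to seed P_1")
    if not 0 < tau < 1:
        raise ValueError("tau must satisfy 0 < tau < 1")
    # P_1 = tau*S_1 + (1 - tau) * mean(S_1, S_2, S_3)
    p = [tau * calls[0] + (1 - tau) * sum(calls[:3]) / 3]
    for m in range(1, len(calls)):
        # P_(m+1) = tau*S_m + (1 - tau)*P_m, with calls[m - 1] holding S_m
        p.append(tau * calls[m - 1] + (1 - tau) * p[-1])
    return p


def predict_next_calls(calls, tau):
    """S_(n+1) = tau*S_n + (1 - tau)*P_n."""
    p = smoothing_values(calls, tau)
    return tau * calls[-1] + (1 - tau) * p[-1]


def most_needed_type(history_by_type, tau):
    """Pick the data type with the largest predicted call count."""
    predictions = {
        data_type: predict_next_calls(calls, tau)
        for data_type, calls in history_by_type.items()
    }
    return max(predictions, key=predictions.get)
```

Here `history_by_type` is assumed to map each of the k data types to one user's list of per-period call counts.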
Further, in step Z3: the data types for which different users' demand degree is highest in the (n+1)-th time period are obtained, and users whose highest demand is for the same type of data are placed in the same class;
the more calls are predicted, the higher the user's demand degree for the corresponding type of data in the coming period is judged to be. The demand degrees of different users for different types of data are compared, the type each user will need most in the coming period is determined, and users with the same most-needed type are placed in the same class. The purpose of planning the data in advance in this way is to select the same third-party data sources for all users of a class, which reduces the workload of analyzing and selecting third-party data sources compared with selecting them user by user.
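The classification in step Z3 amounts to grouping users by the data type with the largest predicted call count. A minimal sketch, assuming the predictions are already available as a nested mapping; the function name `classify_users` and the tie-breaking rule (the first type encountered wins) are assumptions, since the text does not say how ties are resolved.

```python
from collections import defaultdict


def classify_users(predicted_calls):
    """Group users by the data type they are predicted to call most.

    predicted_calls maps user -> {data_type: predicted call count}.
    Returns {data_type: [users whose highest predicted demand is that type]}.
    """
    classes = defaultdict(list)
    for user, per_type in predicted_calls.items():
        if not per_type:
            continue  # no history for this user, nothing to classify
        top_type = max(per_type, key=per_type.get)  # ties: first type wins
        classes[top_type].append(user)
    return dict(classes)
```

Each resulting class can then be assigned one set of third-party data sources instead of one set per user.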
Further, in step Z4: let the set of numbers of times that data of the type most in demand by any one class of users was queried from the different third-party data sources be $N = \{N_1, N_2, \ldots, N_f\}$, the set of numbers of times the data was found be $r = \{r_1, r_2, \ldots, r_f\}$, and the set of durations spent obtaining the data in each query from any one third-party data source be $t = \{t_1, t_2, \ldots, t_c\}$, where f represents the number of third-party data sources available for access and $c = r_i$ represents the number of times the data was found from that data source. The stability degree $Q_i$ with which that third-party data source returns data of the corresponding type is calculated according to the following formula:
$Q_i = \dfrac{r_i}{N_i} \times \dfrac{1}{\left(\sum_{j=1}^{c} t_j\right)/c}$;
where $N_i$ represents the number of times that data of the type most in demand by any one class of users was queried from the i-th third-party data source in the past, and $t_j$ represents the duration spent obtaining the data in the j-th query from that third-party data source. The set of stability degrees with which the f third-party data sources return data of the corresponding type is obtained as $Q = \{Q_1, Q_2, \ldots, Q_i, \ldots, Q_f\}$.
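The stability degree of step Z4 is the fraction of queries that found the data, divided by the mean time of those queries. A minimal sketch; the function name `stability_degree` is hypothetical, and returning 0.0 for a source that never found the data is an assumption, since the formula is undefined when c = 0.

```python
def stability_degree(queries, found, durations):
    """Q_i = (r_i / N_i) * 1 / mean(t_1 .. t_c), with c = r_i.

    queries   -- N_i, number of past queries sent to the source
    found     -- r_i, number of those queries that found the data
    durations -- [t_1, ..., t_c], time spent in each query that found data
    """
    if queries <= 0:
        raise ValueError("queries must be positive")
    if len(durations) != found:
        raise ValueError("expected one duration per successful query (c = r_i)")
    if found == 0:
        return 0.0  # assumed convention: formula is undefined for c = 0
    if any(t <= 0 for t in durations):
        raise ValueError("durations must be positive")
    mean_duration = sum(durations) / len(durations)
    return (found / queries) / mean_duration
```

A higher hit ratio or a shorter mean query time both raise the value, which is the behaviour the formula encodes.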
Further, in step Z5: the stability degrees with which the f third-party data sources return data of the corresponding type are compared, and the third-party data sources are sorted in descending order of stability degree and divided into g groups, such that every third-party data source in an earlier group has a higher stability degree than every third-party data source in a later group. For any one grouping result, let the set of mean stability degrees of the g groups be $L = \{L_1, L_2, \ldots, L_g\}$. The degree of dispersion W of the g group parameters in that grouping result is calculated according to the formula $W = \left[\frac{1}{g}\sum_{v=1}^{g}\left(L_v - \frac{1}{g}\sum_{v=1}^{g} L_v\right)^{2}\right]^{1/2}$. The degree of dispersion of the g group parameters is calculated for the different grouping results, and the grouping result with the largest degree of dispersion is obtained; the third-party data sources in the first group of that grouping result are screened out, and the screened-out third-party data sources are selected and accessed for the users whose demand degree for the corresponding type of data is highest;
the past data query history of the different third-party data sources is collected through big data technology and the stability degree of their data queries is analyzed: the more often queries find the data and the shorter the query time, the higher the stability degree is judged to be. After the stability degrees have been analyzed, the third-party data sources are grouped according to stability degree. By analyzing the degree of dispersion of the parameter, namely the stability degree, across the different grouping results, third-party data sources with similar stability degrees are placed in the same group, and the group with the highest stability degree is selected for access by users of the same class. This reduces the probability that an unsuitable choice of third-party data source increases the risk of abnormal data calls, and improves the efficiency and success rate of data calls.
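The degree of dispersion used in step Z5 is the population standard deviation of the group means. The sketch here scores groupings that the caller supplies and returns the first group of the best-scoring one. The text does not state how the candidate grouping results are generated, so that part is deliberately left to the caller; the function names `dispersion` and `select_first_group` are hypothetical.

```python
from math import sqrt


def dispersion(groups):
    """W = sqrt(mean over groups of (L_v - mean(L))^2), L_v being group means.

    groups is a list of non-empty lists of stability degrees.
    """
    means = [sum(g) / len(g) for g in groups]
    overall = sum(means) / len(means)
    return sqrt(sum((m - overall) ** 2 for m in means) / len(means))


def select_first_group(candidate_groupings):
    """Return the first group of the candidate with the largest dispersion.

    Each candidate is assumed to be ordered from most to least stable,
    so its first group holds the most stable sources.
    """
    best = max(candidate_groupings, key=dispersion)
    return best[0]
```

In practice the groups would carry source identifiers alongside the stability degrees; plain numbers are used here to keep the sketch short.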
Compared with the prior art, the invention has the following beneficial effects:
big data technology is used to collect how many times users called each type of data in the past and in which time periods, and exponential smoothing is used to predict how many times a user will call each type of data in the coming period. Compared with other prediction algorithms, exponential smoothing is better suited to short-term prediction; the prediction is based on history from the period immediately preceding the current time rather than on history far removed from it, so the predicted number of calls reflects the user's demand for the corresponding type of data and the accuracy of the prediction result is improved;
the demand degrees of different users for different types of data are compared, the type each user will need most in the coming period is determined, and users with the same most-needed type are placed in the same class. The data is thus planned in advance and the same third-party data sources are selected for all users of a class, which reduces the workload of analyzing and selecting third-party data sources compared with selecting them user by user;
the past data query history of the different third-party data sources is collected through big data technology, the stability degree of their data queries is analyzed, and the third-party data sources are then grouped according to stability degree. By analyzing the degree of dispersion of the parameter, namely the stability degree, across the different grouping results, third-party data sources with similar stability degrees are placed in the same group, and the group with the highest stability degree is selected for access by users of the same class. This reduces the probability that an unsuitable choice of third-party data source increases the risk of abnormal data calls, and improves the efficiency and success rate of data calls.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a block diagram of the multi-source data risk management system based on big data according to the present invention;
FIG. 2 is a flow chart of the multi-source data risk management method based on big data according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The invention is further described below with reference to fig. 1-2 and the specific embodiments.
Example 1: as shown in FIG. 1, the present embodiment provides a multi-source data risk management system based on big data. The system comprises a multi-source data acquisition module, a data management center, a user demand analysis module, a user classification management module and a data source access management module. The multi-source data acquisition module collects historical information on users' calls to data in different third-party data sources and historical information on data queries made to the different third-party data sources, and transmits all collected data to the data management center; the data management center stores and manages all received data; the user demand analysis module classifies the called data and analyzes the demand degree of different users for different types of data; the user classification management module classifies the users according to the analysis result; and the data source access management module selects the data sources to be accessed for users of the same class.
The multi-source data acquisition module comprises a demand information acquisition unit and a call information acquisition unit. The demand information acquisition unit collects the number of times different users called data in different past time periods. The call information acquisition unit collects historical information on data queries made to different third-party data sources, comprising the number of past data queries and the time spent obtaining the data in each query.
The user demand analysis module comprises a demand data classification unit, an analysis model establishment unit and a demand degree prediction unit. The demand data classification unit classifies the data called by users in the past according to user demand and determines the number of times different users called each type of data in different time periods. The analysis model establishment unit retrieves the number of times any one user called each type of data in different time periods and establishes a call analysis model of that user for each type of data, one call analysis model per type. The demand degree prediction unit analyzes the demand degree of the user for the different types of data according to the call analysis models.
The user classification management module comprises a demand degree comparison unit and a user classification unit. The demand degree comparison unit compares the demand degrees of any one user for the different types of data and predicts the data type for which that user's demand degree will be highest in the coming period. The user classification unit places users whose highest demand is for the same type of data in the same class.
The data source access management module comprises a call information analysis unit, a data source stability evaluation unit and a data source access selection unit. The call information analysis unit retrieves, for each third-party data source, the past query counts for the data most needed by users of the same class and the time spent obtaining that data in each past query, and passes this information to the data source stability evaluation unit. The data source stability evaluation unit evaluates the stability degree with which each third-party data source returns the corresponding data. The data source access selection unit compares these stability degrees, groups the third-party data sources according to the comparison result, and selects and accesses the most suitable group of third-party data sources for users of the same class; after the third-party data sources are accessed, the data formats they provide are converted into a uniform format and the data content is cleaned for the users to call.
Example 2: as shown in FIG. 2, the present embodiment provides a multi-source data risk management method based on big data, which is implemented on the data management system of Example 1 and specifically comprises the following steps:
Z1: historical information on users' calls to data in different third-party data sources and historical information on data queries made to the different third-party data sources are collected. The interval from T1 to T2, where T2 represents the current time, is divided equally into 5 time periods; the number of times different users called data in each of the 5 time periods is collected, and historical information on data queries made to the different third-party data sources is collected, comprising the number of data queries and the time spent in each data query;
Z2: the called data is classified according to the data service requirements of the users, giving 5 types of data, and the demand degree of different users for the different types of data is analyzed. The numbers of times that one user called one type of data in the 5 past time periods are $S = \{S_1, S_2, S_3, S_4, S_5\} = \{10, 12, 15, 11, 18\}$. The call analysis model of the corresponding user for this type of data is established as $S_{n+1} = \tau S_n + (1 - \tau) P_n$, where $S_{n+1}$ is the predicted number of times the corresponding user calls the corresponding type of data in the (n+1)-th time period, $\tau$ is the smoothing coefficient, set to $\tau = 0.6$, and $P_n$ is the exponential smoothing value of the number of calls of this type of data in the n-th time period. From $P_1 = \tau S_1 + (1 - \tau)\,[(S_1 + S_2 + S_3)/3]$, the exponential smoothing value for the 1st time period is $P_1 \approx 11$; from $P_2 = \tau S_1 + (1 - \tau) P_1$, that for the 2nd time period is $P_2 \approx 10$; from $P_3 = \tau S_2 + (1 - \tau) P_2$, that for the 3rd time period is $P_3 \approx 11$; and so on up to $P_5 = \tau S_4 + (1 - \tau) P_4 \approx 12$. The number of times the corresponding user calls the corresponding type of data in the 6th (n + 1 = 6) time period is then predicted as $S_6 = \tau S_5 + (1 - \tau) P_5 \approx 16$. Call analysis models of the corresponding user for the 5 types of data are established in the same way, and the predicted numbers of times the user calls the 5 types of data in the 6th time period are {16, 20, 15, 8, 6}. Comparing the predicted numbers of calls, the data type for which the corresponding user's demand degree is highest in the 6th time period is predicted to be the second type of data;
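The arithmetic of step Z2 can be checked by substituting $S = \{10, 12, 15, 11, 18\}$ and $\tau = 0.6$ into the formulas of this step. The intermediate values here are kept to two decimals, whereas the text rounds them to integers; $P_4$ is not quoted in the text and is computed here from the same recurrence.

```latex
\begin{aligned}
P_1 &= 0.6 \times 10 + 0.4 \times \tfrac{10 + 12 + 15}{3} \approx 10.93 \\
P_2 &= 0.6 \times 10 + 0.4 \times 10.93 \approx 10.37 \\
P_3 &= 0.6 \times 12 + 0.4 \times 10.37 \approx 11.35 \\
P_4 &= 0.6 \times 15 + 0.4 \times 11.35 \approx 13.54 \\
P_5 &= 0.6 \times 11 + 0.4 \times 13.54 \approx 12.02 \\
S_6 &= 0.6 \times 18 + 0.4 \times 12.02 \approx 15.61 \approx 16
\end{aligned}
```

The result agrees with the first entry of the predicted set {16, 20, 15, 8, 6}.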
Z3: the users are classified according to the analysis result: the data types for which different users' demand degree is highest in the 6th time period are obtained, and users whose highest demand is for the same type of data are placed in the same class;
Z4: Retrieve the historical information on querying data from different third-party data sources. The set of numbers of times that data of the type most demanded by any one class of users was queried from the different third-party data sources is $N = \{N_1, N_2, N_3, N_4, N_5, N_6, N_7\} = \{10, 22, 26, 18, 15, 8, 30\}$, the set of numbers of times the data was found is $r = \{r_1, r_2, r_3, r_4, r_5, r_6, r_7\} = \{7, 20, 25, 15, 10, 8, 15\}$, and the set of durations spent obtaining the data in each query from the first third-party data source is $t = \{t_1, t_2, t_3, t_4, t_5, t_6, t_7\} = \{10, 12, 15, 14, 11, 11, 13\}$, in seconds. According to the formula $Q_i = \frac{r_i}{N_i} \times \frac{1}{\left(\sum_{j=1}^{c} t_j\right)/c}$, the stability $Q_i$ with which any one third-party data source queries data of the corresponding type is calculated, where $N_i$ denotes the number of times that data of the type most demanded by the class of users was queried from that third-party data source in the past, $t_j$ denotes the duration spent obtaining the data in the j-th query from that third-party data source, and $c = r_i$ denotes the number of times the data was found from that source. The stability with which the first third-party data source queries data of the corresponding type is $Q_1 \approx 0.06$, and the set of stabilities of the 7 third-party data sources is $Q = \{Q_1, Q_2, Q_3, Q_4, Q_5, Q_6, Q_7\} = \{0.06, 0.18, 0.20, 0.09, 0.12, 0.05, 0.22\}$;
Z5: Select data sources for users of the same class to access. The stabilities with which the 7 third-party data sources query data of the corresponding type are compared, and the third-party data sources are divided into 3 groups in descending order of stability, such that the stability of every third-party data source in an earlier group is greater than that of every source in a later group. One arbitrary grouping result is as follows: the stability sets of the 3 groups of third-party data sources are {0.22, 0.20}, {0.18, 0.12} and {0.09, 0.06, 0.05}, respectively. In this grouping result, the set of mean stabilities of the 3 groups is $L = \{L_1, L_2, L_3\} = \{0.21, 0.15, 0.07\}$. According to the formula $W = \left[\frac{\sum_{v=1}^{g}\left(L_v - \left(\sum_{v=1}^{g} L_v\right)/g\right)^{2}}{g}\right]^{1/2}$, the degree of dispersion of the 3 group means in this grouping result is calculated as $W \approx 0.06$. The degree of dispersion of the g group means is calculated for the different grouping results, and the grouping result with the largest degree of dispersion is obtained: the stability sets of the 3 groups of third-party data sources are {0.22, 0.20, 0.18}, {0.12, 0.09} and {0.06, 0.05}. From the grouping result with the largest degree of dispersion, the third-party data sources in the first group are selected, namely the seventh, third and second third-party data sources, and are accessed for the users with the highest degree of demand for the corresponding type of data. After access, the data formats provided by the seventh, third and second third-party data sources are uniformly converted and the data contents are cleaned for the users to call.
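The arithmetic of steps Z2, Z4 and Z5 above can be checked with a short script. The following Python is a minimal sketch under stated assumptions, not the patented implementation: the function names forecast_next, stability and dispersion are hypothetical, and the sketch carries unrounded intermediate values, whereas the embodiment rounds each smoothing value. The text does not state how candidate groupings are enumerated, so the sketch only evaluates the two groupings listed in step Z5.

```python
from math import sqrt


def forecast_next(calls, tau=0.6):
    """Predict S_{n+1} from past call counts S_1..S_n (step Z2)."""
    if len(calls) < 3:
        raise ValueError("at least 3 periods are needed to seed P_1")
    # P_1 = tau * S_1 + (1 - tau) * mean(S_1, S_2, S_3)
    p = tau * calls[0] + (1 - tau) * sum(calls[:3]) / 3
    # P_k = tau * S_{k-1} + (1 - tau) * P_{k-1} for k = 2..n
    for s_prev in calls[:-1]:
        p = tau * s_prev + (1 - tau) * p
    # S_{n+1} = tau * S_n + (1 - tau) * P_n
    return tau * calls[-1] + (1 - tau) * p


def stability(queries, hits, durations):
    """Q_i = (r_i / N_i) * 1 / mean(t) (step Z4)."""
    return (hits / queries) / (sum(durations) / len(durations))


def dispersion(groups):
    """W: population standard deviation of the group means (step Z5)."""
    means = [sum(g) / len(g) for g in groups]
    centre = sum(means) / len(means)
    return sqrt(sum((m - centre) ** 2 for m in means) / len(means))


# Figures taken from the embodiment.
s6 = forecast_next([10, 12, 15, 11, 18])                 # about 15.6
q1 = stability(10, 7, [10, 12, 15, 14, 11, 11, 13])      # about 0.057
w_first = dispersion([[0.22, 0.20], [0.18, 0.12], [0.09, 0.06, 0.05]])
w_chosen = dispersion([[0.22, 0.20, 0.18], [0.12, 0.09], [0.06, 0.05]])

# Highest-demand type among the predicted call counts (1-based index).
predicted = [16, 20, 15, 8, 6]
top_type = predicted.index(max(predicted)) + 1

print(round(s6), round(q1, 2), round(w_first, 2), top_type)
print(w_chosen > w_first)
```

With the embodiment's figures, the forecast rounds to 16, the stability of the first source and the dispersion of the first grouping both round to 0.06, and the highest-demand type is the second, in line with the values given in the text.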
Finally, it should be noted that the foregoing is merely a preferred embodiment of the present invention and does not limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

1. A big data based multi-source data risk management system, characterized in that the system comprises: a multi-source data acquisition module, a data management center, a user demand analysis module, a user classification management module and a data source access management module;
the output end of the multi-source data acquisition module is connected with the input end of the data management center, the output end of the data management center is connected with the input end of the user demand analysis module, the output end of the user demand analysis module is connected with the input end of the user classification management module, and the output ends of the user classification management module and the data management center are connected with the input end of the data source access management module;
the multi-source data acquisition module is used for collecting historical information on users' calls of data from different third-party data sources and historical information on querying data from the different third-party data sources, and for transmitting all collected data to the data management center;
storing and managing all received data through the data management center;
classifying the invoked data through the user demand analysis module, and analyzing the demand degrees of different users on different types of data;
classifying the users according to the analysis result by the user classification management module;
selecting a data source for the same type of users to access through the data source access management module;
the multi-source data acquisition module comprises a demand information acquisition unit and a calling information acquisition unit;
the output ends of the demand information acquisition unit and the calling information acquisition unit are connected with the input end of the data management center;
the demand information acquisition unit is used for collecting information on the number of times different users called data in different past time periods;
the call information acquisition unit is used for collecting historical information on querying data from different third-party data sources, including the number of times data was queried in the past and the duration spent obtaining the data in each past query;
the user demand analysis module comprises a demand data classification unit, an analysis model building unit and a demand degree prediction unit;
the input end of the demand data classification unit is connected with the output end of the data management center, the output end of the demand data classification unit is connected with the input end of the analysis model establishment unit, and the output end of the analysis model establishment unit is connected with the input end of the demand degree prediction unit;
the demand data classification unit is used for classifying the data called by users in the past according to user demand, and for confirming the number of times different users called different types of data in different past time periods;
the analysis model building unit is used for retrieving the number of times any one user called different types of data in different time periods, and for building call analysis models of that user for the different types of data;
the demand degree prediction unit is used for analyzing the demand degree of the user on different types of data according to the call analysis model;
the user classification management module comprises a demand degree comparison unit and a user classification unit;
the input end of the demand degree comparison unit is connected with the output end of the demand degree prediction unit, and the output end of the demand degree comparison unit is connected with the input end of the user classification unit;
the demand degree comparison unit is used for comparing the degrees of demand of any one user for different types of data and predicting the data type with the highest degree of demand of the corresponding user in a future time period;
the user classification unit is used for classifying users with highest demands on the same type of data into the same class;
the data source access management module comprises a call information analysis unit, a data source stability evaluation unit and a data source access selection unit;
the input end of the calling information analysis unit is connected with the output ends of the user classification unit and the data management center, the output end of the calling information analysis unit is connected with the input end of the data source stability evaluation unit, and the output end of the data source stability evaluation unit is connected with the input end of the data source access selection unit;
the calling information analysis unit is used for retrieving the number of times the data most demanded by users of the same class was queried from different third-party data sources in the past, together with the duration spent obtaining that data in each past query, and for transmitting this information to the data source stability evaluation unit;
the data source stability evaluation unit is used for evaluating the stability with which the different third-party data sources query data of the corresponding type;
the data source access selection unit is used for comparing the stabilities with which the different third-party data sources query data of the corresponding type, grouping the third-party data sources according to the comparison result, and selecting and accessing the most suitable group of third-party data sources for users of the same class.
2. A big data based multi-source data risk management method, characterized in that the method comprises the following steps:
Z1: collecting historical information on users' calls of data from different third-party data sources and historical information on querying data from the different third-party data sources;
Z2: classifying the called data, and analyzing the degree of demand of different users for different types of data;
Z3: classifying the users according to the analysis result;
Z4: retrieving the historical information on querying data from the different third-party data sources, and analyzing the data query stability of the third-party data sources;
Z5: selecting data sources for users of the same class to access;
in step Z1: the period from T1 to T2 is divided equally into n time periods, where T2 denotes the current time; information on the number of times different users called data in the past n time periods is collected, and historical information on querying data from different third-party data sources is collected, the historical information including the number of times data was queried in the past and the duration spent obtaining the data in each past query;
in step Z5: the stabilities with which the f third-party data sources query data of the corresponding type are compared, and the third-party data sources are divided into g groups in descending order of stability; for one arbitrary grouping result, the set of mean stabilities of the g groups of third-party data sources is $L = \{L_1, L_2, \dots, L_g\}$; according to the formula $W = \left[\frac{\sum_{v=1}^{g}\left(L_v - \left(\sum_{v=1}^{g} L_v\right)/g\right)^{2}}{g}\right]^{1/2}$, the degree of dispersion W of the g group means in that grouping result is calculated; the degree of dispersion of the g group means is calculated for the different grouping results to obtain the grouping result with the largest degree of dispersion; the third-party data sources in the first group are screened out from the grouping result with the largest degree of dispersion, and the screened third-party data sources are selected and accessed for the users with the highest degree of demand for the corresponding type of data.
3. The big data based multi-source data risk management method of claim 2, wherein: in step Z2: the called data is classified according to the users' data service requirements to obtain k types of data; the numbers of times that any one user called any one type of data in n different time periods are retrieved as $S = \{S_1, S_2, \dots, S_n\}$, and a call analysis model of the corresponding user for that type of data is established:
$S_{n+1} = \tau S_n + (1-\tau) P_n$
the number of times the corresponding user calls the corresponding type of data in the (n+1)-th time period is predicted to be $S_{n+1}$, where $\tau$ denotes the smoothing coefficient, $0 < \tau < 1$, and $P_n$ denotes the exponential smoothing value of the number of calls of that type of data in the n-th time period; the exponential smoothing value $P_1$ for the 1st time period is calculated according to the formula $P_1 = \tau S_1 + (1-\tau)\left[(S_1 + S_2 + S_3)/3\right]$, the value $P_2$ for the 2nd time period according to the formula $P_2 = \tau S_1 + (1-\tau) P_1$, the value $P_3$ for the 3rd time period according to the formula $P_3 = \tau S_2 + (1-\tau) P_2$, and so on to obtain $P_n = \tau S_{n-1} + (1-\tau) P_{n-1}$; call analysis models of the corresponding user for the k types of data are established in the same way, the numbers of times the corresponding user calls the k types of data in the (n+1)-th time period are predicted, the predicted numbers of calls are compared, and the type of data with the largest predicted number of calls is taken as the data type with the highest degree of demand of the corresponding user in the (n+1)-th time period.
4. The big data based multi-source data risk management method of claim 2, wherein: in step Z3: the data type with the highest degree of demand of each user in the (n+1)-th time period is obtained, and users whose highest-demand data type is the same are classified into the same class.
5. The big data based multi-source data risk management method of claim 2, wherein: in step Z4: the set of numbers of times that data of the type most demanded by any one class of users was queried from the different third-party data sources is retrieved as $N = \{N_1, N_2, \dots, N_f\}$, the set of numbers of times the data was found is $r = \{r_1, r_2, \dots, r_f\}$, and the set of durations spent obtaining the data in each query from any one third-party data source is $t = \{t_1, t_2, \dots, t_c\}$, where f denotes the number of third-party data sources to be accessed, $c = r_i$, and c denotes the number of times the data was found from that data source; the stability $Q_i$ with which any one third-party data source queries data of the corresponding type is calculated according to the following formula:
$Q_i = \frac{r_i}{N_i} \times \frac{1}{\left(\sum_{j=1}^{c} t_j\right)/c}$;
where $N_i$ denotes the number of times that data of the type most demanded by the class of users was queried from that third-party data source in the past, and $t_j$ denotes the duration spent obtaining the data in the j-th query from that third-party data source; the set of stabilities with which the f third-party data sources query data of the corresponding type is obtained as $Q = \{Q_1, Q_2, \dots, Q_i, \dots, Q_f\}$.
CN202311642090.XA 2023-12-04 2023-12-04 Multi-source data risk management system and method based on big data Active CN117390495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311642090.XA CN117390495B (en) 2023-12-04 2023-12-04 Multi-source data risk management system and method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311642090.XA CN117390495B (en) 2023-12-04 2023-12-04 Multi-source data risk management system and method based on big data

Publications (2)

Publication Number Publication Date
CN117390495A CN117390495A (en) 2024-01-12
CN117390495B true CN117390495B (en) 2024-02-20

Family

ID=89470486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311642090.XA Active CN117390495B (en) 2023-12-04 2023-12-04 Multi-source data risk management system and method based on big data

Country Status (1)

Country Link
CN (1) CN117390495B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008092149A2 (en) * 2007-01-26 2008-07-31 Information Resources, Inc. Data fusion methods and systems
CN107193967A (en) * 2017-05-25 2017-09-22 南开大学 A kind of multi-source heterogeneous industry field big data handles full link solution
CN110765337A (en) * 2019-11-15 2020-02-07 中科院计算技术研究所大数据研究院 Service providing method based on internet big data
CN112579625A (en) * 2020-09-28 2021-03-30 京信数据科技有限公司 Multi-source heterogeneous data treatment method and device
CN115409120A (en) * 2022-09-02 2022-11-29 国网青海省电力公司海西供电公司 Data-driven-based auxiliary user electricity stealing behavior detection method
CN116189436A (en) * 2023-03-17 2023-05-30 佛山市众合科技有限公司 Multi-source data fusion algorithm based on big data
CN116795655A (en) * 2023-08-25 2023-09-22 深圳市银闪科技有限公司 Storage device performance monitoring system and method based on artificial intelligence
CN117131345A (en) * 2023-08-30 2023-11-28 山西玖幺两航空运动有限公司 Multi-source data parameter evaluation method based on data deep learning calculation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926204A (en) * 2022-05-11 2022-08-19 北京大学 Data processing device and method based on data value

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Thomas Minier等.SaGe: Web Preemption for Public SPARQL Query Services.《WWW '19: The World Wide Web Conference》.2019,1268–1278. *
游客行为数据接入与智能推荐方法研究;王国泰;《中国优秀硕士学位论文全文数据库 信息科技辑》(第02期);I138-1363 *

Also Published As

Publication number Publication date
CN117390495A (en) 2024-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant