CN117252287A - Index prediction method and system based on federal pearson correlation analysis - Google Patents

Info

Publication number
CN117252287A
Authority
CN
China
Prior art keywords
data set
random
partner
initiator
correlation coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310981568.5A
Other languages
Chinese (zh)
Other versions
CN117252287B (en
Inventor
孙银银
兰春嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lingshuzhonghe Information Technology Co ltd
Original Assignee
Shanghai Lingshuzhonghe Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lingshuzhonghe Information Technology Co ltd
Priority to CN202310981568.5A
Publication of CN117252287A
Application granted
Publication of CN117252287B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an index prediction method and system based on federal Pearson correlation analysis. The method comprises the following steps: normalizing the data sets; slicing the standard data sets; calculating a multiplication pair in a trusted execution environment; calculating a first common parameter and a second common parameter; calculating a first slice correlation coefficient and a second slice correlation coefficient according to the common parameters; calculating a federal correlation coefficient according to the first and second slice correlation coefficients; performing federal Pearson correlation analysis according to the federal correlation coefficient; determining model training data according to the analysis result and training a federal learning model with the model training data; and performing index prediction through the federal learning model, wherein the index comprises a fault performance index and a profit index. Because the multiplication pair is calculated in the trusted execution environment, the communication overhead of ciphertext calculation is reduced and performance is improved; in addition, the calculation tasks can be partitioned and executed in parallel, which greatly improves calculation efficiency.

Description

Index prediction method and system based on federal pearson correlation analysis
Technical Field
The invention relates to the technical field of privacy computing, and in particular to an index prediction method and system based on federal Pearson correlation analysis.
Background
In longitudinal federal modeling, the data sets of the task initiator and the partner share a common sample space but have different feature spaces, and an encryption algorithm is required to guarantee data privacy and security. Correlation analysis is performed between each continuous feature of the local node and the features of the partner's node, and features with high correlation are removed, which improves modeling efficiency and accuracy. Existing federal correlation analysis can be converted into secret matrix multiplication, but it suffers from poor performance, high communication overhead, a complex calculation process and low efficiency.
Aiming at the problems of poor federal correlation analysis performance, high communication overhead, complex calculation process and low efficiency in the prior art, no effective solution is proposed at present.
Disclosure of Invention
The embodiment of the invention provides an index prediction method and system based on federal pearson correlation analysis, which are used for solving the problems of poor federal correlation analysis performance, high communication overhead, complex calculation process and low efficiency in the prior art.
To achieve the above object, in one aspect, the present invention provides an index prediction method based on federal Pearson correlation analysis, the method comprising:
S1, normalizing a first data set X of an initiator in longitudinal federal learning to obtain a first standard data set X', and normalizing a second data set Y of a partner to obtain a second standard data set Y';
S2, the initiator slices X' according to the first random data set R0 to obtain a first sliced data set X0', and the partner takes the shared R0 as a second sliced data set X1'; the partner slices Y' according to the second random data set R1 to obtain a third sliced data set Y0', and the initiator takes the shared R1 as a fourth sliced data set Y1';
S3, the trusted execution environment calculates a product data set c according to the third random data set a0, the fourth random data set b0, the fifth random data set a1 and the sixth random data set b1; randomly generates a first generated data set c0 with the same size as the matrix c; and calculates a second generated data set c1 according to c and c0; the initiator shares a0 and b0 with the trusted execution environment and acquires c0 sent by the trusted execution environment; the partner shares a1 and b1 with the trusted execution environment and acquires c1 sent by the trusted execution environment;
S4, the partner calculates a first common parameter according to X1', a1 and the sum of X0' and a0 shared by the initiator; the initiator calculates a second common parameter according to Y1', b0 and the sum of Y0' and b1 shared by the partner;
S5, the initiator calculates a first slice correlation coefficient according to a0, b0, c0, the second common parameter and the shared first common parameter; the partner calculates a second slice correlation coefficient according to a1, b1, c1, the first common parameter and the shared second common parameter;
S6, the two parties each calculate the federal correlation coefficient according to their own slice correlation coefficient and the slice correlation coefficient shared by the counterpart;
S7, federal Pearson correlation analysis is performed according to the federal correlation coefficient, model training data is determined according to the analysis result, and a federal learning model is trained with the model training data;
S8, index prediction is performed through the federal learning model; the index comprises a fault performance index and a profit index.
Optionally, the S2 includes: the method comprises the steps that a first random seed generated by an initiator is sent to a partner, and the initiator and the partner generate a first random data set R0 according to the first random seed; the initiator fragments the first standard data set X 'according to the first random data set R0 to obtain a first fragmented data set X0'; the partner takes the first random data set R0 as a second fragment data set X1'; the second random seed generated by the partner is sent to the initiator, and the initiator and the partner generate a second random data set R1 according to the second random seed; the partner fragments the second standard data set Y 'according to the second random data set R1 to obtain a third fragmented data set Y0'; the initiator takes the second random data set R1 as a fourth sliced data set Y1'.
Optionally, the product data set c is calculated according to the following formula:
c=(a0+a1)×(b0+b1);
the second generated dataset c1 is calculated according to the following formula:
c1=c-c0;
wherein a0 is the third random data set, a1 is the fifth random data set, b0 is the fourth random data set, b1 is the sixth random data set, c is the product data set, c0 is the first generated data set, and c1 is the second generated data set.
Optionally, the first common parameter is calculated according to the following formula:
X’+a=X0’+a0+X1’+a1;
wherein X'+a is a first common parameter, X0' is a first sliced data set, a0 is a third random data set, X1' is a second sliced data set, and a1 is a fifth random data set;
the second common parameter is calculated according to the following formula:
Y’+b=Y0’+b1+Y1’+b0;
wherein Y'+b is a second common parameter, Y0' is a third sliced data set, b1 is a sixth random data set, Y1' is a fourth sliced data set, and b0 is a fourth random data set.
Optionally, the first slice correlation coefficient is calculated according to the following formula:
corr0=c0-a0*(Y’+b)-(X’+a)*b0+(X’+a)*(Y’+b);
wherein corr0 is a first slice correlation coefficient, c0 is a first generated data set, a0 is a third random data set, b0 is a fourth random data set, X'+a is a first common parameter, and Y'+b is a second common parameter;
the second slice correlation coefficient is calculated according to the following formula:
corr1=c1-a1*(Y’+b)-(X’+a)*b1;
wherein corr1 is a second slice correlation coefficient, c1 is a second generated data set, a1 is a fifth random data set, b1 is a sixth random data set, X'+a is a first common parameter, and Y'+b is a second common parameter.
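As a hedged check on the formulas above (not part of the original text), the two slice correlation coefficients can be added to see what they reconstruct. The shorthand a = a0 + a1, b = b0 + b1, d = X'+a and e = Y'+b is introduced here only for illustration, and c0 + c1 = ab follows from c = (a0+a1)×(b0+b1) and c1 = c - c0:

```latex
\begin{aligned}
\mathrm{corr}_0 + \mathrm{corr}_1
  &= (c_0 + c_1) - (a_0 + a_1)\,e - d\,(b_0 + b_1) + d\,e \\
  &= ab - a\,(Y' + b) - (X' + a)\,b + (X' + a)(Y' + b) \\
  &= ab - aY' - ab - X'b - ab + X'Y' + X'b + aY' + ab \\
  &= X'Y'
\end{aligned}
```

Under these assumptions the random masks cancel, and only the product of the two standardized data sets remains.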
In another aspect, the present invention provides an index prediction system based on federal Pearson correlation analysis, the system comprising:
a normalization unit, used for normalizing the first data set X of the initiator in longitudinal federal learning to obtain a first standard data set X', and normalizing the second data set Y of the partner to obtain a second standard data set Y';
a slicing unit, used for the initiator to slice X' according to the first random data set R0 to obtain a first sliced data set X0', and for the partner to take the shared R0 as a second sliced data set X1'; the partner slices Y' according to the second random data set R1 to obtain a third sliced data set Y0', and the initiator takes the shared R1 as a fourth sliced data set Y1';
a generated data set calculation unit, used for the trusted execution environment to calculate a product data set c according to the third random data set a0, the fourth random data set b0, the fifth random data set a1 and the sixth random data set b1; to randomly generate a first generated data set c0 with the same size as the matrix c; and to calculate a second generated data set c1 according to c and c0; the initiator shares a0 and b0 with the trusted execution environment and acquires c0 sent by the trusted execution environment; the partner shares a1 and b1 with the trusted execution environment and acquires c1 sent by the trusted execution environment;
a common parameter calculation unit, used for the partner to calculate a first common parameter according to X1', a1 and the sum of X0' and a0 shared by the initiator; the initiator calculates a second common parameter according to Y1', b0 and the sum of Y0' and b1 shared by the partner;
a slice correlation coefficient calculation unit, used for the initiator to calculate a first slice correlation coefficient according to a0, b0, c0, the second common parameter and the shared first common parameter; the partner calculates a second slice correlation coefficient according to a1, b1, c1, the first common parameter and the shared second common parameter;
a federal correlation coefficient calculation unit, used for the two parties each to calculate the federal correlation coefficient according to their own slice correlation coefficient and the slice correlation coefficient shared by the counterpart;
an analysis unit, used for performing federal Pearson correlation analysis according to the federal correlation coefficient, determining model training data according to the analysis result, and training a federal learning model with the model training data;
a prediction unit, used for performing index prediction through the federal learning model; the index comprises a fault performance index and a profit index.
Optionally, the slicing unit includes: the first segmentation subunit is used for sending a first random seed generated by the initiator to the partner, and the initiator and the partner generate a first random data set R0 according to the first random seed; the initiator fragments the first standard data set X 'according to the first random data set R0 to obtain a first fragmented data set X0'; the partner takes the first random data set R0 as a second fragment data set X1'; the second segmentation subunit is used for sending a second random seed generated by the partner to the initiator, and the initiator and the partner generate a second random data set R1 according to the second random seed; the partner fragments the second standard data set Y 'according to the second random data set R1 to obtain a third fragmented data set Y0'; the initiator takes the second random data set R1 as a fourth sliced data set Y1'.
Optionally, the product data set c is calculated according to the following formula:
c=(a0+a1)×(b0+b1);
the second generated dataset c1 is calculated according to the following formula:
c1=c-c0;
wherein a0 is the third random data set, a1 is the fifth random data set, b0 is the fourth random data set, b1 is the sixth random data set, c is the product data set, c0 is the first generated data set, and c1 is the second generated data set.
Optionally, the first common parameter is calculated according to the following formula:
X’+a=X0’+a0+X1’+a1;
wherein X'+a is a first common parameter, X0' is a first sliced data set, a0 is a third random data set, X1' is a second sliced data set, and a1 is a fifth random data set;
the second common parameter is calculated according to the following formula:
Y’+b=Y0’+b1+Y1’+b0;
wherein Y'+b is a second common parameter, Y0' is a third sliced data set, b1 is a sixth random data set, Y1' is a fourth sliced data set, and b0 is a fourth random data set.
Optionally, the first slice correlation coefficient is calculated according to the following formula:
corr0=c0-a0*(Y’+b)-(X’+a)*b0+(X’+a)*(Y’+b);
wherein corr0 is a first slice correlation coefficient, c0 is a first generated data set, a0 is a third random data set, b0 is a fourth random data set, X'+a is a first common parameter, and Y'+b is a second common parameter;
the second slice correlation coefficient is calculated according to the following formula:
corr1=c1-a1*(Y’+b)-(X’+a)*b1;
wherein corr1 is a second slice correlation coefficient, c1 is a second generated data set, a1 is a fifth random data set, b1 is a sixth random data set, X'+a is a first common parameter, and Y'+b is a second common parameter.
The invention has the beneficial effects that:
The invention provides an index prediction method and system based on federal Pearson correlation analysis. The method normalizes the data sets and slices the standard data sets. Calculating the multiplication pair in the trusted execution environment reduces the communication overhead of ciphertext calculation and improves performance; in addition, calculation tasks can be partitioned and executed in parallel, which greatly improves calculation efficiency. Selecting model training data according to the result of the federal Pearson correlation analysis greatly improves the relevance of the training data, and thus the prediction efficiency and accuracy for the indexes.
Drawings
FIG. 1 is a flowchart of an index prediction method based on federal Pearson correlation analysis provided by an embodiment of the present invention;
FIG. 2 is a flowchart of standard data set slicing provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an index prediction system based on federal Pearson correlation analysis according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a slicing unit according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In longitudinal federal modeling, the data sets of the task initiator and the partner share a common sample space but have different feature spaces, and an encryption algorithm is required to guarantee data privacy and security. Correlation analysis is performed between each continuous feature of the local node and the features of the partner's node, and features with high correlation are removed, which improves modeling efficiency and accuracy. Existing federal correlation analysis can be converted into secret matrix multiplication, but it suffers from poor performance, high communication overhead, a complex calculation process and low efficiency.
Therefore, the index prediction method based on federal Pearson correlation analysis provided by the invention can reduce communication overhead, improve performance and calculation efficiency, and improve the relevance of the selected model training data through the federal Pearson correlation analysis, thereby improving the prediction efficiency and accuracy of the federal learning model for the user performance index. The federal learning model in the embodiment of the invention may be, for example, a wind power equipment fault model or a power supply coal consumption net profit model.
FIG. 1 is a flowchart of an index prediction method based on federal Pearson correlation analysis according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
S1, normalizing a first data set X of an initiator in longitudinal federal learning to obtain a first standard data set X', and normalizing a second data set Y of a partner to obtain a second standard data set Y';
The initiator refers to the initiator in longitudinal federal learning, and the partner refers to the partner in longitudinal federal learning. Longitudinal federal learning is a federal learning scenario for participants whose data sets have the same sample space and different feature spaces. Through longitudinal federal learning, different participants can jointly train a machine learning model. The initiator and the partner may be different enterprises that need to cooperate. The initiator data set contains user data of the initiator, that is, sample data obtained by the initiator with user permission that can represent the user's performance indexes at the initiator; the partner data set contains user data of the partner, that is, sample data obtained by the partner with user permission that can represent the user's performance indexes at the partner.
For example, the initiator may be a wind farm and the partner may be a manufacturer of wind power equipment. The data set of the wind farm comprises SCADA operation data, maintenance ledgers, test data, meteorological data and the like of the wind power equipment; the data set of the manufacturer comprises the manufacturer's design parameters and the like for the wind power equipment. The data sets of the wind farm and of the manufacturer are each standardized and sliced, and a model training data set is determined according to the federal Pearson analysis result, so that the fault performance index of the wind power equipment can be accurately predicted through the federal learning model. For another example, the initiator may be a group power company and the partner may be a subsidiary power company. The data set of the group power company comprises operating financial data (such as the actual net profit value of power supply coal consumption); the data set of the subsidiary power company comprises power supply coal consumption data. After the data sets of the group power company and the subsidiary power company are processed, strongly correlated feature data for power supply coal consumption can be obtained through the federal Pearson analysis method, a power supply coal consumption net profit model is determined according to that feature data, and the model enables effective prediction of the power supply coal consumption profit.
Specifically, in the longitudinal federal modeling task, users of the initiator and the partner have intersections and different features. Assuming that the number of samples of an intersection is n after intersection based on user id, the initiator has m features, and the partner has t features; the first data set of the initiator is X (m, n) and the second data set of the partner is Y (t, n);
The features of the first data set X(m, n) of the initiator are xi, i = 1, 2, …, m; each feature is an n-dimensional vector with components xij, j = 1, 2, …, n. The mean of feature xi is ui = (xi1 + xi2 + … + xin)/n; the standard deviation of feature xi is δi = (((xi1 - ui)^2 + (xi2 - ui)^2 + … + (xin - ui)^2)/n)^0.5; the normalized feature is xij' = (xij - ui)/δi, where i = 1, 2, …, m and j = 1, 2, …, n. Normalizing each feature of the first data set X yields the first standard data set X'; similarly, the second data set Y of the partner is normalized to obtain the second standard data set Y'.
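The normalization in S1 can be sketched with NumPy as follows. This is a minimal illustration, assuming features are stored as rows of an (m, n) array; the function name `standardize` and the sample values are hypothetical and not from the original text.

```python
import numpy as np

def standardize(features: np.ndarray) -> np.ndarray:
    """Z-score each row (feature) of an (m, n) array.

    Uses the population standard deviation (divide by n), matching
    the definition of the standard deviation given for S1.
    """
    mean = features.mean(axis=1, keepdims=True)   # u_i
    std = features.std(axis=1, keepdims=True)     # delta_i (ddof=0)
    return (features - mean) / std

# Hypothetical initiator data: m = 2 features, n = 4 samples.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 10.0, 30.0, 30.0]])
X_std = standardize(X)
```

A constant feature has zero standard deviation and would cause a division by zero here; a practical implementation would need to drop or special-case such features.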
S2, the initiator fragments X ' according to the first random data set R0 to obtain a first fragmented data set X0', and the partner takes the shared R0 as a second fragmented data set X1'; the partner fragments Y ' according to the second random data set R1 to obtain a third fragment data set Y0', and the initiator takes the shared R1 as a fourth fragment data set Y1';
Fig. 2 is a flowchart of standard data set slicing provided by an embodiment of the present invention, as shown in fig. 2, where S2 includes:
S21, a first random seed generated by the initiator is sent to the partner, and the initiator and the partner generate a first random data set R0 according to the first random seed; the initiator slices the first standard data set X' according to the first random data set R0 to obtain a first sliced data set X0'; the partner takes the first random data set R0 as a second sliced data set X1';
The initiator generates a first random seed seed0 = k, where k is any value in [1, 2, …, 1000000], and sends it to the partner. The initiator and the partner each generate the first random data set R0 = [r0i] = [r01, r02, …, r0m] according to the first random seed, where i = 1, 2, …, m and each r0i is an n-dimensional vector with components indexed by j = 1, 2, …, n. The first standard data set of the initiator is X' = [x1', x2', …, xm']. The initiator obtains the first sliced data set X0' = X' - R0; the partner obtains the second sliced data set X1' = R0.
S22, a second random seed generated by the partner is sent to the initiator, and the initiator and the partner generate a second random data set R1 according to the second random seed; the partner fragments the second standard data set Y 'according to the second random data set R1 to obtain a third fragmented data set Y0'; the initiator takes the second random data set R1 as a fourth sliced data set Y1'.
The partner generates a second random seed seed1 = k, where k is any value in [1, 2, …, 1000000], and sends it to the initiator. The initiator and the partner each generate the second random data set R1 = [r1i] = [r11, r12, …, r1t] according to the second random seed, where i = 1, 2, …, t and each r1i is an n-dimensional vector with components indexed by j = 1, 2, …, n. The second standard data set of the partner is Y' = [y1', y2', …, yt']. The partner obtains the third sliced data set Y0' = Y' - R1; the initiator obtains the fourth sliced data set Y1' = R1.
Only the random seed is sent to the other party, not the random data set itself. Because the random data set is large and would take a long time to transmit, sending only the seed saves transmission time and improves calculation efficiency.
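The seed-based slicing of S21 can be sketched as below. This is a minimal illustration under stated assumptions: the helper `random_mask`, the use of NumPy's `default_rng`, and the normal distribution of the mask are choices made here, since the text does not specify the generator or the distribution.

```python
import numpy as np

def random_mask(seed: int, shape: tuple) -> np.ndarray:
    """Derive a random data set from a seed.

    Both parties call this with the same seed and shape, so only the
    seed has to cross the network (assumed generator and distribution).
    """
    return np.random.default_rng(seed).standard_normal(shape)

m, n = 3, 5
X_std = np.arange(m * n, dtype=float).reshape(m, n)  # stand-in for X'

seed0 = 424242                                # chosen by the initiator
R0_initiator = random_mask(seed0, (m, n))     # generated at the initiator
R0_partner = random_mask(seed0, (m, n))       # regenerated at the partner

X0 = X_std - R0_initiator   # first sliced data set X0', kept by the initiator
X1 = R0_partner             # second sliced data set X1', kept by the partner
```

The two slices sum back to X', while the message sent is a single integer rather than an m-by-n matrix. S22 is symmetric, with the partner choosing seed1 and slicing Y'.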
S3, the trusted execution environment calculates a product data set c according to the third random data set a0, the fourth random data set b0, the fifth random data set a1 and the sixth random data set b 1; randomly generating a first generated data set c0 with the same size as the c matrix according to the c; calculating a second generated data set c1 according to the c and the c0; the initiator shares a0 and b0 and acquires c0 sent by the trusted execution environment; the partner shares a1 and b1 and acquires c1 sent by the trusted execution environment;
Specifically, the trusted execution environment and the initiator generate a third random data set a0 based on the same random seed, a0 = [a0i] = [a01, a02, …, a0m], where i = 1, 2, …, m and each a0i is an n-dimensional vector with components indexed by j = 1, 2, …, n;
the trusted execution environment and the partner generate a fifth random data set a1, a1 = [a1i] = [a11, a12, …, a1m], where i = 1, 2, …, m and each a1i is an n-dimensional vector with components indexed by j = 1, 2, …, n;
the trusted execution environment and the initiator generate a fourth random data set b0 based on the same random seed, b0 = [b0i] = [b01, b02, …, b0t], where i = 1, 2, …, t and each b0i is an n-dimensional vector with components indexed by j = 1, 2, …, n;
the trusted execution environment and the partner generate a sixth random data set b1, b1 = [b1i] = [b11, b12, …, b1t], where i = 1, 2, …, t and each b1i is an n-dimensional vector with components indexed by j = 1, 2, …, n;
the trusted execution environment knows a0, a1, b0 and b1 and first calculates the product data set c = (a0+a1)(b0+b1); the matrix c has size m×t, c = [ci] = [c1, c2, …, ct], i = 1, 2, …, t, and each ci is an m-dimensional vector with components indexed by k = 1, 2, …, m;
the trusted execution environment generates a random integer matrix c0 with m rows and t columns, i.e. the first generated data set, c0 = [c0i] = [c01, c02, …, c0t], i = 1, 2, …, t, where each c0i is an m-dimensional vector with components indexed by k = 1, 2, …, m; the trusted execution environment sends c0 to the initiator;
the trusted execution environment calculates the second generated data set c1 = c - c0, c1 = [c1i] = [c11, c12, …, c1t], i = 1, 2, …, t, where each c1i is an m-dimensional vector with components indexed by k = 1, 2, …, m; the trusted execution environment sends c1 to the partner.
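The work done inside the trusted execution environment in S3 can be sketched as follows. This is an illustration only: the text writes the product as (a0+a1)(b0+b1), and the transpose on the second factor is an assumption made here so that (m, n) and (t, n) matrices yield the stated m-by-t result. The function name `tee_make_pair`, the helper `mask`, the seeds and the integer range for c0 are all hypothetical.

```python
import numpy as np

def tee_make_pair(a0, a1, b0, b1, rng):
    """Inside the TEE: compute c and split it into c0 and c1 = c - c0."""
    c = (a0 + a1) @ (b0 + b1).T              # product data set, shape (m, t)
    # Random integer matrix of the same size as c (range is an assumption).
    c0 = rng.integers(-1000, 1000, size=c.shape).astype(float)
    c1 = c - c0
    return c0, c1

m, t, n = 2, 3, 4

def mask(seed: int, rows: int) -> np.ndarray:
    """Random data set derived from a seed known to the TEE and one party."""
    return np.random.default_rng(seed).standard_normal((rows, n))

a0, b0 = mask(1, m), mask(2, t)   # seeds known to the TEE and the initiator
a1, b1 = mask(3, m), mask(4, t)   # seeds known to the TEE and the partner
c0, c1 = tee_make_pair(a0, a1, b0, b1, np.random.default_rng(5))
```

Neither party sees c itself: the initiator receives only c0 and the partner only c1.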
S4, the partner calculates a first common parameter according to X1', a1 and the sum of X0' and a0 shared by the initiator; the initiator calculates a second common parameter according to Y1', b0 and the sum of Y0' and b1 shared by the partner;
The initiator calculates the sum of X0' and a0 and sends it to the partner; the partner calculates the first common parameter according to X1', a1 and the sum of X0' and a0; the first common parameter is X'+a = X0'+a0+X1'+a1;
The partner calculates the sum of Y0' and b1 and sends it to the initiator; the initiator calculates the second common parameter according to Y1', b0 and the sum of Y0' and b1; the second common parameter is Y'+b = Y0'+b1+Y1'+b0;
In another embodiment, the partner calculates the first common parameter according to X1', a1 and the sum of X0' and a0 shared by the initiator, and calculates the second common parameter according to Y0', b1 and the sum of Y1' and b0 shared by the initiator; the initiator calculates the first common parameter according to X0', a0 and the sum of X1' and a1 shared by the partner, and calculates the second common parameter according to Y1', b0 and the sum of Y0' and b1 shared by the partner.
S5, the initiator calculates a first slice correlation coefficient from a0, b0, c0, the second common parameter and the shared first common parameter; the partner calculates a second slice correlation coefficient from a1, b1, c1, the first common parameter and the shared second common parameter;
the partner sends the first common parameter to the initiator; the initiator sends the second common parameter to the partner;
the first slice correlation coefficient is calculated according to the following formula:
corr0=c0-a0*(Y’+b)-(X’+a)*b0+(X’+a)*(Y’+b);
wherein corr0 is the first slice correlation coefficient, c0 is the first generated data set, a0 is the third random data set, b0 is the fourth random data set, X'+a is the first common parameter, and Y'+b is the second common parameter;
the second slice correlation coefficient is calculated according to the following formula:
corr1=c1-a1*(Y’+b)-(X’+a)*b1;
wherein corr1 is the second slice correlation coefficient, c1 is the second generated data set, a1 is the fifth random data set, b1 is the sixth random data set, X'+a is the first common parameter, and Y'+b is the second common parameter.
In the other embodiment, the initiator calculates the first slice correlation coefficient from a0, b0, c0, the first common parameter and the second common parameter, and the partner calculates the second slice correlation coefficient from a1, b1, c1, the first common parameter and the second common parameter.
S6, each of the two parties calculates the federal correlation coefficient from its own slice correlation coefficient and the slice correlation coefficient shared by the other party;
the initiator sends the first slice correlation coefficient to the partner; the partner sends the second slice correlation coefficient to the initiator; both parties calculate the federal correlation coefficient corr = corr0 + corr1 from the first and second slice correlation coefficients, where corr is the federal correlation coefficient.
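That the two slices sum to the product of the standardized data sets can be checked by direct expansion. The following is a short verification, assuming the products are matrix products taken in an order for which the dimensions agree. E = X'+a and F = Y'+b are shorthand introduced here (not symbols from the original), with a = a0+a1, b = b0+b1 and c = ab = c0+c1:

```latex
\begin{aligned}
corr_0 + corr_1 &= (c_0 + c_1) - (a_0 + a_1)F - E(b_0 + b_1) + EF \\
                &= ab - aF - Eb + EF \\
                &= (E - a)(F - b) \\
                &= X'Y'.
\end{aligned}
```

Only the masked values X'+a and Y'+b are exchanged between the parties; since the initiator knows only a0 and b0 and the partner knows only a1 and b1, neither party can remove the full mask a or b on its own.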
S7, performing federal Pearson correlation analysis according to the federal correlation coefficient, determining model training data according to the analysis result, and training a federal learning model with the model training data;
S8, performing index prediction through the federal learning model; the indices include a fault performance index and a profit index.
Specifically, a new (current) first data set and a new (current) second data set are input into the federal learning model for index prediction.
The method of the invention is illustrated below by scenario one:
Fault diagnosis is performed on the wind turbine equipment of a wind farm. Equipment fault diagnosis requires the equipment's SCADA operating data, maintenance ledger, test data, factory design parameters, meteorological data and the like, or, for equipment that has just been put into operation and has no historical data, equipment-related data from other wind farms. Because commercial confidentiality is involved, part of the manufacturer's design data has a large influence on the prediction result, but the core data cannot conveniently be exposed. Federal learning solves this problem: the data does not leave the local site, which protects data security, while feature-rich federal modeling is achieved and the prediction accuracy of the model is improved. Federal correlation analysis is used to mine, from the mass data of the wind farm and the partner (manufacturer), the features that are strongly correlated with a given turbine fault. Federal correlation analysis is then performed among these features; where features are strongly correlated with one another, one is kept and the others are removed, which reduces the linear dependence among the features. A feature set corresponding to each fault mode is thereby established, improving the efficiency and accuracy of federal modeling.
Consider correlation analysis of the feature set for the wind turbine gearbox fault model. Modeling features are generally selected according to expert experience, so important features may be missed, and the selection also depends on the accuracy of the data collected by the sensors; using the results of data correlation analysis to assist expert modeling is therefore more scientific.
Step one, the data is intersected vertically, i.e., aligned on time. The wind farm mainly collects SCADA (supervisory control and data acquisition) data, including features such as wind speed, active power, generator speed, wind alignment angle, low-speed bearing temperature, high-speed bearing temperature, drive-end bearing temperature, free-end bearing temperature, and gearbox oil temperature; the manufacturer mainly holds temperature curves of normal gearbox operation under different working conditions. The different feature spaces at the same times are taken as the respective data sets: the data set of the wind farm is X, and the data set of the manufacturer is Y;
Step two, the X and Y data sets are standardized to obtain x and y;
Step three, the correlation between the features of the local data set x (or y) is calculated, and strongly correlated features are screened, reducing the linear coupling between the features;
Taking x as an example, x has n features, each feature being an m-dimensional vector: x = [xi] = [x1, x2, …, xn], xi = [xi1, xi2, …, xim], i = 1, 2, …, n. The Pearson correlation coefficients between the n features are calculated to obtain a correlation coefficient matrix P(n, n), each element being P(i, j) = correlation coefficient(xi, xj), where i, j = 1, 2, …, n. If P(i, j) > 0.95, only one of the two features is kept as a feature for subsequent modeling and the other is screened out. Assuming k features are screened out, k < n, the remaining n-k features compose a new data set x'. In the same way, correlation coefficient analysis is performed on the y data set, features with a correlation coefficient > 0.95 are screened, the number of features is reduced, and a new data set y' is generated.
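The local screening in step three can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the greedy keep-the-first-seen order, the function names `pearson` and `screen_features`, and the toy data are all assumptions; only the 0.95 threshold comes from the text.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)  # undefined for a constant feature (zero variance)

def screen_features(features, threshold=0.95):
    """Return the indices of the features that are kept.

    A feature is dropped when its correlation with an already kept feature
    exceeds the threshold (greedy, in input order: an assumption).
    """
    kept = []
    for i, xi in enumerate(features):
        if all(pearson(xi, features[k]) <= threshold for k in kept):
            kept.append(i)
    return kept

# Hypothetical toy data: feature 1 is almost a rescaled copy of feature 0.
x = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 4.0, 6.2, 8.0, 10.1],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
kept = screen_features(x)
x_screened = [x[i] for i in kept]  # plays the role of x'
```

The sketch compares the signed coefficient with the threshold, as the text does; whether strongly negative correlations should also be screened is not stated in the source.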
Step four, performing x ', y' fragmentation of the data set;
The wind farm generates a first random seed and sends it to the manufacturer; the wind farm and the manufacturer each generate a first random data set r0 from the first random seed; the wind farm slices the data set x' according to r0 to obtain x0' = x' - r0, and the manufacturer holds the slice x1' = r0 of x'. The manufacturer generates a second random seed and sends it to the wind farm; the manufacturer and the wind farm each generate a second random data set r1 from the second random seed; the manufacturer slices the data set y' according to r1 to obtain y0' = y' - r1, and the wind farm holds the slice y1' = r1 of y'.
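The slicing in step four amounts to additive sharing with a mask derived from a shared seed. A minimal sketch follows, assuming a hypothetical 3-sample, 2-feature x' and an arbitrary seed value; `random_matrix`, `add` and `sub` are helper names introduced here. Python's `random` module is used only for illustration and is not cryptographically secure.

```python
import random

def random_matrix(seed, rows, cols):
    """Derive a mask deterministically from a shared seed."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def add(p, q):
    return [[u + v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]

def sub(p, q):
    return [[u - v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]

# Wind farm side: holds x' and chooses the first random seed.
x_std = [[0.5, -1.2], [1.5, 0.3], [-2.0, 0.9]]  # hypothetical x'
seed1 = 12345                                    # sent to the manufacturer
r0_farm = random_matrix(seed1, 3, 2)
x0 = sub(x_std, r0_farm)                         # x0' = x' - r0, kept by the wind farm

# Manufacturer side: regenerates the same r0 from the received seed.
r0_maker = random_matrix(seed1, 3, 2)
x1 = r0_maker                                    # x1' = r0
```

Because only the seed is transmitted, the manufacturer obtains its slice without the wind farm sending any matrix; the slices of y' are produced the same way with the roles reversed.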
Step five, generating the multiplication pairs [(a0, b0, c0), (a1, b1, c1)];
the trusted execution environment shares a random seed with the initiator (wind farm) and with the partner (manufacturer), respectively;
the wind farm sends the number of samples s and the number of features f0 for the triplet to the trusted execution environment, and the manufacturer sends the number of features f1 to the trusted execution environment;
the wind farm generates a random seed and sends it to the trusted execution environment; the wind farm and the trusted execution environment each generate a0 from the random seed, a0 being a matrix of s rows and f0 columns, and generate b0 in the same way, b0 being a matrix of s rows and f1 columns;
the manufacturer generates a random seed and sends it to the trusted execution environment; the manufacturer and the trusted execution environment each generate a1 from the random seed, a1 being a matrix of s rows and f0 columns, and generate b1 in the same way, b1 being a matrix of s rows and f1 columns;
the trusted execution environment calculates a = a0+a1 and b = b0+b1;
the trusted execution environment calculates c = a×b; the matrix c has f0 rows and f1 columns (given the stated sizes of a and b, the product is taken as the transpose of a multiplied by b);
the trusted execution environment generates a random matrix c0 of the same size as c and sends c0 to the wind farm; it calculates c1 = c - c0 and sends c1 to the manufacturer; c0 and c1 are matrices of f0 rows and f1 columns.
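The work done inside the trusted execution environment in step five can be sketched as below. This is an illustration under stated assumptions: integer masks in a small range, b0 and b1 drawn from the same seeded stream as a0 and a1, the function name `generate_multiplication_pair`, and the `tee_seed` parameter are all hypothetical choices not given in the source.

```python
import random

def rand_int_matrix(rng, rows, cols, low=-100, high=100):
    return [[rng.randint(low, high) for _ in range(cols)] for _ in range(rows)]

def add(p, q):
    return [[u + v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]

def sub(p, q):
    return [[u - v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]

def transpose(p):
    return [list(col) for col in zip(*p)]

def matmul(p, q):
    qt = transpose(q)
    return [[sum(u * v for u, v in zip(row, col)) for col in qt] for row in p]

def generate_multiplication_pair(seed_farm, seed_maker, s, f0, f1, tee_seed=0):
    """Return ((a0, b0, c0), (a1, b1, c1)) as computed inside the TEE."""
    rng_farm = random.Random(seed_farm)    # the wind farm can replay this stream
    a0 = rand_int_matrix(rng_farm, s, f0)
    b0 = rand_int_matrix(rng_farm, s, f1)
    rng_maker = random.Random(seed_maker)  # the manufacturer can replay this stream
    a1 = rand_int_matrix(rng_maker, s, f0)
    b1 = rand_int_matrix(rng_maker, s, f1)
    a, b = add(a0, a1), add(b0, b1)
    c = matmul(transpose(a), b)            # f0 rows, f1 columns
    c0 = rand_int_matrix(random.Random(tee_seed), f0, f1)
    c1 = sub(c, c0)                        # so that c0 + c1 = c
    return (a0, b0, c0), (a1, b1, c1)

(a0, b0, c0), (a1, b1, c1) = generate_multiplication_pair(1, 2, s=4, f0=3, f1=2)
```

In this arrangement only c0 and c1 have to be transmitted as matrices; each party regenerates its own a and b parts locally from its seed.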
Step six, calculating the common parameters x'+a and y'+b;
the wind farm holds the slice x0' = x' - r0 of x' and a0 of the multiplication pair, calculates the slice (x'+a)0 = x' - r0 + a0 of (x'+a), and sends this slice to the manufacturer; the manufacturer holds the slice x1' = r0 of x' and a1 of the multiplication pair, calculates the slice (x'+a)1 = r0 + a1 of (x'+a), and sends this slice to the wind farm;
each participant calculates the sum of the slices of (x'+a): (x'+a)0 + (x'+a)1 = x' - r0 + a0 + r0 + a1 = x' + a, so the wind farm and the manufacturer both obtain the common parameter x'+a;
in the same way, the wind farm and the manufacturer obtain the common parameter y'+b.
Step seven, calculating the federal correlation coefficient corr;
the wind farm calculates the first slice correlation coefficient:
corr0 = c0 - a0*(y'+b) - (x'+a)*b0 + (x'+a)*(y'+b), and sends the first slice correlation coefficient corr0 to the manufacturer;
the manufacturer calculates the second slice correlation coefficient:
corr1 = c1 - a1*(y'+b) - (x'+a)*b1, and sends the second slice correlation coefficient corr1 to the wind farm;
the wind farm and the manufacturer both obtain the federal correlation coefficient corr = corr0 + corr1.
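Steps four to seven can be exercised end to end on toy data to confirm that corr0 + corr1 reproduces the product of the standardized data sets. The sketch below runs both parties in a single process and makes several assumptions not found in the source: z-score standardization with the population standard deviation, transposes placed so that the stated matrix sizes agree, and a final division by the sample count s to turn the product into Pearson coefficients.

```python
import math
import random

def T(p):
    return [list(col) for col in zip(*p)]

def mm(p, q):
    qt = T(q)
    return [[sum(u * v for u, v in zip(r, c)) for c in qt] for r in p]

def add(p, q):
    return [[u + v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]

def sub(p, q):
    return [[u - v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]

def rand(rng, rows, cols):
    return [[float(rng.randint(-50, 50)) for _ in range(cols)] for _ in range(rows)]

def zscore(m):
    """Standardize each column (population standard deviation: an assumption)."""
    out = []
    for col in T(m):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
        out.append([(v - mu) / sd for v in col])
    return T(out)

rng = random.Random(7)
s, f0, f1 = 6, 2, 1
x = zscore([[rng.random() for _ in range(f0)] for _ in range(s)])  # wind farm x'
y = zscore([[rng.random() for _ in range(f1)] for _ in range(s)])  # manufacturer y'

# Step four: slices of x' and y'.
r0, r1 = rand(rng, s, f0), rand(rng, s, f1)
x0, x1 = sub(x, r0), r0   # x0' at the wind farm, x1' at the manufacturer
y0, y1 = sub(y, r1), r1   # y0' at the manufacturer, y1' at the wind farm

# Step five: multiplication pair from the trusted execution environment.
a0, a1 = rand(rng, s, f0), rand(rng, s, f0)
b0, b1 = rand(rng, s, f1), rand(rng, s, f1)
c = mm(T(add(a0, a1)), add(b0, b1))
c0 = rand(rng, f0, f1)
c1 = sub(c, c0)

# Step six: common parameters rebuilt from the exchanged slices.
E = add(add(x0, a0), add(x1, a1))  # x' + a
F = add(add(y0, b1), add(y1, b0))  # y' + b

# Step seven: slice correlation coefficients and their sum.
corr0 = add(sub(sub(c0, mm(T(a0), F)), mm(T(E), b0)), mm(T(E), F))
corr1 = sub(sub(c1, mm(T(a1), F)), mm(T(E), b1))
corr = add(corr0, corr1)
pearson = [[v / s for v in row] for row in corr]  # assumed normalization by s
```

With floating-point masks the result matches the direct product only up to rounding error, which may be one reason the description asks for a random integer matrix c0.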
Step eight, a feature set related to the wind turbine gearbox fault model is obtained through federal correlation analysis, which improves accuracy compared with a model built on features selected by expert experience, while protecting data security. A wind power equipment fault model is trained on these features, and fault performance indices are predicted through the wind power equipment fault model.
Specifically, the new (current) screened data sets x' and y' are input into the wind power equipment fault model for fault performance index prediction, yielding fault data for the wind power equipment.
The method of the invention is described below by way of scenario two:
A power plant has a plant-level supervisory information system (SIS) and a management information system (MIS). The MIS relies on the SIS and gives a view of the operating condition of every power plant under a group, so that the operation and management of the group's power plants can be supported scientifically. The traditional approach is to upload the data of the power plants of each subsidiary to the group's cloud server, which solves the data island problem and enables big data mining and analysis. However, the subsidiaries compete with one another, the group company wants to apply the operating model of a benchmark power plant to other power plants, and uploading the data to the same server poses a security risk; federal learning enables big data mining and analysis while ensuring data privacy and security. Take the operation analysis of a certain power company as an example: the company needs to analyze the factors that influence real-time power supply coal consumption, that is, the influence of the measurement point data in the SIS of each of its power plants on power supply coal consumption. Suppose the power company is p0 and has two power plants, p1 and p2. p0 holds the company's operating financial data, such as the real-time net profit value y of power supply coal consumption, and needs to perform federal learning with p1 and p2 to analyze which indices of the p1 and p2 power plants are strongly correlated with the target y, so that a power supply coal consumption analysis model can be built from these features to predict the net profit of power supply coal consumption in real time;
Step one, data sets for a certain period are acquired from the MIS system of p0 and from p1 and p2 respectively; the data is intersected vertically, i.e., aligned on time, and the different feature spaces at the same times are taken as the respective data sets; the data set of p0 is Y, and the data sets of p1 and p2 are X1 and X2 respectively;
Step two, the X1, X2 and Y data sets are standardized to obtain x1, x2 and y;
Step three, the correlation between the features of the local data set x1 (or x2) is calculated, features with a correlation coefficient > 0.95 are screened, reducing the linear coupling between the features, and the screened feature sets are x1' and x2';
Step four, the data sets x1', x2' and y are sliced: the slices of x1' and y are calculated, and the slices of x2' and y are calculated;
Step five, the multiplication pairs for x1' and y and for x2' and y are generated, namely [(a0, b0, c0), (a1, b1, c1)] and [(a0', b0', c0'), (a1', b1', c1')] respectively, where a = a0+a1, b = b0+b1, a' = a0'+a1', b' = b0'+b1'; the method is the same as above;
Step six, the common parameters x1'+a and y+b of x1' and y are calculated, and the common parameters x2'+a' and y+b' of x2' and y are calculated; the method is the same as above;
Step seven, the federal correlation coefficient corr of x1' and y and the federal correlation coefficient corr' of x2' and y are calculated; the method is the same as above;
Step eight, the correlation coefficient matrix affecting the power company's net profit of power supply coal consumption is [corr, corr']; features with a correlation coefficient greater than 0.95 are screened, so that the feature set related to the net profit model of power supply coal consumption is obtained through federal correlation analysis, and the data of these features is used for federal modeling to obtain the power supply coal consumption profit model; the model can predict the net profit of power supply coal consumption online, which supports scientific decision analysis by managers.
Specifically, the new (current) screened feature sets x1', x2' and y are input into the power supply coal consumption profit model for profit index prediction, yielding power supply coal consumption profit data.
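The threshold selection in step eight can be sketched as below. The feature names and coefficient values are hypothetical, chosen only to illustrate filtering the combined list [corr, corr'] at the 0.95 threshold stated above; `select_features` is a name introduced here.

```python
def select_features(names, coeffs, threshold=0.95):
    """Keep the features whose federal correlation with y exceeds the threshold."""
    return [name for name, value in zip(names, coeffs) if value > threshold]

# Hypothetical measurement points from p1 and p2 and their coefficients with y.
names = ["p1.main_steam_temp", "p1.load", "p2.feedwater_flow"]
coeffs = [0.97, 0.42, 0.96]  # entries of corr followed by entries of corr'
selected = select_features(names, coeffs)
```

The selected names would then identify the columns of x1' and x2' passed on to federal modeling.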
Fig. 3 is a schematic structural diagram of an index prediction system based on federal pearson correlation analysis according to an embodiment of the present invention, as shown in fig. 3, where the system includes:
a normalizing unit 201, configured to normalize a first data set X of an initiator in longitudinal federal learning to obtain a first standard data set X ', and normalize a second data set Y of a partner to obtain a second standard data set Y';
a slicing unit 202, configured for the initiator to slice X' according to the first random data set R0 to obtain a first sliced data set X0', the partner taking the shared R0 as a second sliced data set X1'; and for the partner to slice Y' according to the second random data set R1 to obtain a third sliced data set Y0', the initiator taking the shared R1 as a fourth sliced data set Y1';
Fig. 4 is a schematic structural diagram of a slicing unit according to an embodiment of the present invention, as shown in fig. 4, where the slicing unit 202 includes:
a first slicing subunit 2021, configured to send a first random seed generated by the initiator to the partner, where the initiator and the partner generate a first random data set R0 according to the first random seed; the initiator fragments the first standard data set X 'according to the first random data set R0 to obtain a first fragmented data set X0'; the partner takes the first random data set R0 as a second fragment data set X1';
a second slicing subunit 2022, configured to send a second random seed generated by the partner to the initiator, where the initiator and the partner generate a second random data set R1 according to the second random seed; the partner slices the second standard data set Y' according to the second random data set R1 to obtain a third sliced data set Y0'; the initiator takes the second random data set R1 as a fourth sliced data set Y1'.
A generated data set calculating unit 203, configured to calculate a product data set c according to the third random data set a0, the fourth random data set b0, the fifth random data set a1 and the sixth random data set b1 by using the trusted execution environment; randomly generating a first generated data set c0 with the same size as the c matrix according to the c; calculating a second generated data set c1 according to the c and the c0; the initiator shares a0 and b0 and acquires c0 sent by the trusted execution environment; the partner shares a1 and b1 and acquires c1 sent by the trusted execution environment;
a common parameter calculation unit 204, configured for the partner to calculate a first common parameter from X1', a1, and the sum of X0' and a0 shared by the initiator, and for the initiator to calculate a second common parameter from Y1', b0, and the sum of Y0' and b1 shared by the partner;
a slice correlation coefficient calculation unit 205, configured for the initiator to calculate a first slice correlation coefficient from a0, b0, c0, the second common parameter and the shared first common parameter, and for the partner to calculate a second slice correlation coefficient from a1, b1, c1, the first common parameter and the shared second common parameter;
a federal correlation coefficient calculation unit 206, configured for each of the two parties to calculate the federal correlation coefficient from its own slice correlation coefficient and the slice correlation coefficient shared by the other party;
an analysis unit 207, configured to perform federal Pearson correlation analysis according to the federal correlation coefficient, determine model training data according to the analysis result, and train a federal learning model using the model training data;
a prediction unit 208, configured to perform index prediction through the federal learning model; the indices include a fault performance index and a profit index.
The index prediction system based on federal Pearson correlation analysis provided by the invention corresponds to the method described above and is not described again here.
The invention has the beneficial effects that:
The invention provides an index prediction method and system based on federal Pearson correlation analysis. The method standardizes the data sets and slices the standard data sets. Computing the multiplication pairs in the trusted execution environment reduces the communication overhead of ciphertext computation and improves performance; in addition, computing tasks can be executed in blocks and in parallel, which greatly improves computing efficiency. Selecting the model training data from the results of the federal Pearson correlation analysis greatly improves the correlation of the model training data, and in turn the prediction efficiency and accuracy of the indices.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An index prediction method based on federal pearson correlation analysis, comprising:
S1, normalizing a first data set X of an initiator in longitudinal federal learning to obtain a first standard data set X ', and normalizing a second data set Y of a partner to obtain a second standard data set Y';
S2, the initiator slices X' according to the first random data set R0 to obtain a first sliced data set X0', and the partner takes the shared R0 as a second sliced data set X1'; the partner slices Y' according to the second random data set R1 to obtain a third sliced data set Y0', and the initiator takes the shared R1 as a fourth sliced data set Y1';
S3, the trusted execution environment calculates a product data set c according to the third random data set a0, the fourth random data set b0, the fifth random data set a1 and the sixth random data set b1; randomly generates a first generated data set c0 with the same size as the matrix c; and calculates a second generated data set c1 according to c and c0; the initiator shares a0 and b0 and acquires c0 sent by the trusted execution environment; the partner shares a1 and b1 and acquires c1 sent by the trusted execution environment;
S4, the partner calculates a first common parameter from X1', a1, and the sum of X0' and a0 shared by the initiator; the initiator calculates a second common parameter from Y1', b0, and the sum of Y0' and b1 shared by the partner;
S5, the initiator calculates a first slice correlation coefficient from a0, b0, c0, the second common parameter and the shared first common parameter; the partner calculates a second slice correlation coefficient from a1, b1, c1, the first common parameter and the shared second common parameter;
S6, each of the two parties calculates the federal correlation coefficient from its own slice correlation coefficient and the slice correlation coefficient shared by the other party;
S7, performing federal Pearson correlation analysis according to the federal correlation coefficient, determining model training data according to the analysis result, and training a federal learning model with the model training data;
S8, performing index prediction through the federal learning model; the indices include a fault performance index and a profit index.
2. The method according to claim 1, wherein S2 comprises:
a first random seed generated by the initiator is sent to the partner, and the initiator and the partner generate a first random data set R0 according to the first random seed; the initiator slices the first standard data set X' according to the first random data set R0 to obtain a first sliced data set X0'; the partner takes the first random data set R0 as a second sliced data set X1';
a second random seed generated by the partner is sent to the initiator, and the initiator and the partner generate a second random data set R1 according to the second random seed; the partner slices the second standard data set Y' according to the second random data set R1 to obtain a third sliced data set Y0'; the initiator takes the second random data set R1 as a fourth sliced data set Y1'.
3. The method according to claim 1, characterized in that:
the product data set c is calculated according to the following formula:
c=(a0+a1)×(b0+b1);
the second generated dataset c1 is calculated according to the following formula:
c1=c-c0;
wherein a0 is the third random data set, a1 is the fifth random data set, b0 is the fourth random data set, b1 is the sixth random data set, c is the product data set, c0 is the first generated data set, and c1 is the second generated data set.
4. The method according to claim 1, characterized in that:
the first common parameter is calculated according to the following formula:
X'+a = X0'+a0+X1'+a1;
wherein X'+a is the first common parameter, X0' is the first sliced data set, a0 is the third random data set, X1' is the second sliced data set and a1 is the fifth random data set;
the second common parameter is calculated according to the following formula:
Y'+b = Y0'+b1+Y1'+b0;
wherein Y'+b is the second common parameter, Y0' is the third sliced data set, b1 is the sixth random data set, Y1' is the fourth sliced data set and b0 is the fourth random data set.
5. The method according to claim 1, characterized in that:
the first slice correlation coefficient is calculated according to the following formula:
corr0 = c0 - a0*(Y'+b) - (X'+a)*b0 + (X'+a)*(Y'+b);
wherein corr0 is the first slice correlation coefficient, c0 is the first generated data set, a0 is the third random data set, b0 is the fourth random data set, X'+a is the first common parameter, and Y'+b is the second common parameter;
the second slice correlation coefficient is calculated according to the following formula:
corr1 = c1 - a1*(Y'+b) - (X'+a)*b1;
wherein corr1 is the second slice correlation coefficient, c1 is the second generated data set, a1 is the fifth random data set, b1 is the sixth random data set, X'+a is the first common parameter, and Y'+b is the second common parameter.
6. An index prediction system based on federal pearson correlation analysis, comprising:
the normalization unit is used for normalizing the first data set X of the initiator in longitudinal federal learning to obtain a first standard data set X ', and normalizing the second data set Y of the partner to obtain a second standard data set Y';
a slicing unit, used for the initiator to slice X' according to the first random data set R0 to obtain a first sliced data set X0', the partner taking the shared R0 as a second sliced data set X1'; and for the partner to slice Y' according to the second random data set R1 to obtain a third sliced data set Y0', the initiator taking the shared R1 as a fourth sliced data set Y1';
The generated data set calculation unit is used for calculating a product data set c according to the third random data set a0, the fourth random data set b0, the fifth random data set a1 and the sixth random data set b1 by the trusted execution environment; randomly generating a first generated data set c0 with the same size as the c matrix according to the c; calculating a second generated data set c1 according to the c and the c0; the initiator shares a0 and b0 and acquires c0 sent by the trusted execution environment; the partner shares a1 and b1 and acquires c1 sent by the trusted execution environment;
a common parameter calculation unit, used for the partner to calculate a first common parameter from X1', a1, and the sum of X0' and a0 shared by the initiator, and for the initiator to calculate a second common parameter from Y1', b0, and the sum of Y0' and b1 shared by the partner;
a slice correlation coefficient calculation unit, used for the initiator to calculate a first slice correlation coefficient from a0, b0, c0, the second common parameter and the shared first common parameter, and for the partner to calculate a second slice correlation coefficient from a1, b1, c1, the first common parameter and the shared second common parameter;
a federal correlation coefficient calculation unit, used for each of the two parties to calculate the federal correlation coefficient from its own slice correlation coefficient and the slice correlation coefficient shared by the other party;
an analysis unit, used for performing federal Pearson correlation analysis according to the federal correlation coefficient, determining model training data according to the analysis result, and training a federal learning model with the model training data;
a prediction unit, used for performing index prediction through the federal learning model; the indices include a fault performance index and a profit index.
7. The system of claim 6, wherein the slicing unit comprises:
a first slicing subunit, used for sending a first random seed generated by the initiator to the partner, the initiator and the partner generating a first random data set R0 according to the first random seed; the initiator slices the first standard data set X' according to the first random data set R0 to obtain a first sliced data set X0'; the partner takes the first random data set R0 as a second sliced data set X1';
a second slicing subunit, used for sending a second random seed generated by the partner to the initiator, the initiator and the partner generating a second random data set R1 according to the second random seed; the partner slices the second standard data set Y' according to the second random data set R1 to obtain a third sliced data set Y0'; the initiator takes the second random data set R1 as a fourth sliced data set Y1'.
8. The system according to claim 6, wherein:
the product data set c is calculated according to the following formula:
c=(a0+a1)×(b0+b1);
the second generated dataset c1 is calculated according to the following formula:
c1=c-c0;
wherein a0 is the third random data set, a1 is the fifth random data set, b0 is the fourth random data set, b1 is the sixth random data set, c is the product data set, c0 is the first generated data set, and c1 is the second generated data set.
9. The system according to claim 6, wherein:
the first common parameter is calculated according to the following formula:
X'+a = X0'+a0+X1'+a1;
wherein X'+a is the first common parameter, X0' is the first sliced data set, a0 is the third random data set, X1' is the second sliced data set and a1 is the fifth random data set;
the second common parameter is calculated according to the following formula:
Y'+b = Y0'+b1+Y1'+b0;
wherein Y'+b is the second common parameter, Y0' is the third sliced data set, b1 is the sixth random data set, Y1' is the fourth sliced data set and b0 is the fourth random data set.
10. The system according to claim 6, wherein:
the first slice correlation coefficient is calculated according to the following formula:
corr0 = c0 - a0*(Y'+b) - (X'+a)*b0 + (X'+a)*(Y'+b);
wherein corr0 is the first slice correlation coefficient, c0 is the first generated data set, a0 is the third random data set, b0 is the fourth random data set, X'+a is the first common parameter, and Y'+b is the second common parameter;
the second slice correlation coefficient is calculated according to the following formula:
corr1 = c1 - a1*(Y'+b) - (X'+a)*b1;
wherein corr1 is the second slice correlation coefficient, c1 is the second generated data set, a1 is the fifth random data set, b1 is the sixth random data set, X'+a is the first common parameter, and Y'+b is the second common parameter.
CN202310981568.5A 2023-08-04 Index prediction method and system based on federal pearson correlation analysis Active CN117252287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310981568.5A CN117252287B (en) 2023-08-04 Index prediction method and system based on federal pearson correlation analysis

Publications (2)

Publication Number Publication Date
CN117252287A (en) 2023-12-19
CN117252287B (en) 2024-07-05

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210188306A1 (en) * 2020-12-23 2021-06-24 Nageen Himayat Distributed learning to learn context-specific driving patterns
CN113095514A (en) * 2021-04-26 2021-07-09 深圳前海微众银行股份有限公司 Data processing method, device, equipment, storage medium and program product
CN114202402A (en) * 2021-06-10 2022-03-18 中国工商银行股份有限公司 Behavior characteristic prediction method and device
US20220121934A1 (en) * 2019-01-23 2022-04-21 Deepmind Technologies Limited Identifying neural networks that generate disentangled representations
CN114492605A (en) * 2022-01-12 2022-05-13 杭州博盾习言科技有限公司 Federal learning feature selection method, device and system and electronic equipment
CN115545216A (en) * 2022-10-19 2022-12-30 上海零数众合信息科技有限公司 Service index prediction method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Lu (陈璐): "Cross-source data error detection method based on federated learning", Journal of Software (《软件学报》), 31 March 2023 (2023-03-31), pages 1126 - 1147 *

Similar Documents

Publication Publication Date Title
Cao et al. Interactive temporal recurrent convolution network for traffic prediction in data centers
CN109359385B (en) Training method and device for service quality assessment model
CN108696529A (en) Network security situation awareness analysis system based on multivariate information fusion
Chen et al. Prompt federated learning for weather forecasting: Toward foundation models on meteorological data
CN112784920A (en) Cloud-side-end-coordinated dual-anti-domain self-adaptive fault diagnosis method for rotating part
Karia et al. Fractionally integrated ARMA for crude palm oil prices prediction: case of potentially overdifference
Silveira de Andrade et al. Bayesian GARMA models for count data
Gjika et al. A study on the efficiency of hybrid models in forecasting precipitations and water inflow Albania case study
JP2019125306A (en) Data processing method, data processing device and program
Li et al. Trading strategy design in financial investment through a turning points prediction scheme
CN117252287B (en) Index prediction method and system based on federal pearson correlation analysis
Sun et al. A data privacy protection diagnosis framework for multiple machines vibration signals based on a swarm learning algorithm
CN116992336B (en) Bearing fault diagnosis method based on federal local migration learning
CN117217820A (en) Intelligent integrated prediction method and system for purchasing demand of supply chain
CN117252287A (en) Index prediction method and system based on federal pearson correlation analysis
Qi et al. On-line monitoring data quality of high-dimensional data streams
CN116258274A (en) Power distribution network partition bus voltage prediction method based on longitudinal federal learning
CN113486586B (en) Device health state evaluation method and device, computer device and storage medium
Shan et al. Forecasting study of Shanghai’s and Shenzhen’s stock markets using a hybrid forecast method
CN114154415A (en) Equipment life prediction method and device
Kiefer et al. Artificial intelligence in supply chain management: investigation of transfer learning to improve demand forecasting of intermittent time series with deep learning
Dong et al. Network evolution analysis of nickel futures and the spot price linkage effect based on a distributed lag model
CN116050557A (en) Power load prediction method, device, computer equipment and medium
CN115730631A (en) Method and device for federal learning
Yonghong et al. The construction and application of a new exchange rate forecast model combining ARIMA with a chaotic BP algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant