CN113222169B - Federated Machine Composition Service Method and System Combined with Big Data Analysis Feedback

Info

Publication number: CN113222169B
Application number: CN202110289138.8A
Authority: CN
Inventors: 邢廷炎; 周长兵; 刘思民
Original assignee: China University of Geosciences Beijing
Current assignee: China University of Geosciences Beijing
Priority date: 2021-03-18
Filing date: 2021-03-18
Publication date: 2023-06-23
Anticipated expiration: 2041-03-18
Also published as: CN113222169A

Abstract

The invention discloses a federated machine composition service system combined with big data analysis feedback. The system comprises a plurality of devices (1) distributed at different addresses and a big data analysis and scheduling module (2). Each device (1) comprises a data preprocessing module (3), a data acquisition module (6) and a data reading module (11). The big data analysis and scheduling module (2) analyzes and schedules all devices, working modules and data progress participating in the federated machine learning, and is in data communication with the data reading module (11), the data training fusion sub-module (4) and the federated data training module (5). The data acquisition module (6) is in data communication with the stand-alone storage module (7), which is also in data communication with the data preprocessing module (3) and the data reading module (11) respectively. Before the data records are used for training, the system cleans them to remove abnormal records, which safeguards the accuracy of the data records and therefore of the data model.

Description

Federated Machine Composition Service Method and System Combined with Big Data Analysis Feedback

Technical Field

The invention relates to the technical field of intelligent manufacturing, and specifically to a federated machine composition service method and system combined with big data analysis feedback.

Background Art

The twentieth century is an era of intelligent production and intelligent manufacturing. Today, equipment is not only intelligent and automated in itself, but has also moved from standalone operation to collaborative operation, which depends on cross-domain and cross-device operation and cooperation and therefore on coordination between different devices or fields. In conventional technology, handling this coordination with ordinary logical relationships leads to long processing cycles and requires a large number of data logic operations, which further raises the demand on the computing capability of the processor. Adapting to large-scale computation therefore inevitably raises the demand for the logical computing capability of processors, and the computing capability of large-scale or very-large-scale integrated circuits directly affects their production cost. On the one hand, the processing capability of the processor itself needs to be improved; on the other hand, it is desirable to reduce the demand for computing power, that is, to optimize the computing requirements of artificial intelligence. A large body of research therefore targets algorithms, improving the computational model through the algorithm in order to reduce the computing demand.

In addition, for smart devices to achieve intelligent processing, the data of the various sensing devices or institutions must be combined, processed by comprehensive logical computation, and summarized before a comprehensive judgment can be made. Integrating data scattered across different locations, however, involves great difficulty and economic cost. Technologies for joint access to and processing of such distributed data do exist, such as federated machine learning, also known as federated learning, joint learning or alliance learning. Federated machine learning is a machine learning framework that can effectively help multiple institutions use data and build machine learning models while meeting the requirements of user privacy protection, data security and government regulations.

For example, patent CN 110263936A discloses a horizontal federated learning method, which includes: a community coordinator obtains the global model parameters sent by a central coordinator and sends them to each participant; the community coordinator obtains the model parameter updates that each participant produces by training on the global model parameters, fuses these updates into a community model parameter update, and determines whether the community model parameter update needs to be sent to the central coordinator; if so, it sends the community model parameter update to the central coordinator, obtains the global model parameter update returned by the central coordinator, and sends that update to each participant so that each participant can train its model based on it. That invention also discloses a horizontal federated learning apparatus, device and computer storage medium, and improves the learning efficiency of horizontal federated learning.

Patent CN111324440A discloses a method, apparatus, device and readable storage medium for executing an automated process, relating to the field of financial technology. The method includes the steps of: obtaining multimodal data corresponding to a terminal and using the multimodal data as the input of a deep learning model to obtain an intention analysis model; determining, through the intention analysis model, the behavioral intention corresponding to the automated process in the terminal; and executing the target operation instruction corresponding to the automated process according to the behavioral intention, so as to execute the automated process. By analyzing the behavioral intention with the intention analysis model and executing the automated process according to that intention, that invention avoids execution failures caused by changes in the execution environment, which improves the adaptability of the automated process to different execution environments and its execution success rate.

Patent CN111882308A discloses a secure blockchain transaction method, including: collecting the transaction request information of each node participating in a secure transaction; recording the transaction request information and verifying its legitimacy; generating a transaction block and packaging it into a block of the blockchain; obtaining and verifying the bookkeeping right handed over by the bookkeeping-right node; broadcasting the verified ownership of the bookkeeping right to each node so that the nodes reach consensus according to a preset consensus algorithm; and verifying the transaction request information through the bookkeeping right and, once verification succeeds, notifying each node to record the transaction request information and synchronously updating the account information of each node. The secure blockchain transaction method provided by that invention can solve prior-art problems such as long transaction data confirmation delays, illegal information that cannot be altered, poor security and poor privacy.

Patent CN112257876A discloses a federated learning method, apparatus, computer device and medium, belonging to the field of computer technology. The method includes: a first computer device obtains the sample label information corresponding to a sample identifier and obtains first fusion information corresponding to the sample identifier; a second computer device obtains second fusion information corresponding to the sample identifier and sends it to the first computer device; the first computer device obtains a gradient operator corresponding to the sample identifier based on the first fusion information, the second fusion information and the sample label information, and sends the gradient operator to the second computer device; and the first and second computer devices, based on the gradient operator, respectively adjust the model parameters of a first sub-model and of a second sub-model in the machine learning model. While protecting user privacy, the method speeds up model training, enriches the information carried by the sample features and improves model accuracy.

Patent CN112217706A discloses a data processing method, apparatus and device. On the one hand, the devices in the data processing system are connected in a ring structure, and each device has two communication links to other devices; even if one link is temporarily interrupted, the device can still communicate with other devices through the other link, so the data processing system has good stability and robustness. On the other hand, during data processing the model parameters determined by each device are passed along the communication links in turn, and each device fuses the model parameters it receives with the model parameters it has determined itself before passing them on. The amount of data passed between devices is small and model parameters need not be sent to a single device, which effectively avoids overload and communication congestion, improves the speed and efficiency of data processing and ensures its stability.

Patent CN112330048A discloses a scorecard model training method, apparatus, storage medium and electronic device. The method includes: binning the continuous variables in a wide data table to obtain discrete variables; and inputting the variables into a constrained logistic regression model, converting the logistic regression model into a scorecard model, and calculating the offset and scale of the scorecard model, where the constraint on the logistic regression model is that the lower bound of the variable coefficients is non-negative. Because the coefficients are constrained to be non-negative, that invention solves the problem in the related art that multicollinearity among the independent variables makes individual coefficients negative when a scorecard model is trained by logistic regression, causing the model to lose its explanatory power; it thereby avoids repeated model iterations and reduces the time cost and overhead of model training.

It can be seen that existing knowledge transfer technology based on federated learning still has the following defects:

1. In the prior art, big data is applied mainly to the pure management of massive data, for economic forecasting or commercial applications; it is still rarely applied in industry or to guide industrial applications.

2. Furthermore, the prior art offers neither a technique nor any technical teaching for grouping data, so it is unclear how data should be grouped. Grouping by experience is clearly unscientific, and unscientific grouping leads to inaccurate models being trained from the data.

3. In the prior art, data records are trained without regard to the size and number of the data. Training directly on all of the data to obtain a model easily makes the data volume too large, which on the one hand makes the computation heavy and difficult, and on the other hand easily leads to an inaccurate trained model.

4. In the prior art, abnormal data records that may exist among the data records are not preliminarily cleaned, so abnormal data can easily cause the trained model to be abnormal.

In view of the above technical problems, there is a need for a federated machine learning service method that can train on data quickly while reducing the demands placed on the capability of the data processing system, so that data can be processed quickly to obtain a data model. So far, however, the prior art offers no effective way of solving these technical problems.

It is therefore desirable to provide a federated machine composition service method and system combined with big data analysis feedback that solves the above technical problems.

Summary of the Invention

In view of the above technical problems, the purpose of the present invention is to provide a federated machine composition service method and system combined with big data analysis feedback, so as to solve the problems raised in the background art above.

To achieve the above purpose, the present invention provides the following technical solution:

A federated machine composition service system combined with big data analysis feedback comprises a plurality of devices distributed at different addresses and a big data analysis and scheduling module; each device comprises a data preprocessing module, a data acquisition module and a data reading module;

a data training fusion sub-module, which is provided on some of the devices; and a federated data training module, which is provided on one of the devices; every device includes a stand-alone storage module, each device provided with a data training fusion sub-module is also provided with a local-area data storage module, and the device provided with the federated data training module is provided with a global data storage module;

the big data analysis and scheduling module analyzes and schedules all devices, working modules and data progress participating in the federated machine learning; the big data analysis and scheduling module is in data communication with the data reading module, the data training fusion sub-module and the federated data training module; the data acquisition module is in data communication with the stand-alone storage module, and the stand-alone storage module is also in data communication with the data preprocessing module and the data reading module respectively;

while a device is running, its data acquisition module acquires the operating data and status data on the device to form data records and stores the data records in the stand-alone storage module of the device; the data preprocessing module reads the data records stored in the stand-alone storage module, analyzes each data record using mathematical statistics methods and preset requirements, and deletes any data record found to be obviously unreasonable;
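The specification does not name a particular statistical test for this cleaning step. As a minimal sketch, assuming each data record is a dictionary holding one numeric reading, the "preset requirements" could be modeled as a valid range and the "mathematical statistics method" as a z-score check. The function name `clean_records`, the field name, the range and the 3-sigma threshold are illustrative assumptions and are not taken from the patent.

```python
from statistics import mean, pstdev


def clean_records(records, field, z_max=3.0, valid_range=None):
    """Return the records that pass a range check and a z-score check.

    records     -- list of dicts, each holding a numeric value under `field`
    z_max       -- assumed threshold: drop values more than z_max standard
                   deviations away from the mean
    valid_range -- optional (low, high) tuple standing in for preset limits
    """
    # Preset requirement: discard values outside the allowed range first,
    # so that they do not distort the mean and standard deviation.
    if valid_range is not None:
        low, high = valid_range
        records = [r for r in records if low <= r[field] <= high]

    # Too few records to compute a meaningful spread: keep what is left.
    if len(records) < 2:
        return list(records)

    values = [r[field] for r in records]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(records)

    # Statistical check: keep only records close enough to the mean.
    return [r for r in records if abs(r[field] - mu) / sigma <= z_max]
```

With only a handful of records a single outlier cannot reach a z-score of 3 (for n records the largest possible value is the square root of n - 1), so a range check or a robust statistic may suit small batches better.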

the big data analysis and scheduling module formulates grouping rules for the devices from their past operating characteristics and the characteristics of the volume of data records they generate, and groups all devices accordingly, dividing them into several groups according to those rules while ensuring that each group contains at least one data training fusion sub-module; it sends the grouping information to the data reading module, the data training fusion sub-module and the federated data training module, and modifies the data record read permissions of the data reading module, the data training fusion sub-module and the federated data training module;

according to the read permission assigned by the big data analysis and scheduling module, the data training fusion sub-module establishes a data communication connection with the data reading modules of its corresponding group, reads through those data reading modules the data records stored in the stand-alone storage modules, and performs learning and training on them to obtain a data federation sub-model; it then sends the data federation sub-model, together with a certain number of data records randomly drawn from the data records used to obtain that sub-model, to the local-area data storage module;
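The patent leaves both the model type and the sample size open. The following sketch illustrates only the shape of this step: a group trains a sub-model on its records and forwards the sub-model together with a random sample of those same records. The one-parameter least-squares model, the function name `train_submodel_and_sample` and the 10% sampling fraction are hypothetical stand-ins.

```python
import random


def train_submodel_and_sample(records, sample_fraction=0.1, seed=None):
    """Fit a stand-in sub-model and draw a random sample of its training data.

    records -- list of (x, y) pairs read from the group's devices
    Returns (submodel, sample), the two items that would be sent on to the
    local-area data storage module.
    """
    if not records:
        raise ValueError("a group needs at least one data record")

    # Stand-in model: least-squares slope of y = w * x through the origin.
    sxx = sum(x * x for x, _ in records)
    sxy = sum(x * y for x, y in records)
    if sxx == 0:
        raise ValueError("all inputs are zero, slope is undefined")
    submodel = {"slope": sxy / sxx, "n_records": len(records)}

    # Random draw from the records that were used to train the sub-model.
    k = max(1, round(len(records) * sample_fraction))
    sample = random.Random(seed).sample(records, k)
    return submodel, sample
```

Recording `n_records` alongside the parameters is a design choice made here so that a later aggregation step has something to weight by; the patent does not state what metadata accompanies a sub-model.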

the federated data training module reads the data federation sub-models and the data records stored in the local-area data storage modules, combines all of the federation sub-models by parameter weighting, trains on the sampled data records it has read to obtain the corresponding parameters, thereby obtains the overall data federation model, and sends it to the global data storage module for storage;
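The patent says the sub-models are combined "by parameter weighting" but does not say how the weights are chosen. One common choice, used in federated averaging (FedAvg), is to weight each sub-model by the number of records it was trained on. The sketch below assumes that convention and assumes every sub-model exposes its parameters as a flat list of floats; the function name `weighted_average` is hypothetical.

```python
def weighted_average(submodels):
    """Combine sub-model parameters by weighted averaging.

    submodels -- list of (params, weight) pairs, where params is a list of
                 floats and weight is, by assumption, the record count
    Returns the merged parameter list.
    """
    if not submodels:
        raise ValueError("no sub-models to combine")

    total_weight = sum(weight for _, weight in submodels)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    dim = len(submodels[0][0])
    merged = [0.0] * dim
    for params, weight in submodels:
        if len(params) != dim:
            raise ValueError("all sub-models must have the same shape")
        # Each sub-model contributes in proportion to its share of the weight.
        for i, value in enumerate(params):
            merged[i] += value * weight / total_weight
    return merged
```

For example, merging parameters `[1.0, 2.0]` with weight 1 and `[3.0, 6.0]` with weight 3 pulls the result three quarters of the way toward the second sub-model. In the system described above, this merged vector would then serve as the starting point for further training on the sampled records.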

the big data analysis and scheduling module arbitrarily draws a certain number of data records from the stand-alone storage modules on all devices and uses them to verify the overall data federation model; if, during verification, the model output agrees with the data in the data records to within the model accuracy requirement, the overall data federation model is established; otherwise, the randomly drawn data records stored in the local-area data storage module are randomly drawn again, and the federated data training module repeats the process of building the overall data federation model.
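The verify-and-redraw cycle can be sketched as a loop. The following is an illustration under several assumptions that the patent does not make: the model is a single slope `w` in `y = w * x`, accuracy is measured as mean absolute error on the verification records, training is plain gradient descent, and the loop is capped at `max_rounds`. All function and parameter names are hypothetical.

```python
import random


def fit_slope(records, w0, lr=0.01, epochs=200):
    """Refine the slope of y = w * x by gradient descent, starting from w0."""
    w = w0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in records) / len(records)
        w -= lr * grad
    return w


def mean_abs_error(w, records):
    """Average absolute gap between model output and recorded value."""
    return sum(abs(w * x - y) for x, y in records) / len(records)


def build_global_model(pool, verification, w0, tolerance,
                       sample_size, max_rounds=5, seed=0):
    """Train on a random draw, verify, and redraw until accurate enough.

    pool         -- records available for sampling (local-area storage)
    verification -- records drawn from the devices for checking the model
    w0           -- initial parameter, e.g. from merging the sub-models
    Returns (w, rounds_used).
    """
    rng = random.Random(seed)
    for round_no in range(1, max_rounds + 1):
        sample = rng.sample(pool, sample_size)      # (re)draw training records
        w = fit_slope(sample, w0)                   # rebuild the overall model
        if mean_abs_error(w, verification) <= tolerance:
            return w, round_no                      # accuracy requirement met
    raise RuntimeError("accuracy requirement not met within max_rounds")
```

The round cap is a safeguard added for the sketch; the text above describes the retry but does not say when, if ever, it stops.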

Preferably, when all of the federation sub-models are combined by parameter weighting to obtain the overall data federation model, the initial parameter values are set by the big data analysis and scheduling module, either from the parameters used by previous overall data federation models or from characteristics of the data record volume obtained by big data analysis. Starting from these initial values, training on the sampled data records yields the final parameters and thus the overall data federation model, which speeds up the convergence of training.

Preferably, when each group uses its data training fusion sub-module to train the data federation sub-model, in order to increase the dimensionality of the training data, some groups use vertical federated learning and the remaining groups use federated transfer learning; or some groups use vertical federated learning, some use horizontal federated learning, and the rest use federated transfer learning.

Preferably, when the data preprocessing module cleans the data records, the big data analysis and scheduling module uses existing historical data records or data federation sub-models to perform preliminary cleaning: each data record is analyzed, and a record that deviates beyond a certain degree is removed, which makes the cleaning of data records more accurate.

Preferably, when cleaning data records, the big data analysis and scheduling module combines the removed unreasonable data records with the past operating characteristics of the devices to analyze the cause of the abnormal records, so that the data records can be corrected or parameters can be added to the data records.

Preferably, when the big data analysis and scheduling module groups all devices, it first estimates the data record volume of each device using a big data estimation method. When grouping, devices with large data record volumes are placed in the same group and devices with small data record volumes are placed in a group of their own, which prevents the records of high-volume devices from swamping the records of low-volume devices during training and so improves the accuracy of the overall data federation model.

Preferably, when the big data analysis and scheduling module groups all devices, it first estimates the data record volume of each device using a big data estimation method. When grouping, a group of devices with large data record volumes contains few devices, and a group of devices with small data record volumes contains many devices, so that the number of data records in each group is moderate and the computational load on every data training fusion sub-module is appropriate.
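The grouping preferences above can be illustrated with a simple greedy pass. This is a sketch under assumptions, not the patented rule: devices are sorted by estimated record volume and packed into groups until a target record count would be exceeded, so neighbours in volume end up together and high-volume groups contain fewer devices. The `Device` type, `target_records` and both function names are hypothetical. A second helper reports groups that lack a data training fusion sub-module, which the system requires every group to have.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    estimated_records: int          # volume estimated by the scheduler
    has_fusion_submodule: bool = False


def group_by_volume(devices, target_records):
    """Greedily pack devices, largest first, into groups of similar volume."""
    if target_records <= 0:
        raise ValueError("target_records must be positive")

    ordered = sorted(devices, key=lambda d: d.estimated_records, reverse=True)
    groups, current, total = [], [], 0
    for device in ordered:
        # Close the current group when adding this device would overshoot.
        if current and total + device.estimated_records > target_records:
            groups.append(current)
            current, total = [], 0
        current.append(device)
        total += device.estimated_records
    if current:
        groups.append(current)
    return groups


def groups_missing_fusion(groups):
    """Return the indices of groups with no data training fusion sub-module."""
    return [
        index for index, group in enumerate(groups)
        if not any(device.has_fusion_submodule for device in group)
    ]
```

With a target of 1000 records, two devices estimated at 1000 and 900 records each form a group of their own, while four devices of roughly 250 records share one group. A greedy pass does not guarantee a fusion sub-module in every group, so a real scheduler would have to repair any group reported by `groups_missing_fusion`.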

Preferably, where several devices in the same group carry data training fusion sub-modules or federated data training modules, the big data analysis and scheduling module designates one of the federated data training modules as the federated data training module of the service migration system and one of the data training fusion sub-modules in the group as the data training fusion sub-module that performs the data training for that group; the data training tasks of one or several data training fusion sub-modules may also be handed over to the big data analysis and scheduling module for execution.

In another aspect, the present application also provides a federated machine composition service method combined with big data analysis feedback, which uses the federated machine composition service system combined with big data analysis feedback and is characterized by comprising the following steps:

Step S1: initialize the federated machine composition service system combined with big data analysis feedback. The big data analysis and scheduling module formulates grouping rules for the devices from their past operating characteristics and the characteristics of the volume of data records they generate, and groups all devices accordingly, dividing them into several groups according to those rules. Specifically, the big data analysis and scheduling module first estimates the data record volume of each device using a big data estimation method; when grouping, devices with large data record volumes are placed in the same group and devices with small data record volumes in a group of their own, which prevents the records of high-volume devices from swamping the records of low-volume devices during training; a group of high-volume devices contains few devices and a group of low-volume devices contains many, so that the number of data records in each group is moderate. The module also ensures that each group contains at least one data training fusion sub-module, sends the grouping information to the data reading module, the data training fusion sub-module and the federated data training module, and modifies the data record read permissions of the data reading module, the data training fusion sub-module and the federated data training module;

Step S2: while a device is running, its data acquisition module acquires the operating data and status data on the device to form data records and stores the data records in the stand-alone storage module of the device;

Step S3: the data preprocessing module reads the data records stored in the stand-alone storage module, analyzes each data record using mathematical statistics methods and preset requirements, and deletes any data record found to be obviously unreasonable;

Step S4: according to the read permission assigned by the big data analysis and scheduling module, the data training fusion sub-module establishes a data communication connection with the data reading modules of its corresponding group, reads through those data reading modules the data records stored in the stand-alone storage modules, and performs learning and training on them to obtain a data federation sub-model;

Step S5: the data federation sub-model, together with a certain number of data records randomly drawn from the data records used to obtain that sub-model, is sent to the local-area data storage module;

Step S6: the federated data training module reads the data federation sub-models and the data records stored in the local-area data storage modules, combines all of the federation sub-models by parameter weighting, and trains on the sampled data records it has read, thereby obtaining the overall data federation model;

Step S7: the big data analysis and scheduling module arbitrarily draws a certain number of data records from the stand-alone storage modules on all devices and uses them to verify the overall data federation model; if, during verification, the model output agrees with the data in the data records to within the model accuracy requirement, the overall data federation model is established;

Step S8: otherwise, the randomly drawn data records stored in the local-area data storage module are randomly drawn again, and the federated data training module repeats the process of building the overall data federation model.

Compared with the prior art, the beneficial effects of the present invention are:

1. The federated machine combination service system combined with big data analysis feedback of the present invention departs from the traditional approach of training on the global data set as a whole to form a data federation model and instead adopts distributed federated training. On the one hand this enlarges the data sample and makes training more accurate; on the other hand it reduces the overall cost of the data.

2. The federated machine combination service system combined with big data analysis feedback of the present invention uses big data analysis methods to analyze the data records acquired by each device and to make an estimate for each device, thereby obtaining the estimated data size of each device. The estimate is then sent to the federated machine learning scheduling module to support the grouping, which makes the grouping more accurate and effective and solves the technical problem of grouping.

3. The federated machine combination service system combined with big data analysis feedback of the present invention, when grouping devices for federated training, places devices with large data record volumes in the same group and devices with small data record volumes in a group of their own, so that during training the records from high-volume devices do not swamp the data from low-volume devices, which safeguards the accuracy of the overall data federation model. At the same time, a group of high-volume devices contains few devices and a group of low-volume devices contains many, so that the number of data records in each group is moderate and the computational load of every data training fusion sub-module is appropriate.

4. The federated machine combination service system combined with big data analysis feedback of the present invention cleans the data records to remove their abnormal portions and, at the same time, analyzes the abnormal portions to find the cause of the abnormality.

Description of the drawings

Fig. 1 is a schematic diagram of the overall structure of the present invention;

Fig. 2 is a schematic diagram of the data flow among the modules provided in the device of the present invention.

In the figures: 1, device; 2, big data analysis and scheduling module; 3, data preprocessing module; 4, data fusion sub-module; 5, federated data training module; 6, data sensing module; 7, stand-alone storage module; 8, local data storage device; 9, global data storage module; 10, group; 11, data reading module.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Specific embodiment one:

A federated machine combination service system combined with big data analysis feedback comprises a plurality of devices 1 distributed at different addresses and a big data analysis and scheduling module 2; each device 1 comprises a data preprocessing module 3, a data acquisition module 6 and a data reading module 11;

a data training fusion sub-module 4 is provided on some of the devices; a federated data training module 5 is provided on one of the devices 1; every device 1 comprises a stand-alone storage module 7, each device provided with the data training fusion sub-module 4 is provided with a local data storage module 8, and the device 1 provided with the federated data training module 5 is provided with a global data storage module 9;

the big data analysis and scheduling module 2 analyzes and schedules all devices, working modules and data progress data participating in the federated machine learning; the big data analysis and scheduling module 2 is in data communication connection with the data reading module 11, the data training fusion sub-module 4 and the federated data training module 5; the data acquisition module 6 is in data communication connection with the stand-alone storage module 7, and the stand-alone storage module 7 is also in data communication connection with the data preprocessing module 3 and with the data reading module 11;

when the device 1 is running, the data acquisition module 6 acquires the operating data and status data of the device 1, forms data records, and stores the data records in the stand-alone storage module 7 of the device 1; the data preprocessing module 3 reads the data records stored in the stand-alone storage module 7 and analyzes each data record using mathematical statistics methods and preset requirements, and deletes any data record found to be obviously unreasonable;

the big data analysis and scheduling module 2 formulates grouping rules for the devices from their past operating characteristics and the characteristics of the volume of data records they generate; the big data analysis and scheduling module 2 then groups all the devices 1, dividing them into several groups 10 according to those rules while ensuring that each group 10 contains at least one data training fusion sub-module 4, sends the grouping information to the data reading module 11, the data training fusion sub-module 4 and the federated data training module 5, and modifies the read permissions of the data reading module 11, the data training fusion sub-module 4 and the federated data training module 5 for the data records;

the data training fusion sub-module 4 establishes a data communication connection with the data reading modules 11 of its corresponding group according to the read permission assigned by the big data analysis and scheduling module 2, reads the data records stored in the stand-alone storage modules 7 through those data reading modules 11, and performs learning and training on them to obtain a data federation sub-model; the data federation sub-model, together with a certain number of data records randomly drawn from the data records used to obtain it, is sent to the local data storage module 8;

the federated data training module 5 reads the data federation sub-models and the data records stored in the local data storage module 8, combines all the data federation sub-models by parameter weighting, trains on the drawn data records it has read to obtain the corresponding parameters, thereby obtains the overall data federation model, and sends it to the global data storage module 9 for storage;

the big data analysis and scheduling module 2 draws a certain number of data records at random from the stand-alone storage modules 7 on all devices 1 and uses them to verify the overall data federation model. If, during verification, the model output agrees with the data in the data records to within the required model accuracy, the overall data federation model is established; otherwise, the randomly drawn data records stored in the local data storage module 8 are drawn again at random, and the federated data training module 5 repeats the process of establishing the overall data federation model.
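The verify-and-redraw loop described above can be illustrated with a minimal sketch. It is not the patented implementation: the record shape `(input, observed output)`, the mean-absolute-error accuracy criterion, the `fit` callback and the `max_rounds` bound are all assumptions introduced for illustration, since the description does not specify how accuracy is measured or whether the number of retries is limited.

```python
import random
from typing import Callable, Sequence

Record = tuple[float, float]  # hypothetical record shape: (input, observed output)
Model = Callable[[float], float]


def build_until_valid(
    pooled: Sequence[Record],
    validation: Sequence[Record],
    fit: Callable[[Sequence[Record]], Model],
    sample_size: int,
    tolerance: float,
    max_rounds: int = 10,
    seed: int = 0,
) -> tuple[Model, int]:
    """Redraw the training sample and retrain until validation passes.

    `pooled` stands in for the records available for random drawing,
    `validation` for the records drawn from the stand-alone storage modules.
    `max_rounds` is an assumed safety bound, not part of the description.
    """
    rng = random.Random(seed)
    for round_no in range(1, max_rounds + 1):
        # Draw a fresh random sample of records (re-drawn on every retry).
        sample = rng.sample(list(pooled), sample_size)
        model = fit(sample)
        # Assumed accuracy check: mean absolute error on the validation records.
        error = sum(abs(model(x) - y) for x, y in validation) / len(validation)
        if error <= tolerance:
            return model, round_no
    raise RuntimeError("accuracy requirement not met within max_rounds")
```

Any training routine can be passed as `fit`; the loop only decides whether the resulting model is accepted or the sample is drawn again.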

Preferably, when all the data federation sub-models are combined by parameter weighting to obtain the overall data federation model, the initial parameter values are set by the big data analysis and scheduling module 2, either from the parameters of previous overall data federation models or from the characteristics of the data record volume as determined by big data analysis. Starting from these initial values, training is performed on the drawn data records that have been read to obtain the final parameters and thus the overall data federation model, which speeds up the convergence of the training.
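As an illustration of parameter weighting followed by training from the resulting initial values, the following sketch may help. It rests on several assumptions that are not in the description: each sub-model is a flat list of parameters, the weights are supplied by the caller (for example, record counts per group), and the model being refined is a one-variable linear model `y = a*x + b` trained by plain gradient descent. The names `weighted_average` and `refine` are hypothetical.

```python
from typing import Sequence


def weighted_average(
    sub_models: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> list[float]:
    """Combine sub-model parameter vectors into one vector by weighting."""
    if not sub_models or len(sub_models) != len(weights):
        raise ValueError("need one weight per sub-model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    size = len(sub_models[0])
    if any(len(m) != size for m in sub_models):
        raise ValueError("all sub-models must have the same number of parameters")
    return [
        sum(w * m[i] for m, w in zip(sub_models, weights)) / total
        for i in range(size)
    ]


def refine(
    initial: Sequence[float],
    records: Sequence[tuple[float, float]],
    lr: float = 0.01,
    epochs: int = 200,
) -> list[float]:
    """Train an assumed linear model y = a*x + b starting from `initial`."""
    a, b = initial
    n = len(records)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in records:
            err = a * x + b - y
            grad_a += err * x
            grad_b += err
        # Gradient step on the mean squared error (constant factor folded into lr).
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return [a, b]
```

Starting `refine` from the weighted average rather than from arbitrary values is what the paragraph above credits with faster convergence; the sketch does not measure that effect.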

Preferably, when each group uses the data training fusion sub-module 4 to perform training and generate the data federation sub-model, in order to increase the dimensionality of the training data, some of the groups adopt vertical federated learning and the remaining groups adopt federated transfer learning; or some groups adopt vertical federated learning, some groups adopt horizontal federated learning, and the remaining groups adopt federated transfer learning.

Preferably, when the data preprocessing module 3 cleans the data records, the big data analysis and scheduling module 2 uses existing historical data records or a data federation sub-model to perform a preliminary cleaning of the data: each data record is analyzed, and a data record that deviates beyond a certain extent is removed, which makes the cleaning of the data records more accurate.
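A minimal sketch of such a deviation-based cleaning step follows. The description does not name a statistical method, so the rule used here (distance from the historical median, measured in units of the median absolute deviation) and the threshold `max_deviation` are assumptions chosen for illustration; `clean_records` is a hypothetical name, and records are reduced to single numeric values.

```python
from statistics import median
from typing import Sequence


def clean_records(
    values: Sequence[float],
    history: Sequence[float],
    max_deviation: float = 5.0,
) -> tuple[list[float], list[float]]:
    """Split new values into (kept, rejected) using historical records.

    A value is rejected when it lies more than `max_deviation` median
    absolute deviations away from the historical median.
    """
    center = median(history)
    spread = median([abs(v - center) for v in history])
    if spread == 0:
        # History shows no variation, so no deviation scale is available;
        # this sketch keeps everything rather than guess a threshold.
        return list(values), []
    kept: list[float] = []
    rejected: list[float] = []
    for v in values:
        if abs(v - center) / spread > max_deviation:
            rejected.append(v)
        else:
            kept.append(v)
    return kept, rejected
```

The rejected values are returned rather than discarded so that they remain available for a later analysis of why they were abnormal.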

Preferably, when cleaning the data records, the big data analysis and scheduling module 2 combines the unreasonable data records that were removed with the past operating characteristics of the device to analyze the cause of the abnormal data records, so that the data records can be corrected or parameters can be added to the data records.

Preferably, when the big data analysis and scheduling module 2 groups all the devices 1, it first uses a big data estimation method to estimate the data record volume of each device 1; during grouping, devices with large data record volumes are placed in the same group and devices with small data record volumes are placed in a group of their own, so that during training the records from high-volume devices do not swamp the data from low-volume devices, which safeguards the accuracy of the overall data federation model.

Preferably, when the big data analysis and scheduling module 2 groups all the devices 1, it first uses a big data estimation method to estimate the data record volume of each device 1; during grouping, a group of devices with large data record volumes contains few devices 1 and a group of devices with small data record volumes contains many devices 1, so that the number of data records in each group is moderate and the computational load of every data training fusion sub-module 4 is appropriate.
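One way to satisfy both grouping preferences at once is sketched below, under stated assumptions: the estimated record volume per device is already known, a target volume per group (`target_per_group`, a hypothetical parameter) is supplied, and devices are sorted by volume and cut into consecutive groups. The description does not prescribe this particular procedure, and the sketch does not check the separate requirement that every group contain at least one data training fusion sub-module.

```python
def group_devices(
    estimated: dict[str, int],
    target_per_group: int,
) -> list[list[str]]:
    """Group devices of similar record volume with similar totals per group.

    Sorting by volume keeps high-volume devices together and low-volume
    devices together; closing a group once it reaches the target means
    high-volume groups end up with few devices and low-volume groups with many.
    """
    if target_per_group <= 0:
        raise ValueError("target_per_group must be positive")
    ordered = sorted(estimated, key=lambda d: estimated[d], reverse=True)
    groups: list[list[str]] = []
    current: list[str] = []
    total = 0
    for device in ordered:
        current.append(device)
        total += estimated[device]
        if total >= target_per_group:
            groups.append(current)
            current, total = [], 0
    if current:
        # Leftover low-volume devices form a final, possibly smaller group.
        groups.append(current)
    return groups
```

With estimated volumes of 900 and 800 records for two devices and 300, 250, 200, 150 and 100 for five others, a target of 1000 yields one group of two devices and one group of five.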

Preferably, for the data training fusion sub-modules 4 and the federated data training modules 5 on multiple devices of the same group, the big data analysis and scheduling module 2 designates one of the federated data training modules 5 as the federated data training module 5 of the service migration system and one of the data training fusion sub-modules 4 of the group as the data training fusion sub-module that performs the data training for that group; the data training tasks performed by one or several of the data training fusion sub-modules 4 may be handed over to the big data analysis and scheduling module 2 for execution.

Specific embodiment two:

A federated machine combination service method combined with big data analysis feedback, comprising the federated machine combination service system combined with big data analysis feedback, the method comprising the following steps:

Step S1: the federated machine combination service system combined with big data analysis feedback is initialized, and the big data analysis and scheduling module 2 formulates grouping rules for the devices from their past operating characteristics and the characteristics of the volume of data records they generate,

so that the big data analysis and scheduling module 2 groups all the devices 1, dividing them into several groups 10 according to those rules. Specifically, the big data analysis and scheduling module 2 first uses a big data estimation method to estimate the data record volume of each device 1; when all the devices 1 are grouped, devices with large data record volumes are placed in the same group and devices with small data record volumes are placed in a group of their own, so that during training the records from high-volume devices do not swamp the data from low-volume devices; a group of high-volume devices contains few devices 1 and a group of low-volume devices contains many devices 1, so that the number of data records in each group is moderate. Each group 10 is guaranteed to contain at least one data training fusion sub-module 4; the grouping information is sent to the data reading module 11, the data training fusion sub-module 4 and the federated data training module 5, and the read permissions of the data reading module 11, the data training fusion sub-module 4 and the federated data training module 5 for the data records are modified;

Step S2: when the device 1 is running, the data acquisition module 6 acquires the operating data and status data of the device 1, forms data records, and stores the data records in the stand-alone storage module 7 of the device 1;

Step S3: the data preprocessing module 3 reads the data records stored in the stand-alone storage module 7, analyzes each data record using mathematical statistics methods and preset requirements, and deletes any data record found to be obviously unreasonable;

Step S4: the data training fusion sub-module 4 establishes a data communication connection with the data reading modules 11 of its corresponding group according to the read permission assigned by the big data analysis and scheduling module 2, reads the data records stored in the stand-alone storage modules 7 through those data reading modules 11, and performs learning and training on them to obtain a data federation sub-model;

Step S5: the data federation sub-model, together with a certain number of data records randomly drawn from the data records used to obtain that sub-model, is sent to the local data storage module 8;

Step S6: the federated data training module 5 reads the data federation sub-models and the data records stored in the local data storage module 8, combines all the data federation sub-models by parameter weighting, and trains on the drawn data records it has read, thereby obtaining the overall data federation model;

Step S7: the big data analysis and scheduling module 2 draws a certain number of data records at random from the stand-alone storage modules 7 on all devices 1 and uses them to verify the overall data federation model; if, during verification, the model output agrees with the data in the data records to within the required model accuracy, the overall data federation model is established;

Step S8: otherwise, the randomly drawn data records stored in the local data storage module 8 are drawn again at random, and the federated data training module 5 repeats the process of establishing the overall data federation model.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Furthermore, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent in such a process, method, article or device.

Although embodiments of the present invention have been shown and described, persons of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the present invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A federated machine combination service system combined with big data analysis feedback, comprising a plurality of devices (1) distributed at different addresses and a big data analysis and scheduling module (2); each device (1) comprises a data preprocessing module (3), a data acquisition module (6) and a data reading module (11);

a data training fusion sub-module (4), the data training fusion sub-module (4) being provided on some of the devices; a federated data training module (5), the federated data training module (5) being provided on one of the devices (1); all devices (1) comprise a stand-alone storage module (7), a local data storage module (8) is provided on each device provided with the data training fusion sub-module (4), and the device (1) provided with the federated data training module (5) is provided with a global data storage module (9);

the big data analysis and scheduling module (2) analyzes and schedules all devices, working modules and data progress data participating in the federated machine learning; the big data analysis and scheduling module (2) is in data communication connection with the data reading module (11), the data training fusion sub-module (4) and the federated data training module (5); the data acquisition module (6) is in data communication connection with the stand-alone storage module (7), and the stand-alone storage module (7) is also in data communication connection with the data preprocessing module (3) and the data reading module (11) respectively;

characterized in that:

when the device (1) is running, the data acquisition module (6) acquires the operating data and status data of the device (1), forms data records, and stores the data records in the stand-alone storage module (7) of the device (1); the data preprocessing module (3) reads the data records stored in the stand-alone storage module (7), analyzes each of the data records using mathematical statistics methods and preset requirements, and deletes a data record when that data record is found to be obviously unreasonable;

the big data analysis and scheduling module (2) uses the past operating characteristics of the devices and the characteristics of the volume of data records generated to formulate grouping rules for the devices, so that the big data analysis and scheduling module (2) groups all the devices (1), divides all the devices (1) into several groups (10) according to certain rules, ensures that at least one data training fusion sub-module (4) exists in each group (10), sends the grouping information to the data reading module (11), the data training fusion sub-module (4) and the federated data training module (5), and modifies the read permissions of the data reading module (11), the data training fusion sub-module (4) and the federated data training module (5) for the data records;

the data training fusion sub-module (4) establishes a data communication connection with the data reading module (11) of its corresponding group according to the read permission assigned by the big data analysis and scheduling module (2), so that the data training fusion sub-module (4) reads the data records stored in the stand-alone storage module (7) through the data reading module (11) and performs data learning and training to obtain a data federation sub-model, and the data federation sub-model and a certain number of data records randomly drawn from the data records used in obtaining the data federation sub-model are sent to the local data storage module (8);

the federated data training module (5) reads the data federation sub-models and the data records stored in the local data storage module (8), applies parameter weighting to all the data federation sub-models to obtain the overall data federation model, performs data training using the drawn data records that have been read to obtain the corresponding parameters, thereby obtains the overall data federation model, and sends it to the global data storage module (9) for storage;

the big data analysis and scheduling module (2) draws a certain number of data records at random from the stand-alone storage modules (7) on all devices (1) for verifying the overall data federation model; when, in verifying the data records with the overall data federation model, the data output and the data in the data records meet the model accuracy requirement, the establishment of the overall data federation model is complete; otherwise, the randomly drawn data records stored in the local data storage module (8) are randomly drawn again, and the process of establishing the overall data federation model is carried out again using the federated data training module (5);

wherein the manner of dividing all the devices (1) into several groups (10) according to certain rules is:

when the big data analysis and scheduling module (2) is used to group all the devices (1), a big data estimation method is used in advance to estimate the data record volume of each of the devices (1), and during grouping, devices with large data record volumes are placed in the same group and devices with small data record volumes are placed in one group, so as to prevent data records of large volume from swamping data of small volume during data training; when the big data analysis and scheduling module (2) is used to group all the devices (1), a big data estimation method is used in advance to estimate the data record volume of each of the devices (1), and during grouping, a group with large data record volumes has a small number of devices (1), while a group with small data record volumes has a large number of devices (1).

2. The federated machine combination service system combined with big data analysis feedback according to claim 1, characterized in that: in the process of obtaining the overall data federation model by applying parameter weighting to all the data federation sub-models, the initial parameter values are taken by the big data analysis and scheduling module (2) from the parameters used by previous overall data federation models or from the characteristics of the data record volume analyzed using big data; on this basis, data record training is performed using the drawn data records that have been read to obtain the final corresponding parameters, thereby obtaining the overall data federation model, so as to speed up the convergence of the data record training.

3. The federated machine combination service system combined with big data analysis feedback according to claim 1, characterized in that: when each group uses the data training fusion sub-module (4) to perform data training to generate the data federation sub-model, in order to increase the dimensionality of the training data, some of the groups adopt vertical federated learning and the remaining groups adopt federated transfer learning; or some groups adopt vertical federated learning, some groups adopt horizontal federated learning, and the remaining groups adopt federated transfer learning.

4. The federated machine combination service system combined with big data analysis feedback according to claim 1, characterized in that: when the data preprocessing module (3) performs data cleaning of the data records, the big data analysis and scheduling module (2) uses existing historical data records or a data federation sub-model to perform preliminary data cleaning on the data, analyzes each data record, and removes a data record when that data record deviates beyond a certain extent, so that the cleaning of the data records is more accurate.

5. The federated machine combination service system combined with big data analysis feedback according to claim 4, characterized in that: the big data analysis and scheduling module (2), when cleaning the data records, combines the unreasonable data records that were removed with the past operating characteristics of the device to analyze the cause of the abnormal data records, so as to modify the data records or add parameters to the data records.

6. The federated machine combination service system combined with big data analysis feedback according to claim 1, characterized in that: for the data training fusion sub-modules (4) and the federated data training modules (5), the big data analysis and scheduling module (2) designates one of the federated data training modules (5) as the federated data training module (5) of the federated machine combination service system, and one of the data training fusion sub-modules (4) of the same group as the data training fusion sub-module that performs the data training of that group; the data training tasks performed by one or several of the data training fusion sub-modules (4) can be handed over to the big data analysis and scheduling module (2) for execution.

7. A federated machine combination service method combined with big data analysis feedback, comprising the federated machine combination service system combined with big data analysis feedback as described in any one of claims 1-6, characterized in that it comprises the following steps:

Step S1, initialize the federated machine combination service system combined with big data analysis feedback, the big data analysis and scheduling module (2) utilizes the past operating characteristics of the equipment and the characteristics of the generated data records to formulate the grouping rules,

Thus, the big data analysis and scheduling module (2) groups all the devices (1), divides all the devices (1) into several groups (10) according to certain rules, specifically, the big The data analysis and scheduling module (2) uses the estimation method of big data to estimate and acquire the size of the data recording volume of each said equipment (1) in advance, and when all equipments (1) are grouped, the data recording volume The large ones are in the same group, and the ones with a small amount of data records are grouped into one group, so as to prevent the data records with a large amount of data records from flooding the data with a small amount of records during data training, and the devices for groups with a large amount of data records The quantity of (1) is few, and the grouping of data record amount is little, the quantity of described equipment (1) is many, so that guarantee the quantity of the described data record of each grouping; And guarantee that in each described grouping (10) There is at least one said data training fusion sub-module (4), and the information of said grouping is sent to said data reading module (11), said data training fusion sub-module (4) and said federated data training module (5), and modify the reading authority of the data records of the data reading module (11), the data training fusion sub-module (4) and the federated data training module (5);

Step S2, while the device (1) is running, the data acquisition module (6) acquires the operating data and status data of the device (1), forms data records, and stores the data records in the stand-alone storage module (7) of the device (1);

Step S3, the data preprocessing module (3) reads the data records stored in the stand-alone storage module (7), analyzes each data record using mathematical statistics methods and set requirements, and deletes any data record found to be obviously unreasonable;

Step S4, the data training fusion sub-module (4) establishes a data communication connection with the data reading module (11) of its corresponding group according to the read permission assigned by the big data analysis and scheduling module (2), so that the data training fusion sub-module (4) reads, through the data reading module (11), the data records stored in the stand-alone storage module (7), stores them, performs data learning and training on them, and obtains a data federation sub-model;

Step S5, the data federation sub-model, together with a certain number of data records randomly extracted from the data records used to obtain the data federation sub-model, is sent to the local data storage module (8);

Step S6, the federated data training module (5) reads the data federation sub-models and the data records stored in the local data storage module (8), applies parameter weighting to all the data federation sub-models to obtain an overall data federation model, and performs data training using the extracted data records that were read, thereby obtaining the overall data federation model;

Step S7, the big data analysis and scheduling module (2) randomly extracts a certain number of data records from the stand-alone storage modules (7) of all the devices (1) for verifying the overall data federation model; the overall data federation model is verified against these data records, and when the model output and the data in the data records meet the model accuracy requirement, the establishment of the overall data federation model is complete;

Step S8, otherwise, the data records stored in the local data storage module (8) are randomly extracted anew, and the federated data training module (5) is used again to establish the overall data federation model.
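
For illustration only, the grouping of Step S1, the record cleaning of Step S3 and the parameter weighting of Step S6 might be sketched as follows. This is a minimal sketch and not part of the claimed method: the function names, the greedy fill-to-target grouping, the z-score cleaning rule and the use of record counts as weights are all assumptions, since the claim fixes none of them.

```python
import statistics


def group_devices(estimated_volumes, target_records_per_group):
    """Step S1 (assumed strategy): sort devices by estimated record volume and
    fill each group until it reaches a target record count, so that
    high-volume groups hold few devices and low-volume groups hold many."""
    ordered = sorted(estimated_volumes.items(), key=lambda kv: kv[1], reverse=True)
    groups, current, total = [], [], 0
    for device_id, volume in ordered:
        current.append(device_id)
        total += volume
        if total >= target_records_per_group:
            groups.append(current)
            current, total = [], 0
    if current:  # remaining low-volume devices form the last group
        groups.append(current)
    return groups


def clean_records(values, max_z=3.0):
    """Step S3 (assumed rule): drop values whose z-score exceeds max_z."""
    if len(values) < 2:
        return list(values)
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return list(values)
    return [v for v in values if abs(v - mean) / stdev <= max_z]


def weighted_average(sub_models, weights):
    """Step S6 (assumed weighting): combine sub-model parameter vectors by a
    weighted mean, for example weighted by each group's record count."""
    total = sum(weights)
    size = len(sub_models[0])
    return [
        sum(w * params[i] for params, w in zip(sub_models, weights)) / total
        for i in range(size)
    ]


if __name__ == "__main__":
    volumes = {"a": 1000, "b": 900, "c": 100, "d": 120, "e": 90}
    print(group_devices(volumes, target_records_per_group=900))
    print(clean_records([10] * 20 + [1000]))
    print(weighted_average([[1.0, 2.0], [3.0, 4.0]], weights=[1, 3]))
```

In this sketch a single threshold-free rule covers both grouping requirements of Step S1: devices of similar volume end up together because the list is sorted, and group sizes vary inversely with volume because every group is filled to roughly the same record count.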

8. The federated machine combination service method combined with big data analysis feedback according to claim 7, characterized in that: from among the data training fusion sub-modules (4) and the federated data training modules (5), the big data analysis and scheduling module (2) designates one of the federated data training modules (5) as the federated data training module (5) of the federated machine combination service system, and designates one of the data training fusion sub-modules (4) of a given group as the data training fusion sub-module that performs the data training for that group; and the data training tasks performed by one or several data training fusion sub-modules (4) can be handed over to the big data analysis and scheduling module (2) for execution.