CN113222169B - Federated Machine Composition Service Method and System Combined with Big Data Analysis Feedback - Google Patents
- Publication number
- CN113222169B (CN202110289138.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- module
- training
- federated
- records
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Description
Technical Field
The present invention relates to the technical field of intelligent manufacturing, and specifically to a federated machine composition service method and system combined with big data analysis feedback.
Background Art
Since the twentieth century, we have been in an era of intelligent production and intelligent manufacturing. Today, equipment is not only intelligent and automated in its own right, but has also moved from operating in isolation to operating collaboratively. This depends on cross-domain and cross-device operation and cooperation, which necessarily involves coordination between different devices or fields. In conventional technology, handling this coordination with ordinary logical relationships leads to long processing cycles and requires a large number of logic operations on data, which further raises the demand on the processor's computing power. Adapting to large-scale computation therefore inevitably increases the logic computing capability required of the processor, and the computing capability of large-scale or very-large-scale integrated circuits directly affects their production cost. On the one hand, the processor's own processing capability needs to be improved; on the other hand, it is desirable to reduce the demand for computing power, that is, to optimize the computational requirements of artificial intelligence. For this reason, a large body of research has focused on algorithms, improving the computational model through algorithms in order to reduce the computing demand.
In addition, for smart devices to achieve the technical effects of intelligent processing, the data of the various sensing devices or institutions must be combined, processed through comprehensive logical calculation, and summarized before a comprehensive judgment is made. However, integrating data scattered across different locations involves great difficulty and economic cost. Technologies for joint access to and processing of distributed data already exist, such as federated machine learning, also known as federated learning, joint learning, or alliance learning. Federated machine learning is a machine learning framework that can effectively help multiple institutions use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulations.
For example, patent CN 110263936A discloses a horizontal federated learning method. In that method, a community coordinator obtains the global model parameters sent by a central coordinator and sends them to each participant; obtains the model parameter updates that the participants produce by training on the global model parameters, fuses these updates into a community model parameter update, and determines whether the community model parameter update needs to be sent to the central coordinator; if so, it sends the community model parameter update to the central coordinator, obtains the global model parameter update returned by the central coordinator, and sends that update to each participant so that each participant can train its model on it. That invention also discloses a horizontal federated learning apparatus, a device, and a computer storage medium, and improves the learning efficiency of horizontal federated learning.
Patent CN111324440A discloses a method, apparatus, device, and readable storage medium for executing an automated process, relating to the field of financial technology. The method comprises: obtaining multimodal data corresponding to a terminal and using the multimodal data as input to a deep learning model to obtain an intent analysis model; determining, through the intent analysis model, the behavioral intent corresponding to the automated process in the terminal; and executing, according to the behavioral intent, the target operation instruction corresponding to the automated process so as to execute the automated process. By analyzing the behavioral intent of the automated process with an intent analysis model and executing the process according to that intent, that invention avoids execution failures caused by changes in the execution environment, which improves the adaptability of the automated process to different execution environments and raises its execution success rate.
Patent CN111882308A discloses a blockchain secure transaction method, comprising: collecting the transaction request information of each node participating in a secure transaction; recording the transaction request information and verifying its legitimacy; generating a transaction block and packaging it into a block of the blockchain; obtaining and verifying the bookkeeping right handed over by the bookkeeping-right node; broadcasting the verified ownership of the bookkeeping right to each node so that the nodes reach consensus according to a preset consensus algorithm; and verifying the transaction request information through the bookkeeping right and, once verification succeeds, notifying each node to record the transaction request information and synchronously update its account information. The method is said to address problems in the prior art such as long confirmation delays for transaction data, illegitimate information that cannot be altered, poor security, and poor privacy.
Patent CN112257876A discloses a federated learning method, apparatus, computer device, and medium, belonging to the field of computer technology. In the method, a first computer device obtains the sample label information corresponding to a sample identifier and obtains first fusion information corresponding to the sample identifier; a second computer device obtains second fusion information corresponding to the sample identifier and sends it to the first computer device; the first computer device obtains a gradient operator corresponding to the sample identifier based on the first fusion information, the second fusion information, and the sample label information, and sends the gradient operator to the second computer device; and the first and second computer devices, based on the gradient operator, adjust the model parameters of the first sub-model and of the second sub-model of the machine learning model, respectively. While protecting user privacy, the method speeds up model training, enriches the information carried by the sample features, and improves model accuracy.
Patent CN112217706A discloses a data processing method, apparatus, and device. On the one hand, the devices in the data processing system are connected in a ring, so each device has two communication links to other devices; even if one link is temporarily interrupted, the device can still communicate with other devices over the other link, which gives the system good stability and robustness. On the other hand, during data processing the model parameters determined by each device are passed along these communication links in turn, and each device fuses the model parameters it receives with the model parameters it has determined itself before passing them on. The amount of data transferred between devices is therefore small, and model parameters need not be sent to a single device, which effectively avoids overload and communication congestion, improves the speed and efficiency of data processing, and keeps data processing stable.
Patent CN112330048A discloses a scorecard model training method, apparatus, storage medium, and electronic device. The method comprises: binning the continuous variables in a wide data table to obtain discrete variables; feeding these variables into a constrained logistic regression model; converting the logistic regression model into a scorecard model; and calculating the offset and scale of the scorecard model, where the constraint on the logistic regression model is that the lower bound of the variable coefficients is non-negative. Because the coefficients are constrained to be non-negative, that invention solves the problem in the related art that multicollinearity among the independent variables makes individual coefficients negative when a scorecard model is trained with logistic regression, causing the model to lose its explanatory power; it thereby avoids repeated model iterations and reduces the time cost and overhead of model training.
As can be seen, knowledge transfer technology based on federated learning currently on the market still has the following defects:
1. In the prior art, big data is mainly applied to the management of massive data for economic forecasting or commercial use; it is still rarely applied in industry or used to guide industrial applications.
2. There is no corresponding technique, or technical teaching, for grouping data, so it is unclear how the data should be grouped. Grouping by experience is clearly unscientific, and unscientific grouping leads to inaccurate models trained from the data.
3. In the prior art, data records are trained on without regard to the size and number of the records. Training on all the data directly to obtain a model tends to make the data volume too large, which makes the computation heavy and difficult, and also tends to make the trained model inaccurate.
4. In the prior art, abnormal data records that may be present among the data records are not given a preliminary cleaning, so abnormal data easily makes the trained model abnormal.
Faced with the above technical problems, it is desirable to provide a federated machine learning service method that can train on data quickly while reducing the demands on the capability of the data processing system, so that data can be processed quickly to obtain a data model. So far, however, the prior art offers no effective way to solve these technical problems.
It is therefore desirable to provide a federated machine composition service method and system combined with big data analysis feedback to solve the above technical problems.
Summary of the Invention
In view of the above technical problems, the object of the present invention is to provide a federated machine composition service method and system combined with big data analysis feedback, so as to solve the problems raised in the Background Art above.
To achieve the above object, the present invention provides the following technical solution:
A federated machine composition service system combined with big data analysis feedback comprises a plurality of devices distributed at different addresses and a big data analysis and scheduling module; each device comprises a data preprocessing module, a data acquisition module, and a data reading module;
A data training fusion sub-module is provided on some of the devices, and a federated data training module is provided on one of the devices; every device comprises a stand-alone storage module, each device provided with a data training fusion sub-module is also provided with a local data storage module, and the device provided with the federated data training module is also provided with a global data storage module;
The big data analysis and scheduling module performs data analysis and scheduling on all devices, working modules, and data participating in federated machine learning; the big data analysis and scheduling module is connected in data communication with the data reading modules, the data training fusion sub-modules, and the federated data training module; the data acquisition module is connected in data communication with the stand-alone storage module, and the stand-alone storage module is also connected in data communication with the data preprocessing module and the data reading module, respectively;
While a device is running, its data acquisition module acquires the operating data and status data on the device, forms data records, and stores the data records in the device's stand-alone storage module; the data preprocessing module reads the data records stored in the stand-alone storage module, analyzes each data record using mathematical statistics and preset requirements, and deletes any data record found to be clearly unreasonable;
The big data analysis and scheduling module uses the past operating characteristics of the devices and the characteristics of the volume of data records they produce to formulate grouping rules for the devices; it then divides all the devices into several groups according to these rules, ensures that each group contains at least one data training fusion sub-module, sends the grouping information to the data reading modules, the data training fusion sub-modules, and the federated data training module, and modifies the data record read permissions of the data reading modules, the data training fusion sub-modules, and the federated data training module;
According to the read permission assigned by the big data analysis and scheduling module, each data training fusion sub-module establishes a data communication connection with the data reading modules of its own group, reads through those data reading modules the data records stored in the stand-alone storage modules, and performs data learning and training on them to obtain a data federation sub-model; it then sends the data federation sub-model, together with a certain number of data records drawn at random from the data records used to obtain that sub-model, to the local data storage module;
The federated data training module reads the data federation sub-models and the data records stored in the local data storage modules, combines all the data federation sub-models by parameter weighting into an overall data federation model, trains on the sampled data records it has read to obtain the corresponding parameters, thereby obtaining the overall data federation model, and sends it to the global data storage module for storage;
The big data analysis and scheduling module draws a certain number of data records at random from the stand-alone storage modules on all devices and uses them to verify the overall data federation model; if, during verification, the model's output for these data records agrees with the data in the records to within the required model accuracy, the overall data federation model is complete; otherwise, the randomly sampled data records stored in the local data storage modules are drawn at random again, and the federated data training module repeats the process of building the overall data federation model.
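As an illustration only, the parameter-weighted combination of sub-models and the verify-or-resample loop described above might be sketched as follows. The linear model, the weights proportional to each group's record count, the mean-absolute-error criterion, the `refine` and `draw_sample` callbacks, and the `max_rounds` cap are all assumptions made for this example; the description does not specify them.

```python
def weighted_average(sub_models, weights):
    """Combine sub-model parameter vectors by a normalized weighted sum."""
    total = sum(weights)
    dim = len(sub_models[0])
    return [
        sum(w * params[i] for params, w in zip(sub_models, weights)) / total
        for i in range(dim)
    ]


def predict(params, x):
    # Hypothetical model form for the example: y = a * x + b.
    a, b = params
    return a * x + b


def passes_validation(params, records, tolerance):
    """True if the mean absolute error on (x, y) records is within tolerance."""
    error = sum(abs(predict(params, x) - y) for x, y in records) / len(records)
    return error <= tolerance


def build_overall_model(sub_models, weights, draw_sample, refine,
                        validation, tolerance, max_rounds=5):
    """Aggregate, refine on a fresh random sample, verify; retry on failure."""
    for _ in range(max_rounds):  # max_rounds is an assumed safety cap
        params = weighted_average(sub_models, weights)
        params = refine(params, draw_sample())
        if passes_validation(params, validation, tolerance):
            return params
    return None


# Two group sub-models, weighted by (assumed) record counts 3 : 1.
sub_models = [[2.0, 1.0], [4.0, 3.0]]
weights = [3, 1]
overall = build_overall_model(
    sub_models, weights,
    draw_sample=lambda: [],                  # stand-in for re-drawing records
    refine=lambda params, sample: params,    # stand-in for the training step
    validation=[(0.0, 1.5), (2.0, 6.5)],
    tolerance=0.1,
)
```

Returning `None` after `max_rounds` failed attempts is a choice made for the sketch; the description itself simply repeats the sampling and training until verification succeeds.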
Preferably, when all the data federation sub-models are combined by parameter weighting into the overall data federation model, the initial parameter values are set by the big data analysis and scheduling module, either from the parameters of previous overall data federation models or from the characteristics of the data record volume as determined by big data analysis. Starting from these initial values, the sampled data records that have been read are used for training to obtain the final parameters and hence the overall data federation model, which speeds up convergence of the training.
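The effect of starting from previously learned parameters can be illustrated with a toy example. The sketch below fits a one-parameter model `y = w * x` by gradient descent, once from zero and once from a value near an assumed earlier model; the model form, learning rate, tolerance, and data are hypothetical and only show why a good initial value tends to need fewer iterations.

```python
def fit_slope(xs, ys, w0, lr=0.05, tol=1e-6, max_steps=10_000):
    """Gradient descent on mean squared error for y = w * x.

    Returns the fitted slope and the number of update steps taken.
    """
    w = w0
    n = len(xs)
    for step in range(max_steps):
        grad = 2.0 / n * sum((w * x - y) * x for x, y in zip(xs, ys))
        if abs(grad) < tol:
            return w, step
        w -= lr * grad
    return w, max_steps


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

cold_w, cold_steps = fit_slope(xs, ys, w0=0.0)   # no prior knowledge
warm_w, warm_steps = fit_slope(xs, ys, w0=1.9)   # assumed earlier model parameter
```

Both runs reach the same slope, but the warm-started run stops after fewer updates.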
Preferably, when each group uses its data training fusion sub-module to train on data and generate a data federation sub-model, the dimensionality of the training data is increased by having some groups use vertical federated learning and the remaining groups use federated transfer learning; or by having some groups use vertical federated learning, some groups use horizontal federated learning, and the remaining groups use federated transfer learning.
Preferably, when the data preprocessing module cleans the data records, the big data analysis and scheduling module uses existing historical data records or data federation sub-models to perform a preliminary cleaning of the data: each data record is analyzed, and a data record that deviates beyond a certain degree is removed, which makes the cleaning of data records more accurate.
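One simple way to realize this kind of deviation-based cleaning is a standard-deviation threshold computed from historical records. The sketch below is a minimal example under that assumption; the three-sigma threshold, the scalar record format, and the sample values are hypothetical and are not prescribed by the description.

```python
from statistics import mean, stdev


def clean_records(records, history, max_sigma=3.0):
    """Keep records within max_sigma standard deviations of the historical mean."""
    center = mean(history)
    spread = stdev(history)
    return [r for r in records if abs(r - center) <= max_sigma * spread]


# Hypothetical historical readings and a new batch containing one outlier.
history = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]
incoming = [20.0, 19.7, 35.0, 20.4]

cleaned = clean_records(incoming, history)
```

Here the reading `35.0` is dropped and the other three records are kept.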
Preferably, when cleaning data records, the big data analysis and scheduling module combines the removed unreasonable data records with the past operating characteristics of the device to analyze why the data records were abnormal, so that the data records can be corrected or parameters can be added to the data records.
Preferably, when the big data analysis and scheduling module groups all the devices, it first uses a big data estimation method to estimate the volume of data records of each device. Devices with large data record volumes are placed together in the same group, and devices with small data record volumes are placed together in a group, so that during training the records from high-volume devices do not swamp the data from low-volume devices, which keeps the overall data federation model accurate.
Preferably, when the big data analysis and scheduling module groups all the devices, it first uses a big data estimation method to estimate the volume of data records of each device. A group of devices with large data record volumes contains few devices, and a group of devices with small data record volumes contains many devices, so that the number of data records in each group is moderate and the computational load on every data training fusion sub-module is appropriate.
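The two grouping preferences above can be approximated with a simple greedy heuristic: sort the devices by estimated record volume and fill groups in that order up to a target total. This is only a sketch under assumed inputs; the `target` value, the device names, and the volumes are hypothetical, and the sketch does not check that each group contains a data training fusion sub-module.

```python
def group_devices(volumes, target):
    """Group devices so that each group's total estimated volume stays near target.

    volumes: dict mapping device id -> estimated number of data records.
    Devices are taken in descending order of volume, so devices of similar
    volume end up in the same group.
    """
    ordered = sorted(volumes, key=volumes.get, reverse=True)
    groups, current, total = [], [], 0
    for dev in ordered:
        # Close the current group if adding this device would exceed the target.
        if current and total + volumes[dev] > target:
            groups.append(current)
            current, total = [], 0
        current.append(dev)
        total += volumes[dev]
    if current:
        groups.append(current)
    return groups


# Hypothetical estimated record counts per device.
estimated = {"A": 900, "B": 800, "C": 120, "D": 100, "E": 90, "F": 80}
groups = group_devices(estimated, target=850)
```

With these numbers the two high-volume devices each form a group of their own, while the four low-volume devices share one group.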
Preferably, where data training fusion sub-modules and federated data training modules are present on several devices of the same group, the big data analysis and scheduling module designates one of the federated data training modules as the federated data training module of the service migration system and designates one of the data training fusion sub-modules of the group as the group's data training fusion sub-module, which carries out the data training for that group; the data training tasks of one or more data training fusion sub-modules may also be handed over to the big data analysis and scheduling module for execution.
In another aspect, the present application further provides a federated machine composition service method combined with big data analysis feedback, which uses the federated machine composition service system combined with big data analysis feedback and is characterized by comprising the following steps:
Step S1: initialize the federated machine composition service system combined with big data analysis feedback. The big data analysis and scheduling module uses the past operating characteristics of the devices and the characteristics of the volume of data records they produce to formulate grouping rules for the devices, and then divides all the devices into several groups according to these rules. Specifically, the big data analysis and scheduling module first uses a big data estimation method to estimate the volume of data records of each device; when grouping, devices with large data record volumes are placed together in the same group and devices with small data record volumes are placed together in a group, so that during training the records from high-volume devices do not swamp the data from low-volume devices; a group of high-volume devices contains few devices and a group of low-volume devices contains many devices, so that the number of data records in each group is moderate. The module ensures that each group contains at least one data training fusion sub-module, sends the grouping information to the data reading modules, the data training fusion sub-modules, and the federated data training module, and modifies the data record read permissions of the data reading modules, the data training fusion sub-modules, and the federated data training module;
Step S2: while a device is running, its data acquisition module acquires the operating data and status data of the device, forms data records, and stores them in the device's stand-alone storage module.
Step S3: the data preprocessing module reads the data records stored in the stand-alone storage module and analyzes each record using mathematical-statistics methods and preset requirements; when a record is found to be clearly unreasonable, that record is deleted.
Step S4: according to the read permissions assigned by the big data analysis and scheduling module, the data training fusion sub-module establishes a data communication connection with the data reading modules of its own group, reads through them the data records stored in the stand-alone storage modules, and performs learning and training on those records to obtain a data federation sub-model.
Step S5: the data federation sub-model, together with a certain number of data records randomly sampled from the records used to obtain it, is sent to the local data storage module.
Step S6: the federated data training module reads the data federation sub-models and the data records stored in the local data storage module, combines all the sub-models by parameter weighting, and trains further on the sampled data records it has read, thereby obtaining the overall data federation model.
Step S7: the big data analysis and scheduling module draws a certain number of data records at random from the stand-alone storage modules of all devices and uses them to verify the overall data federation model. If, during verification, the model's output agrees with the data in the records to within the required model accuracy, the overall data federation model is complete.
Step S8: otherwise, the randomly sampled data records stored in the local data storage module are randomly re-sampled, and the federated data training module repeats the process of building the overall data federation model.
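The control flow of steps S6 to S8 can be sketched as a verify-and-retry loop. This is only an illustration under stated assumptions: the callables `aggregate`, `fine_tune` and `accuracy`, the `max_rounds` limit and the function name itself are hypothetical and do not come from the patent, which neither bounds the number of retries nor specifies how accuracy is measured.

```python
import random


def build_overall_model(sub_models, record_pool, validation_records,
                        aggregate, fine_tune, accuracy,
                        required_accuracy, sample_size,
                        max_rounds=10, seed=0):
    """Repeat steps S6-S8 until the overall model passes verification.

    aggregate, fine_tune and accuracy are caller-supplied (hypothetical)
    callables; max_rounds is an assumption added so the loop terminates.
    """
    rng = random.Random(seed)
    for round_no in range(1, max_rounds + 1):
        # S5/S8: draw a fresh random sample of training records.
        sample = rng.sample(record_pool, min(sample_size, len(record_pool)))
        # S6: combine the sub-models, then train further on the sample.
        model = fine_tune(aggregate(sub_models), sample)
        # S7: verify against records drawn from the devices.
        if accuracy(model, validation_records) >= required_accuracy:
            return model, round_no
    raise RuntimeError("accuracy requirement not met within max_rounds")
```

Passing the three operations in as parameters keeps the sketch independent of any particular model type; a real system would substitute its own aggregation, training and scoring routines.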
Preferably, where data training fusion sub-modules and federated data training modules are present on several devices of the same group, the big data analysis and scheduling module designates one of the federated data training modules as the federated data training module of the service migration system, and designates one of the data training fusion sub-modules in the group as that group's data training fusion sub-module, which carries out the data training for the group. The data training tasks of one or more data training fusion sub-modules may also be handed over to the big data analysis and scheduling module for execution.
Compared with the prior art, the beneficial effects of the present invention are as follows:
1. The federated machine composition service system combined with big data analysis feedback departs from the traditional approach of training on the data as a single global whole to form a data federation model, and instead adopts distributed federated data training. On the one hand this increases the size of the data sample, making training more accurate; on the other hand it reduces the overall cost of the data.
2. The system applies big data analysis to the data records acquired by each device in order to estimate the size of the data each device will produce, and sends the estimates to the federated machine learning scheduling module for use in grouping. This makes the grouping more accurate and effective and addresses the technical problem of how to group the devices.
3. When grouping devices for federated training, the system places devices with large data record volumes together in the same group and devices with small volumes together in another group, so that records from high-volume devices do not swamp the records from low-volume devices during training, which improves the accuracy of the overall data federation model. At the same time, a group of high-volume devices contains few devices and a group of low-volume devices contains many, so that the number of data records in each group stays moderate and the computational load on every data training fusion sub-module is appropriate.
4. The system cleans the data records to remove abnormal records and, at the same time, analyzes the abnormal data to find the cause of the abnormality.
Description of the Drawings
Fig. 1 is a schematic diagram of the overall structure of the present invention;
Fig. 2 is a schematic diagram of the data flow among the modules provided in a device of the present invention.
In the figures: 1. device; 2. big data analysis and scheduling module; 3. data preprocessing module; 4. data fusion sub-module; 5. federated data training module; 6. data sensing module; 7. stand-alone storage module; 8. local data storage device; 9. global data storage module; 10. group; 11. data reading module.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
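Embodiment 1:
A federated machine composition service system combined with big data analysis feedback comprises a plurality of devices 1 located at different addresses and a big data analysis and scheduling module 2. Each device 1 includes a data preprocessing module 3, a data acquisition module 6 and a data reading module 11.
The system further comprises data training fusion sub-modules 4, which are provided on some of the devices, and a federated data training module 5, which is provided on one of the devices 1. Every device 1 includes a stand-alone storage module 7; each device provided with a data training fusion sub-module 4 also has a local data storage module 8; and the device 1 provided with the federated data training module 5 has a global data storage module 9.
The big data analysis and scheduling module 2 performs data analysis and scheduling for all devices, working modules and data that take part in federated machine learning. It is in data communication with the data reading module 11, the data training fusion sub-module 4 and the federated data training module 5. The data acquisition module 6 is in data communication with the stand-alone storage module 7, which is in turn in data communication with both the data preprocessing module 3 and the data reading module 11.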
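While a device 1 is running, its data acquisition module 6 acquires the operating data and status data of the device 1, forms data records, and stores them in the stand-alone storage module 7 of the device 1. The data preprocessing module 3 reads the data records stored in the stand-alone storage module 7 and analyzes each record using mathematical-statistics methods and preset requirements; when a record is found to be clearly unreasonable, that record is deleted.
The big data analysis and scheduling module 2 formulates grouping rules for the devices from their past operating characteristics and the characteristics of the data record volumes they generate, and divides all devices 1 into several groups 10 according to those rules. It ensures that each group 10 contains at least one data training fusion sub-module 4, sends the grouping information to the data reading module 11, the data training fusion sub-module 4 and the federated data training module 5, and modifies the data record read permissions of those three modules.
According to the read permissions assigned by the big data analysis and scheduling module 2, the data training fusion sub-module 4 establishes a data communication connection with the data reading modules 11 of its own group, reads through them the data records stored in the stand-alone storage modules 7, and performs learning and training on those records to obtain a data federation sub-model. It then sends the data federation sub-model, together with a certain number of data records randomly sampled from the records used to obtain it, to the local data storage module 8.
The federated data training module 5 reads the data federation sub-models and the data records stored in the local data storage module 8, combines all the sub-models by parameter weighting, and trains further on the sampled data records it has read to obtain the corresponding parameters. The resulting overall data federation model is sent to the global data storage module 9 for storage.
The big data analysis and scheduling module 2 draws a certain number of data records at random from the stand-alone storage modules 7 of all devices 1 and uses them to verify the overall data federation model. If, during verification, the model's output agrees with the data in the records to within the required model accuracy, the overall data federation model is complete. Otherwise, the randomly sampled data records stored in the local data storage module 8 are randomly re-sampled, and the federated data training module 5 repeats the process of building the overall data federation model.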
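The "parameter weighting" performed by the federated data training module 5 can be illustrated with a weighted average of sub-model parameters. This is a minimal sketch, assuming that each sub-model is a dictionary mapping a parameter name to a list of floats and that the weights are chosen by the caller, for example the number of records each group trained on. The patent does not specify the weighting scheme, so both the representation and the function name `weighted_average` are hypothetical.

```python
def weighted_average(sub_models, weights):
    """Combine sub-model parameters by a normalized weighted average.

    sub_models: list of dicts, parameter name -> list of floats
                (all dicts are assumed to share the same names and shapes).
    weights:    one non-negative weight per sub-model (an assumption;
                the patent only says the parameters are weighted).
    """
    if not sub_models or len(sub_models) != len(weights):
        raise ValueError("need one weight per sub-model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    merged = {}
    for name, reference in sub_models[0].items():
        merged[name] = [
            sum(w * m[name][i] for m, w in zip(sub_models, weights)) / total
            for i in range(len(reference))
        ]
    return merged
```

For example, two sub-models with parameters `[1.0, 2.0]` and `[3.0, 6.0]` and weights 1 and 3 combine to `[2.5, 5.0]`. The combined parameters would then serve as the starting point for the further training on sampled records described above.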
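Preferably, when all the federated sub-models are combined by parameter weighting into the overall data federation model, the initial parameter values are supplied by the big data analysis and scheduling module 2, either from the parameters used in previous overall data federation models or from a big data analysis of the characteristics of the data record volumes. Starting from these initial values, training on the sampled data records yields the final parameters and hence the overall data federation model, which speeds up the convergence of training.
Preferably, when each group uses its data training fusion sub-module 4 to train and generate a data federation sub-model, the groups use different learning modes in order to increase the dimensionality of the training data: either some groups use vertical federated learning and the remaining groups use federated transfer learning; or some groups use vertical federated learning, some use horizontal federated learning, and the remainder use federated transfer learning.
Preferably, when the data preprocessing module 3 cleans the data records, the big data analysis and scheduling module 2 uses existing historical data records or data federation sub-models to perform a preliminary cleaning: each data record is analyzed, and a record that deviates beyond a certain degree is removed, which makes the cleaning more accurate.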
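One way to picture "removing a record that deviates beyond a certain degree" is a robust outlier filter. The sketch below is an illustration only: the patent does not name a statistic, so the modified z-score based on the median absolute deviation (MAD), the 3.5 threshold, the dictionary record format and the function name `clean_records` are all assumptions.

```python
import statistics


def clean_records(records, field, threshold=3.5):
    """Drop records whose `field` value is a robust outlier.

    Uses the modified z-score 0.6745 * |x - median| / MAD, a common
    rule of thumb; the statistic and threshold are assumptions, not
    taken from the patent.
    """
    values = [record[field] for record in records]
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread to measure against, so keep everything.
        return list(records)
    return [
        record for record in records
        if 0.6745 * abs(record[field] - median) / mad <= threshold
    ]
```

With temperature readings of 20.0, 21.0, 19.0, 20.5 and 95.0, only the 95.0 record is removed. A median-based statistic is used here because a single extreme value in a small sample inflates the mean and standard deviation enough to hide itself.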
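Preferably, during data record cleaning the big data analysis and scheduling module 2 takes the unreasonable records that were removed and, in light of the device's past operating characteristics, analyzes the cause of the abnormality, so that the data record can be corrected or parameters can be added to it.
Preferably, before the big data analysis and scheduling module 2 groups the devices 1, it uses a big data estimation method to estimate the data record volume of each device 1. When grouping, devices with large data record volumes are placed together in the same group and devices with small volumes are placed together in another group, so that records from high-volume devices do not swamp the records from low-volume devices during training, which improves the accuracy of the overall data federation model.
Preferably, before the big data analysis and scheduling module 2 groups the devices 1, it uses a big data estimation method to estimate the data record volume of each device 1. When grouping, a group of high-volume devices contains few devices 1 and a group of low-volume devices contains many devices 1, so that the number of data records in each group stays moderate and the computational load on every data training fusion sub-module 4 is appropriate.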
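The two grouping preferences above can be sketched with a simple greedy pass over devices sorted by estimated volume. This is an assumption-laden illustration: the patent only says the devices are grouped "according to certain rules", so the greedy strategy, the `target_per_group` parameter and the function name `group_devices` are hypothetical. The sketch also omits the requirement that every group contain at least one data training fusion sub-module.

```python
def group_devices(estimated_volumes, target_per_group):
    """Group devices so that each group holds roughly target_per_group records.

    estimated_volumes: dict, device id -> estimated number of data records.
    Devices are visited from largest to smallest, so high-volume devices
    end up together in small groups and low-volume devices end up together
    in larger groups.
    """
    ordered = sorted(estimated_volumes.items(),
                     key=lambda item: item[1], reverse=True)
    groups, current, current_total = [], [], 0
    for device_id, volume in ordered:
        current.append(device_id)
        current_total += volume
        if current_total >= target_per_group:
            groups.append(current)
            current, current_total = [], 0
    if current:
        # Remaining low-volume devices form the last group.
        groups.append(current)
    return groups
```

With estimated volumes of 900 and 800 for two devices, between 60 and 100 for five others, and a target of 1000 records per group, the two large devices form one group and the five small devices form another.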
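Preferably, where data training fusion sub-modules 4 and federated data training modules 5 are present on several devices of the same group, the big data analysis and scheduling module 2 designates one of the federated data training modules 5 as the federated data training module 5 of the service migration system, and designates one of the data training fusion sub-modules 4 in the group as that group's data training fusion sub-module, which carries out the data training for the group. The data training tasks of one or more data training fusion sub-modules 4 may also be handed over to the big data analysis and scheduling module 2 for execution.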
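Embodiment 2:
A federated machine composition service method combined with big data analysis feedback uses the federated machine composition service system combined with big data analysis feedback and comprises the following steps:
Step S1: initialize the federated machine composition service system combined with big data analysis feedback. The big data analysis and scheduling module 2 formulates grouping rules for the devices from their past operating characteristics and the characteristics of the data record volumes they generate, and divides all devices 1 into several groups 10 according to those rules. Specifically, the module first uses a big data estimation method to estimate the data record volume of each device 1. When grouping, devices with large data record volumes are placed together in the same group and devices with small data record volumes are placed together in another group, so that records from high-volume devices do not swamp the records from low-volume devices during training. A group of high-volume devices contains few devices 1 and a group of low-volume devices contains many, so that the number of data records in each group stays moderate. The module also ensures that each group 10 contains at least one data training fusion sub-module 4, sends the grouping information to the data reading module 11, the data training fusion sub-module 4 and the federated data training module 5, and modifies the data record read permissions of those three modules.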
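Step S2: while a device 1 is running, its data acquisition module 6 acquires the operating data and status data of the device 1, forms data records, and stores them in the stand-alone storage module 7 of the device 1.
Step S3: the data preprocessing module 3 reads the data records stored in the stand-alone storage module 7 and analyzes each record using mathematical-statistics methods and preset requirements; when a record is found to be clearly unreasonable, that record is deleted.
Step S4: according to the read permissions assigned by the big data analysis and scheduling module 2, the data training fusion sub-module 4 establishes a data communication connection with the data reading modules 11 of its own group, reads through them the data records stored in the stand-alone storage modules 7, and performs learning and training on those records to obtain a data federation sub-model.
Step S5: the data federation sub-model, together with a certain number of data records randomly sampled from the records used to obtain it, is sent to the local data storage module 8.
Step S6: the federated data training module 5 reads the data federation sub-models and the data records stored in the local data storage module 8, combines all the sub-models by parameter weighting, and trains further on the sampled data records it has read, thereby obtaining the overall data federation model.
Step S7: the big data analysis and scheduling module 2 draws a certain number of data records at random from the stand-alone storage modules 7 of all devices 1 and uses them to verify the overall data federation model. If, during verification, the model's output agrees with the data in the records to within the required model accuracy, the overall data federation model is complete.
Step S8: otherwise, the randomly sampled data records stored in the local data storage module 8 are randomly re-sampled, and the federated data training module 5 repeats the process of building the overall data federation model.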
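Preferably, where data training fusion sub-modules 4 and federated data training modules 5 are present on several devices of the same group, the big data analysis and scheduling module 2 designates one of the federated data training modules 5 as the federated data training module 5 of the service migration system, and designates one of the data training fusion sub-modules 4 in the group as that group's data training fusion sub-module, which carries out the data training for the group. The data training tasks of one or more data training fusion sub-modules 4 may also be handed over to the big data analysis and scheduling module 2 for execution.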
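The read-permission bookkeeping that the method relies on (assigned when the groups are formed and consulted when a data training fusion sub-module connects to data reading modules) can be pictured as a mapping from the device hosting a group's designated sub-module to the devices it may read. The sketch is hypothetical throughout: the patent does not describe how permissions are represented, and the names `build_read_permissions`, `may_read` and `fusion_hosts` are invented for illustration.

```python
def build_read_permissions(groups, fusion_hosts):
    """Map each designated fusion sub-module host to its readable devices.

    groups:       dict, group id -> list of device ids in that group.
    fusion_hosts: dict, group id -> id of the device hosting the group's
                  designated data training fusion sub-module.
    """
    permissions = {}
    for group_id, devices in groups.items():
        host = fusion_hosts.get(group_id)
        if host not in devices:
            # Every group needs a fusion sub-module on one of its own devices.
            raise ValueError(f"group {group_id!r} has no valid fusion host")
        permissions[host] = set(devices)
    return permissions


def may_read(permissions, fusion_host, device_id):
    """Return True if the sub-module on fusion_host may read device_id's records."""
    return device_id in permissions.get(fusion_host, set())
```

Under this representation a sub-module can read records only from devices in its own group, and regrouping amounts to rebuilding the mapping.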
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Furthermore, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, as well as elements inherent to such a process, method, article or device.
Although embodiments of the present invention have been shown and described, persons of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the present invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110289138.8A CN113222169B (en) | 2021-03-18 | 2021-03-18 | Federated Machine Composition Service Method and System Combined with Big Data Analysis Feedback |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110289138.8A CN113222169B (en) | 2021-03-18 | 2021-03-18 | Federated Machine Composition Service Method and System Combined with Big Data Analysis Feedback |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113222169A CN113222169A (en) | 2021-08-06 |
CN113222169B CN113222169B (en) | 2023-06-23 |
Family ID: 77083846
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110289138.8A Active CN113222169B (en) | 2021-03-18 | 2021-03-18 | Federated Machine Composition Service Method and System Combined with Big Data Analysis Feedback |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113222169B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110633806A (en) * | 2019-10-21 | 2019-12-31 | 深圳前海微众银行股份有限公司 | Vertical federated learning system optimization method, device, equipment and readable storage medium |
CN111275188A (en) * | 2020-01-20 | 2020-06-12 | 深圳前海微众银行股份有限公司 | Method and device for optimizing horizontal federated learning system and readable storage medium |
CN111444848A (en) * | 2020-03-27 | 2020-07-24 | 广州英码信息科技有限公司 | Specific scene model upgrading method and system based on federal learning |
CN111477290A (en) * | 2020-03-05 | 2020-07-31 | 上海交通大学 | Federal learning and image classification method, system and terminal for protecting user privacy |
CN111522669A (en) * | 2020-04-29 | 2020-08-11 | 深圳前海微众银行股份有限公司 | Method, device and equipment for optimizing horizontal federated learning system and readable storage medium |
CN111537945A (en) * | 2020-06-28 | 2020-08-14 | 南方电网科学研究院有限责任公司 | Intelligent ammeter fault diagnosis method and equipment based on federal learning |
CN111768008A (en) * | 2020-06-30 | 2020-10-13 | 平安科技(深圳)有限公司 | Federal learning method, device, equipment and storage medium |
CN111915019A (en) * | 2020-08-07 | 2020-11-10 | 平安科技(深圳)有限公司 | Federal learning method, system, computer device, and storage medium |
CN111970277A (en) * | 2020-08-18 | 2020-11-20 | 中国工商银行股份有限公司 | Flow identification method and device based on federal learning |
CN112085159A (en) * | 2020-07-24 | 2020-12-15 | 西安电子科技大学 | A user tag data prediction system, method, device and electronic device |
CN112232519A (en) * | 2020-10-15 | 2021-01-15 | 成都数融科技有限公司 | Joint modeling method based on federal learning |
Non-Patent Citations (4)
Title |
---|
Accelerating Federated Learning for IoT in Big Data Analytics With Pruning, Quantization and Selective Updating; Xu Wenyuan et al.; IEEE Access; Vol. 9; 38457-38466 * |
Federated learning: A survey on enabling technologies, protocols, and applications; Aledhari M et al.; IEEE Access; Vol. 8; 140699-140725 * |
Towards federated learning at scale: System design; Bonawitz K et al.; Proceedings of Machine Learning and Systems; Vol. 1; 374-388 * |
A survey of the development of federated learning technology for data sharing and exchange; Wang Yashen; Unmanned Systems Technology; Vol. 2 (No. 6); 58-62 * |
Also Published As
Publication number | Publication date |
---|---|
CN113222169A (en) | 2021-08-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||