WO2021004478A1 - Distributed AI system - Google Patents

Distributed AI system

Info

Publication number
WO2021004478A1
Authority
WO (WIPO PCT)
Prior art keywords
task, distributed, unit, components, component
Prior art date
2019-07-10
Application number
PCT/CN2020/100833
Other languages
English (en), Chinese (zh)
Inventors
朱越, 张宝峰, 王成录
Original Assignee
华为技术有限公司
Priority date
2019-07-10
Filing date
2020-07-08
Publication date
2021-01-14
Application filed by 华为技术有限公司
Publication of WO2021004478A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N 20/00: Machine learning
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR SUCH PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063: Operations research, analysis or management
    • G06Q 10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q 10/06312: Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling

Definitions

  • This application relates to the field of artificial intelligence (AI), and in particular to a distributed artificial intelligence AI system.
  • Machine learning systems are the most important branch of AI systems.
  • Distributed machine learning (DML) systems are currently the systems most commonly used to process large-scale artificial intelligence applications.
  • The traditional distributed machine learning system is a centralized system that uses computing clusters to train prediction models on massive amounts of user data.
  • Such centralized systems often require intensive computing resources, and massive amounts of user data are uploaded to the cloud for storage, which can easily cause privacy and security issues.
  • The recently proposed federated learning (FL) system introduces a local-cloud interaction mode that protects user data privacy by storing data locally and performing calculations locally, while using homomorphic encryption, model aggregation, and differential privacy to make it difficult to infer user information from the models and related variables transmitted between the device and the cloud.
  • In such a system, the node connection structure is relatively stable, the number of worker nodes is limited, and the feature space and label space of the data samples stored on the worker nodes are consistent.
  • In practice, however, devices may connect to or disconnect from the network at any time, and the features collected by heterogeneous devices belong to different feature spaces.
  • Even the AI tasks faced by each terminal device are different and belong to different label spaces.
  • The above two AI systems therefore have difficulty meeting the needs of completing artificial intelligence application tasks under such conditions of dynamic device access, disconnection, and device heterogeneity.
  • Moreover, both of the above AI systems rely on a central node for global synchronization, which causes relatively large communication overhead and greatly reduces the flexibility and efficiency with which AI tasks can be solved.
  • This application therefore provides a distributed AI system to solve artificial intelligence application tasks flexibly and efficiently.
  • The distributed AI system is a decentralized distributed artificial intelligence (DDAI) system, including: a registration unit, used to register components when they dynamically access the DDAI system and to deregister them when they disconnect from the DDAI system; a task plan management unit, used to plan and manage distributed AI tasks according to the characteristics of the connected components; a task interaction unit, used for information exchange between connected components; a task execution unit, used by connected components to execute the allocated distributed AI subtasks so as to complete the distributed AI task; and a standardization unit, used to make the distributed AI system correspond to a unified space, where the unified space includes a unified feature space and a unified label space. The components may be independent physical devices or cloud virtual nodes, and each component carries one or more of the above units.
  • The above-mentioned DDAI system does not need to rely on a central node; it can solve AI tasks flexibly and efficiently through the heterogeneity, dynamics, automatic collaboration, and automatic adaptation of multiple components, while also saving communication overhead.
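  • As a purely illustrative aid (not part of the claimed system), the following Python sketch shows how a component might carry the five units named above; all class, field, and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Component:
    """A DDAI component: an independent physical device or a cloud virtual node."""
    name: str
    characteristics: Dict[str, float]              # e.g. {"compute": 2.0, "has_camera": 1.0}
    neighbors: List["Component"] = field(default_factory=list)

    # --- registration unit ---
    def register(self, system: "DDAISystem") -> None:
        system.components.append(self)             # dynamic access
    def deregister(self, system: "DDAISystem") -> None:
        system.components.remove(self)             # disconnect

    # --- task plan management unit ---
    def plan_subtask(self, task: str) -> str:
        return f"{task}:{self.name}"               # record this component's subtask plan

    # --- task interaction unit ---
    def send(self, peer: "Component", message: bytes) -> None:
        peer.receive(message)
    def receive(self, message: bytes) -> None:
        pass

    # --- task execution unit ---
    def execute(self, subtask: str) -> str:
        return f"done({subtask})"

    # --- standardization unit ---
    def standardize(self, features: List[float], mapping: Callable) -> List[float]:
        return mapping(features)                   # map into the unified feature space


@dataclass
class DDAISystem:
    components: List[Component] = field(default_factory=list)
```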
  • The registration unit is also used to discover the components adjacent to the component to which it belongs; the adjacent components are integrated into a virtual node, and the components integrated into a virtual node are either all physical devices or all cloud virtual nodes.
  • AI tasks can be solved flexibly and efficiently through heterogeneous components.
  • The task plan management unit includes a distributed AI task planning unit, which is used to initiate and receive AI task plans according to specific rules or specific algorithms. The task plan management unit is specifically used to cooperate with the task plan management units of the other components in the virtual node, to agree on the distributed AI subtask assignment of each component according to the characteristics of the component to which it belongs, and to record the subtask plan of that component; the characteristics include peripheral characteristics, functional characteristics, and computing capability.
  • The task plan management units in any virtual node can dynamically elect one component of the virtual node as a gateway node according to the characteristics of the components to which they belong. In this way, communication between virtual nodes can be completed solely through the components acting as gateway nodes, and the task plan management unit of the gateway node initiates task negotiation or task assignment for all components.
  • This implementation is simple and flexible, and saves communication overhead.
  • All the components in any virtual node may be directly connected to the gateway node; alternatively, when a virtual node is formed by multiple components, those components may be connected in a ring structure, in which each component has a predecessor node and a successor node.
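  • The following is a minimal sketch, under assumed names and a made-up scoring rule, of the gateway election described above for components connected in a ring: each component forwards the best candidate it has seen to its successor, so that after one pass around the ring all components agree on the same gateway node.

```python
from typing import Dict, List, Tuple

def elect_gateway(ring: List[str], scores: Dict[str, float]) -> str:
    """Elect the gateway node in a ring: each component forwards the best
    (score, name) pair it has seen to its successor; after one full pass
    every component agrees on the same winner."""
    best: Tuple[float, str] = (scores[ring[0]], ring[0])   # first component proposes itself
    for i in range(1, len(ring)):                          # message travels around the ring once
        candidate = (scores[ring[i]], ring[i])
        best = max(best, candidate)                        # keep the optimal component seen so far
    return best[1]

# usage: three components connected in a ring, scored e.g. by performance state
ring = ["phone", "watch", "laptop"]
scores = {"phone": 0.6, "watch": 0.2, "laptop": 0.9}
print(elect_gateway(ring, scores))   # -> "laptop"
```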
  • The task plan management unit further includes a status recording unit, which records the task status of the other components in the virtual node and updates it in real time or periodically. The task plan management unit is also used to initiate a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by the other components in the virtual node. In this way, AI tasks can be completed cooperatively.
  • The distributed AI post-processing tasks include exception processing tasks, decision processing tasks, and action coding tasks: the exception processing task handles abnormal states; the decision processing task integrates all feedback and makes decisions according to preset rules or preset algorithms; the action coding task converts the decision results into action codes and transmits them to a designated device for execution.
  • The task interaction unit is also used to compress information before sending it to other components, and to decompress information received from other components.
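  • A minimal sketch of the compress-before-send and decompress-after-receive behaviour, assuming JSON messages and Python's standard zlib; the function names are hypothetical.

```python
import json
import zlib

def pack(message: dict) -> bytes:
    """Task interaction unit, sending side: serialize and compress."""
    return zlib.compress(json.dumps(message).encode("utf-8"))

def unpack(payload: bytes) -> dict:
    """Task interaction unit, receiving side: decompress and deserialize."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))

msg = {"task": "feature_extraction", "component": "phone-1"}
assert unpack(pack(msg)) == msg
```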
  • the task execution unit is further configured to execute the subtask according to the subtask plan specified by the task plan management unit, and feed back the execution result to the task plan management unit after completion.
  • When the distributed AI task planning unit of a certain component triggers a data collection task according to certain conditions, the data collection and feature extraction tasks of that component are added to the subtask queue, and the other components of the virtual node are notified to plan the corresponding data collection and feature extraction subtasks.
  • the standardization unit is specifically configured to map the feature data extracted by the component to which the standardization unit belongs to a subspace of the unified feature space corresponding to the distributed AI system.
  • The standardization unit is further configured to map the task-related labels extracted by the component to which it belongs to the unified label space corresponding to the distributed AI system.
  • Multiple virtual nodes correspond to one unified feature space. Specifically, each virtual node performs a feature transformation on the features of each of its components through the standardization unit of that component, so that the transformed feature space of each component is a subspace of the unified feature space. In this way, the components are not required to share the same feature space; a transformation is sufficient to make any virtual node belong to the same feature space. Likewise, if the components belong to different label spaces, each component transforms the labels corresponding to its data into the unified label space through its standardization unit.
  • heterogeneous devices can complete AI model training and update tasks based on a unified feature space and a unified label space.
  • The distributed AI task planning unit of the gateway node plans the model training and update tasks according to a certain algorithm.
  • The task plan is passed to the other components of the virtual node before the plan is executed.
  • The distributed AI task planning unit of each component then negotiates and confirms its respective model training and update subtasks according to the characteristics of the component to which it belongs.
  • Finally, the gateway node initiates a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by the other components in the virtual node to which it belongs.
  • AI tasks can be adaptively allocated according to the characteristics of the components.
  • the DDAI system further includes a security verification unit for verifying the identity of the accessed component, so as to ensure the integrity of distributed tasks and protect the privacy of interactive data.
  • Before sending information to other components, the task interaction unit compresses the information to be sent, and the security verification unit signs and encrypts it; after the task interaction unit receives information sent by other components, the information is decompressed, and the security verification unit decrypts it and verifies the signature.
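  • The following sketch illustrates one possible ordering of those steps (compress, sign, encrypt on send; decrypt, verify, decompress on receive), assuming a shared HMAC signing key and the third-party cryptography package for encryption; it is not the patent's implementation.

```python
import hashlib
import hmac
import json
import zlib

from cryptography.fernet import Fernet   # assumed third-party dependency

SIGN_KEY = b"shared-signing-key"
enc = Fernet(Fernet.generate_key())       # in practice the key would be provisioned/shared

def secure_send(message: dict) -> bytes:
    """Compress, sign with HMAC-SHA256, then encrypt."""
    body = zlib.compress(json.dumps(message).encode())
    sig = hmac.new(SIGN_KEY, body, hashlib.sha256).digest()
    return enc.encrypt(sig + body)

def secure_receive(payload: bytes) -> dict:
    """Decrypt, verify the signature, then decompress."""
    plain = enc.decrypt(payload)
    sig, body = plain[:32], plain[32:]
    if not hmac.compare_digest(sig, hmac.new(SIGN_KEY, body, hashlib.sha256).digest()):
        raise ValueError("signature verification failed")
    return json.loads(zlib.decompress(body).decode())

assert secure_receive(secure_send({"gradient": [0.1, 0.2]})) == {"gradient": [0.1, 0.2]}
```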
  • The DDAI system further includes a model version management unit, which is used to maintain historical versions of the model so that the model can be automatically expired and rolled back.
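  • A minimal sketch, under assumed names and a simple age-based expiry rule, of a model version management unit that keeps historical versions and supports expiry and rollback.

```python
import time
from typing import Any, Dict, List, Optional

class ModelVersionManager:
    """Keeps historical model versions so a model can expire and be rolled back."""

    def __init__(self, max_age_s: float = 3600.0):
        self.max_age_s = max_age_s
        self.history: List[Dict[str, Any]] = []   # newest version is last

    def save(self, model: Any) -> None:
        self.history.append({"model": model, "saved_at": time.time()})

    def expire(self) -> None:
        """Drop versions older than max_age_s, always keeping the newest one."""
        now = time.time()
        keep = [v for v in self.history if now - v["saved_at"] <= self.max_age_s]
        self.history = keep or self.history[-1:]

    def rollback(self) -> Optional[Any]:
        """Discard the current version and return the previous one, if any."""
        if len(self.history) >= 2:
            self.history.pop()
            return self.history[-1]["model"]
        return None
```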
  • FIG. 1 is a schematic diagram of the architecture of a distributed AI system provided by this application;
  • FIG. 2 is a schematic diagram of a distributed AI system provided by this application;
  • FIG. 3 is a schematic diagram of a virtual node including a gateway node provided by this application;
  • FIG. 4 is a schematic diagram of the structural connection of a virtual node provided by this application;
  • FIG. 5 is a schematic diagram of a feature transformation framework provided by this application;
  • FIG. 6 is a schematic structural diagram of a component provided by this application;
  • FIG. 7 is a schematic structural diagram of a machine learning engine provided by this application;
  • FIG. 8 is a schematic diagram of a joint optimization process provided by this application;
  • FIG. 9 is a schematic diagram of a process of updating a global model provided by this application;
  • FIG. 10 is a schematic diagram of a training/update process of a personalized model provided by this application;
  • FIG. 11 is a schematic diagram of a process for adapting a personalized model provided by this application.
  • the embodiments of the present application provide a distributed AI system to flexibly and efficiently solve artificial intelligence application tasks.
  • AI tasks include data collection, feature extraction, model training and update, and model execution.
  • Data collection is specifically recording raw data and storing it. After feature extraction, the stored data becomes a feature vector composed of real numbers.
  • Model training and update take the generated feature vectors as input and, based on a specific algorithm, output a trained or updated model.
  • Model execution uses the model to predict or make decisions on newly generated feature vectors.
  • Different types of devices collect data through different channels, and different feature-extraction methods lead to feature vectors that belong to different feature spaces.
  • Different devices have different computing capabilities, and the supported model complexity may also be different.
  • In an existing distributed machine learning system, the central node integrates the calculation results of the distributed nodes and then sends them to each worker node for update.
  • The worker nodes are usually homogeneous, the connection structure is relatively stable, and the sample data of all nodes is required to belong to the same feature space.
  • Such a system must rely on global synchronization through the central node, which causes considerable communication overhead.
  • The more distributed nodes there are, the greater the corresponding overhead, which leads to computing bottlenecks.
  • In addition, such a distributed AI system is based on the statistics of a large number of users and does not consider user personalization.
  • In short, existing distributed AI systems have poor flexibility and low efficiency when solving AI tasks.
  • Therefore, this application proposes a distributed AI system that can jointly complete specific distributed AI tasks through a large number of heterogeneous components and that supports components dynamically connecting to or disconnecting from the distributed AI system.
  • Distributed AI tasks are planned dynamically according to the characteristics of each component, so that AI tasks can be solved flexibly and efficiently.
  • An embodiment of the present application provides a distributed AI system.
  • The AI system may be a decentralized distributed artificial intelligence (DDAI) system.
  • A schematic diagram of the DDAI system is shown in FIG. 1.
  • The DDAI system may include a registration unit, a task plan management unit, a task interaction unit, a task execution unit, and a standardization unit. Specifically:
  • the registration unit is used to register a component when it dynamically connects to the DDAI system and to deregister it when it disconnects from the system;
  • the task plan management unit is used to plan and manage distributed AI tasks according to the characteristics of the connected components;
  • the task interaction unit is used to exchange information between the connected components;
  • the task execution unit is used to execute the allocated distributed AI subtasks so as to complete the distributed AI task;
  • the standardization unit is used to make the DDAI system correspond to a unified space, where the unified space may include a unified feature space and a unified label space.
  • the components may be independent physical devices or cloud virtual nodes; each component carries one or more of the units.
  • the above-mentioned DDAI system does not need to rely on a central node, and can solve AI tasks flexibly and efficiently through the heterogeneity, dynamics, automatic coordination and automatic adaptation of multiple components, and at the same time can save communication overhead.
  • Multiple components form the DDAI system, and a component that carries a registration unit can discover adjacent components; the adjacent components can be integrated into a virtual node.
  • The components integrated into a virtual node are either all physical devices or all cloud virtual nodes. Exemplarily, the physical devices mentioned above may be, but are not limited to, terminal devices such as smartphones, smart watches, personal computers (PCs), and tablets.
  • Multiple components form multiple virtual nodes.
  • The virtual nodes integrated from the components may be as indicated by 201 in FIG. 2.
  • The multiple virtual nodes can be dynamically connected to form a decentralized virtual cloud, as indicated by 202 in FIG. 2, which constitutes the DDAI system. Since one or more components in the DDAI system can connect or disconnect at any time as needed, the components in any virtual node may differ at different times; that is, a virtual node changes in real time, and the components connected at a given moment are integrated on demand into the virtual node of that moment.
  • For example, physical device 1 and physical device 2 may be integrated into a new virtual node 2.
  • Physical device 4 and physical device 5 may be integrated into virtual node 3 at the current moment, and physical device 6 and physical device 7 into virtual node 4, while at the next moment, based on task requirements, physical device 4, physical device 5, and physical device 6 may be integrated into a new virtual node 5, and physical device 7 into a new virtual node 6.
  • The integration of virtual nodes is not limited to the situations described above and can take many other forms, which are not listed here one by one.
  • The task plan management unit of a component may include a distributed AI task planning unit, which is used to initiate and receive AI task plans according to specific rules or specific algorithms. The task plan management unit is specifically used to cooperate with the task plan management units of the other components in the virtual node, to agree on the distribution of the distributed AI subtasks of each component according to the characteristics of the component to which it belongs, and to record the subtask plan of that component; the characteristics include peripheral characteristics, functional characteristics, and computing capability.
  • The task plan management units in any virtual node may dynamically elect one component of the virtual node as a gateway node according to the characteristics of the components to which they belong. In this way, subsequent communication between virtual nodes can be completed solely through the components acting as gateway nodes, and the task plan management unit of the gateway node initiates task negotiation or task assignment for all components.
  • This implementation is simple and flexible, and saves communication overhead.
  • When a new component connects, the task plan management unit of the current gateway node initiates a new negotiation to elect a gateway node. If the new node is elected as the gateway node, the current gateway node transfers the task plan copy and the gateway responsibility to the new node. If a component disconnects from the DDAI system, the registration unit of the gateway node automatically discovers this, deregisters the component, and initiates a post-processing task to handle the abnormal situation in which the component leaves and its corresponding subtask stops. If the registration units of the other components in the virtual node jointly detect that the current gateway node has left the network, their respective task plan management units re-initiate the gateway node election task.
  • each component can be dynamically connected to or disconnected from the DDAI system without affecting the entire DDAI system.
  • any virtual node and the gateway node it includes may be as shown in (a) or (b) in FIG. 3.
  • the gateway node in each virtual node is determined by all components in the virtual node through election.
  • a component with the best performance state among all the components integrated with the virtual node at a certain time may be selected as the gateway node of the virtual node.
  • The task plan management unit of a component further includes a status recording unit, which records the task status of the other components in the virtual node and updates it in real time or periodically; the task plan management unit is also used to initiate a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by the other components in the virtual node.
  • When the distributed AI task planning unit of a component is triggered (for example, by a data collection condition), it adds the data collection and feature extraction tasks of that component to the subtask queue and notifies the other components of the virtual node to plan the corresponding data collection and feature extraction subtasks.
  • The distributed AI task planning unit of each gateway node plans the model training and update tasks according to a certain algorithm; before the plan is executed, the task plan is passed to the other components of the virtual node, and the task planning units of all components negotiate and confirm their respective model training and update subtasks according to their characteristics.
  • the gateway node initiates a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by other components in the virtual node.
  • The distributed AI post-processing tasks include exception processing tasks, decision processing tasks, and action coding tasks.
  • The exception processing task handles abnormal conditions.
  • The decision processing task synthesizes all feedback and makes a decision in accordance with preset rules or a preset algorithm.
  • The action coding task transforms the decision result into an action code and transmits it to a designated device for execution.
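  • The following sketch, using hypothetical rule and encoder functions, illustrates how the three post-processing tasks could be chained on a gateway node.

```python
from typing import Any, Callable, Dict, List

def post_process(feedback: List[Dict[str, Any]],
                 decide: Callable[[List[Dict[str, Any]]], str],
                 encode_action: Callable[[str], bytes]) -> bytes:
    """Post-processing on the gateway node: handle exceptions, integrate feedback
    into a decision, and encode the decision as an action for a designated device."""
    # exception processing task: drop feedback reported as abnormal
    normal = [f for f in feedback if f.get("status") != "error"]
    # decision processing task: integrate all (normal) feedback via a preset rule/algorithm
    decision = decide(normal)
    # action coding task: convert the decision into an action code for execution
    return encode_action(decision)

# usage with a trivial preset rule and encoder (both hypothetical)
feedback = [{"status": "ok", "score": 0.9}, {"status": "error"}, {"status": "ok", "score": 0.4}]
action = post_process(
    feedback,
    decide=lambda fs: "act" if sum(f["score"] for f in fs) / len(fs) > 0.5 else "wait",
    encode_action=lambda d: d.encode("utf-8"),
)
print(action)  # b'act'
```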
  • the task execution unit is further configured to execute the subtask according to the subtask plan specified by the task plan management unit, and feed back the execution result to the task plan management unit after completion.
  • The standardization unit is specifically configured to map the feature data extracted by the component to which it belongs to a subspace of the unified feature space corresponding to the DDAI system.
  • The standardization unit is also used to map the task-related labels extracted by the component to which it belongs to the unified label space corresponding to the DDAI system.
  • Multiple virtual nodes correspond to the unified feature space of the DDAI system.
  • Each virtual node performs a feature transformation on the features of each of its components through the standardization unit of that component, so that the transformed feature space of each component is a subspace of the unified feature space. That is, although the feature spaces of the components are not necessarily the same in practice, the above transformations allow multiple virtual nodes to belong to one unified feature space, which enables artificial-intelligence-related tasks to be completed with higher efficiency and stronger privacy.
  • the AI system further includes a security verification unit for verifying the identity of the accessed component, so as to ensure the integrity of distributed tasks and protect the privacy of interactive data.
  • Before sending, the task interaction unit compresses the information to be sent, and the security verification unit signs and encrypts it; after receiving information sent by other components, the task interaction unit decompresses the information, and the security verification unit decrypts it and verifies the signature.
  • the DDAI system further includes a model version management unit, which is used to maintain the historical version of the model so that the model automatically expires and rolls back.
  • Each component can automatically complete feature mapping, machine learning training, task decision-making, and so on in collaboration, and adaptive scheduling can be implemented according to the capabilities and constraints of each component; the adaptive scheduling may be embodied as, but is not limited to, algorithm scheduling.
  • When any virtual node is formed by multiple components, the multiple components in that virtual node are connected in a ring structure, and each component has a predecessor node and a successor node.
  • Each component records its predecessor and successor nodes and ranks priorities according to certain rules.
  • When the gateway node is to be elected, each component sends its own state information to its successor node; following the priority protocol, each component forwards the state information of the best component seen so far to its successor, and consensus is finally reached.
  • The optimal node is thereby elected as the gateway node of the virtual node.
  • The gateway node can decompose the objective function, model, and mapping into the parts corresponding to each component and then transfer them to the corresponding components one by one for update and training.
  • The transfer works as follows: the gateway node first passes the task parts of all components to its successor node; that successor keeps its own task part and forwards the task parts of the remaining components to its own successor, and each subsequent component repeats this process until every component has received its own task part.
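  • A minimal sketch, with hypothetical component names, of this ring-based transfer: the gateway hands the full bundle of task parts to its successor, and each component keeps its own part and forwards the rest.

```python
from typing import Dict, List

def distribute_task_parts(ring: List[str], parts: Dict[str, dict]) -> Dict[str, dict]:
    """The gateway node sends every component's task part around the ring; each
    component keeps its own part and forwards the remainder to its successor."""
    received: Dict[str, dict] = {}
    in_flight = dict(parts)                      # bundle handed to the first successor
    for comp in ring[1:] + ring[:1]:             # walk the ring starting at the gateway's successor
        if comp in in_flight:
            received[comp] = in_flight.pop(comp) # keep own task part
        # remaining parts travel on to the next successor (next loop iteration)
    return received

ring = ["gateway", "phone", "watch", "tv"]
parts = {c: {"objective": f"term_{c}", "model_slice": i} for i, c in enumerate(ring)}
assert set(distribute_task_parts(ring, parts)) == set(ring)
```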
  • each virtual node is responsible for collecting multi-modal features:
  • X = [X^(1), ..., X^(m)], where m represents the number of different component types, that is, m modalities; X^(k) is the sample collected on the k-th component (if no corresponding device is connected, it is filled with zeros). Through the feature mapping φ^(k) (the feature mapping function corresponding to the k-th component), the features of different virtual nodes belong to a unified feature space Ω after transformation; that is, φ_i(X_i) ∈ Ω for any virtual node i in the DDAI system.
  • the feature transformation framework is shown in FIG. 5.
  • the feature mapping can be any function such as a linear function, a multilayer perceptron, a deep neural network, a decision tree, etc., to map the original feature space to the new feature space.
  • The data corresponding to X^(k) and the corresponding feature transformation function φ^(k) are stored in the k-th component of the virtual node. It should be noted that X^(k) is private user data and will not be shared.
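  • As an illustration only, the following sketch builds X with zero-filling for missing modalities and applies a per-modality mapping φ^(k) into a unified feature space; the dimensions, the linear form of the mappings, and the summation used to combine modalities are all assumptions.

```python
import numpy as np

# hypothetical modality dimensions and unified-space dimension
MODALITY_DIMS = {"camera": 8, "imu": 4, "audio": 6}
UNIFIED_DIM = 16

# one (random) linear feature mapping phi^(k) per component type; in practice these
# could be MLPs, decision trees, etc., as the description notes
rng = np.random.default_rng(0)
PHI = {k: rng.normal(size=(d, UNIFIED_DIM)) for k, d in MODALITY_DIMS.items()}

def collect_and_map(samples: dict) -> np.ndarray:
    """Build X = [X^(1), ..., X^(m)] with zero-filling for missing devices, map each
    modality into the unified feature space, and combine (here: by summation)."""
    unified = np.zeros(UNIFIED_DIM)
    for kind, dim in MODALITY_DIMS.items():
        x_k = np.asarray(samples.get(kind, np.zeros(dim)))  # zero-fill if device absent
        unified += x_k @ PHI[kind]                          # phi^(k)(X^(k)) in the unified space
    return unified

# usage: a virtual node where no audio device is connected
features = collect_and_map({"camera": rng.normal(size=8), "imu": rng.normal(size=4)})
print(features.shape)  # (16,)
```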
  • Each component passes the score of its own model on its current data to the gateway node for synthesis; the gateway node then makes task decisions based on the combined score and passes the tasks that each component needs to execute to that component for execution.
  • The score transfer process of each component and the task issuance process of the gateway node are similar to the information transfer process among components described above; they can be cross-referenced and are not described in detail here.
  • During the internal information transmission among its components, any virtual node can add a component signature (that is, a component identifier) to the information and perform integrity verification at the gateway node, to ensure that the information is not modified, damaged, or lost during transmission.
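  • A minimal sketch of the component signature plus integrity verification described above, using a SHA-256 digest as the integrity check; the function names are hypothetical.

```python
import hashlib
import json

def tag_message(component_id: str, payload: dict) -> dict:
    """Add a component signature (identifier) and a content digest before forwarding."""
    body = json.dumps(payload, sort_keys=True)
    return {"component": component_id,
            "payload": payload,
            "digest": hashlib.sha256(body.encode()).hexdigest()}

def verify_at_gateway(message: dict) -> bool:
    """Gateway-side integrity check: the payload must match its digest."""
    body = json.dumps(message["payload"], sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest() == message["digest"]

msg = tag_message("phone-1", {"score": 0.87})
assert verify_at_gateway(msg)
```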
  • the schematic structural diagram of any component may be as shown in FIG. 6, and may specifically include:
  • The registration unit is used to allow the component to join the DDAI system at any time, to discover in time when a component exits the DDAI system, and to report the updated connections within the virtual node to the task plan management unit.
  • The task plan management unit is used to initiate and receive AI task plans according to specific rules or specific algorithms; it is specifically used to cooperate with the task plan management units of the other components in the virtual node, to agree on the distribution of the distributed AI subtasks of each component according to the characteristics of the components, and to record the subtask plan of its own component; the characteristics may include peripheral characteristics, functional characteristics, and computing capability.
  • the task plan management unit also records and refreshes the execution status of the task.
  • the data collection unit is used to perform the data collection task of the component and record the original sampled data in the database.
  • The feature extraction unit is used to extract features from the stored data, where the features can include, but are not limited to, user or device portraits, behavior features, and status features; X_i^(p) can be used to represent the sample of the p-th type of component in the i-th virtual node.
  • The feature mapping unit is used to map the features extracted by the feature extraction unit to a set feature space; specifically, the features can be mapped to a subspace of the unified feature space according to the component type or function, where φ_i^(p) can be used to represent the feature mapping function of the p-th type of component in the i-th virtual node.
  • The label mapping unit is used to map the stored task labels to a set label space; the set label space can be the unified label space of the system, and each component is mapped to a subspace of the unified label space, where Y_i represents the labels corresponding to the samples in the i-th virtual node.
  • The machine learning engine is used for AI model training, feature mapping model update, local model adaptive integration, and global model update; specifically, based on the features X_i^(p), the feature mapping φ_i^(p), the labels Y_i, and the label mapping, it performs personalized AI model training, feature mapping model update, adaptive model integration, global model update, and model execution.
  • the machine learning engine can also handle exceptions.
  • the model caching unit is used to cache multiple models so that the machine learning engine can perform local model adaptive integration
  • the interaction unit is used to send and receive data, compress and decompress, and specifically communicate with other components.
  • the data collection unit and the machine learning engine may all belong to the task execution unit in the DDAI system.
  • the machine learning engine may also include the post-processing task execution function described in the DDAI system.
  • When the function of the machine learning engine in any component is realized, it may be as shown in the schematic diagram in FIG. 7 and may specifically include:
  • the global model update unit, used to receive models, gradients, and so on from other virtual nodes and to perform a joint averaged update of the global model;
  • the local personalized model training unit, used to update the personalized model with local data;
  • the personalized model adaptation unit, used to adaptively select models from the model cache unit for integration according to the optimization index and the available computing resources;
  • the feature mapping update unit, used to update the feature mapping function;
  • the model composition unit, used to combine the models obtained by the global model update unit, the local personalized model training unit, and the personalized model adaptation unit;
  • the exception handling unit, used to handle abnormal situations or abnormal data;
  • the model execution platform, used to execute the model obtained by the model composition unit.
  • An implementation of joint optimization in the machine learning engine through the global model update unit, the local personalized model training unit, the personalized model adaptation unit, and the feature mapping update unit may be to adopt an alternating optimization strategy: given the other variables as fixed parameters, optimize one of the variables at a time.
  • The implementation process may be as shown in FIG. 8, and the specific process may include:
  • A2: given the global model, the integrated personalized model, and the feature mapping, update the local personalized model.
  • The personalized model adaptation, global model update, and feature mapping update steps are performed analogously, each with the other variables held fixed.
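  • The following sketch illustrates the alternating pattern only: each of the four variables is updated in turn with the others held fixed. The update functions are trivial placeholders, not the units' real algorithms.

```python
from typing import Callable, Dict

# Placeholder update steps standing in for the global model update, local personalized
# model training, personalized model adaptation, and feature mapping update units.
def _update(name: str) -> Callable[[Dict[str, float]], float]:
    def step(state: Dict[str, float]) -> float:
        # purely illustrative: nudge this variable toward the mean of the others
        others = [v for k, v in state.items() if k != name]
        return 0.5 * state[name] + 0.5 * sum(others) / len(others)
    return step

UPDATES = {name: _update(name) for name in
           ["global_model", "personal_model", "integrated_model", "feature_mapping"]}

def alternating_optimization(state: Dict[str, float], n_rounds: int = 10) -> Dict[str, float]:
    """Fix all variables except one, update that one, and cycle through all of them."""
    for _ in range(n_rounds):
        for name, step in UPDATES.items():
            state[name] = step(state)
    return state

print(alternating_optimization({"global_model": 1.0, "personal_model": 0.0,
                                "integrated_model": 2.0, "feature_mapping": -1.0}))
```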
  • An implementation for the global model update unit in the machine learning engine to perform the global model update may be: based on the local data, feature mapping, labels, and label mapping, calculate and update the gradient of the loss function over the unified labels and features as well as the local model; apply security processing such as differential privacy and signing to the gradient and the local model and send them to the neighboring components; receive the gradients and models sent by the neighboring components and aggregate them with the local gradient and model in the security module; and use the aggregated model and gradient to update the global model.
  • the aforementioned security verification may be integrity verification.
  • the foregoing process may be as shown in the flowchart of FIG. 9.
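  • A rough sketch of this update loop under strong simplifications (a squared loss, Gaussian noise standing in for the differential-privacy step, and plain averaging as the aggregation); it is meant only to make the data flow concrete.

```python
import numpy as np

def local_gradient(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gradient of a squared loss on local (already unified) features and labels."""
    return X.T @ (X @ w - y) / len(y)

def dp_noise(g: np.ndarray, scale: float = 0.01) -> np.ndarray:
    """Very rough stand-in for differential-privacy processing before sharing."""
    return g + np.random.normal(scale=scale, size=g.shape)

def global_update(w: np.ndarray, X: np.ndarray, y: np.ndarray,
                  neighbor_grads: list, lr: float = 0.1) -> np.ndarray:
    """Aggregate the local gradient with gradients received from neighboring
    components and take one update step on the global model."""
    grads = [dp_noise(local_gradient(w, X, y))] + list(neighbor_grads)
    return w - lr * np.mean(grads, axis=0)

# usage with synthetic data and one neighbor
rng = np.random.default_rng(1)
X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
w = np.zeros(4)
w = global_update(w, X, y, neighbor_grads=[rng.normal(scale=0.01, size=4)])
print(w)
```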
  • An implementation for the local personalized model training unit in the machine learning engine to perform personalized model training/update may be: based on the local data, feature mapping, and labels, calculate and update the local model of the loss function over the labels, the unified features, and the local features; apply security processing such as differential privacy and signing to the local model; and send the securely processed model to the adjacent components and store it in the model cache unit.
  • An implementation for the personalized model adaptation unit in the machine learning engine to perform personalized model adaptation may be: sample a model integration strategy from the policy model according to the resources of the components in the virtual node and the constraints of the local task; compute the loss of the sampled strategy after integrating models from the model cache unit; update the sampling policy model according to the fed-back loss; and iterate alternately until the specified number of iterations is reached or the convergence condition is met.
  • the foregoing process may be as shown in FIG. 11.
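  • The following sketch, with an assumed squared loss and a simple probability-update heuristic, illustrates the adaptation loop: sample an integration strategy under a resource budget, score the integrated model, and adjust the sampling probabilities from the loss.

```python
import numpy as np

def adapt_personalized_model(cache, X, y, budget=2, n_iter=50):
    """Sample which cached models to integrate (subject to a resource budget),
    score the integrated model on local data, and update the sampling
    probabilities from the fed-back loss."""
    rng = np.random.default_rng(0)
    probs = np.full(len(cache), 1.0 / len(cache))       # sampling policy over the cache
    best_mask = np.zeros(len(cache), dtype=bool)
    best_mask[0] = True                                  # fallback: use the first cached model
    best_loss = float(np.mean((X @ cache[0] - y) ** 2))
    for _ in range(n_iter):
        mask = rng.random(len(cache)) < probs            # sampled integration strategy
        if mask.sum() == 0 or mask.sum() > budget:       # resource / task constraint
            continue
        w = np.mean([m for m, use in zip(cache, mask) if use], axis=0)
        loss = float(np.mean((X @ w - y) ** 2))          # loss of the integrated model
        if loss < best_loss:
            best_mask, best_loss = mask, loss
        # feedback: favour models that appear in low-loss strategies
        probs = np.clip(probs + 0.1 * (mask.astype(float) - probs) / (1.0 + loss), 0.05, 0.95)
    return np.mean([m for m, use in zip(cache, best_mask) if use], axis=0)

rng = np.random.default_rng(2)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)
cache = [rng.normal(size=3) for _ in range(5)]           # models in the model cache unit
print(adapt_personalized_model(cache, X, y))
```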
  • The implementation of the feature mapping update performed by the feature mapping update unit in the machine learning engine is similar to the global model update performed by the global model update unit, with the optimized variable replaced by the feature mapping.

Abstract

Disclosed is a distributed AI system, which may be a decentralized distributed AI (DDAI) system, used to solve artificial intelligence application tasks flexibly and efficiently. The DDAI system comprises: a registration unit, configured to register components when they dynamically access the DDAI system or to deregister them when they disconnect from the DDAI system; a task plan management unit, configured to plan and manage a distributed AI task according to the characteristics of the connected components; a task interaction unit, configured to exchange information between the connected components; a task execution unit, configured to enable the connected components to execute the allocated distributed AI subtasks so as to complete the distributed AI task; and a standardization unit, configured to make the DDAI system correspond to a unified space, the unified space comprising a unified feature space and a unified label space. The components may be independent physical devices or cloud virtual nodes, and each component may carry one or more of said units.
PCT/CN2020/100833 2019-07-10 2020-07-08 Distributed AI system WO2021004478A1

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910619531.1A CN112215326B (zh) 2019-07-10 2019-07-10 一种分布式ai系统 (Distributed AI system)
CN201910619531.1 2019-07-10

Publications (1)

Publication Number Publication Date
WO2021004478A1 true WO2021004478A1 (fr) 2021-01-14

Family

ID=74048053

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/100833 WO2021004478A1 (fr) 2019-07-10 2020-07-08 Système d'ia distribuée

Country Status (2)

Country Link
CN (1) CN112215326B (fr)
WO (1) WO2021004478A1 (fr)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102655532B (zh) * 2012-04-18 2014-10-22 上海和辰信息技术有限公司 分布式异构虚拟资源集成管理方法及系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6981019B1 (en) * 2000-05-02 2005-12-27 International Business Machines Corporation System and method for a computer based cooperative work system
CN101316242A (zh) * 2008-07-17 2008-12-03 上海交通大学 面向服务的智能体平台
CN109787788A (zh) * 2017-11-10 2019-05-21 中国信息通信研究院 一种构建基于人工智能的网络的方法
CN109561100A (zh) * 2018-12-24 2019-04-02 浙江天脉领域科技有限公司 基于分布式与人工智能的双工赋能网络攻防的方法及系统

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011494A (zh) * 2021-03-18 2021-06-22 北京百度网讯科技有限公司 一种特征处理方法、装置、设备以及存储介质
CN113011494B (zh) * 2021-03-18 2024-02-27 北京百度网讯科技有限公司 一种特征处理方法、装置、设备以及存储介质
CN113301141A (zh) * 2021-05-20 2021-08-24 北京邮电大学 人工智能支持框架的部署方法和系统
CN113301141B (zh) * 2021-05-20 2022-06-17 北京邮电大学 人工智能支持框架的部署方法和系统

Also Published As

Publication number Publication date
CN112215326A (zh) 2021-01-12
CN112215326B (zh) 2024-03-29


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20837201; Country of ref document: EP; Kind code of ref document: A1)
122 EP: PCT application non-entry in European phase (Ref document number: 20837201; Country of ref document: EP; Kind code of ref document: A1)