WO2021004478A1 - Distributed ai system - Google Patents

Distributed AI system

Info

Publication number
WO2021004478A1
WO2021004478A1 PCT/CN2020/100833 CN2020100833W WO2021004478A1 WO 2021004478 A1 WO2021004478 A1 WO 2021004478A1 CN 2020100833 W CN2020100833 W CN 2020100833W WO 2021004478 A1 WO2021004478 A1 WO 2021004478A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
distributed
unit
components
component
Prior art date
Application number
PCT/CN2020/100833
Other languages
French (fr)
Chinese (zh)
Inventor
朱越
张宝峰
王成录
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021004478A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling

Definitions

  • This application relates to the field of artificial intelligence (AI), and in particular to a distributed AI system.
  • The machine learning system is the most important branch of AI systems.
  • Distributed machine learning (DML) systems are currently the systems most commonly used to process large-scale AI applications.
  • The traditional distributed machine learning system is a centralized system that uses computing clusters to train prediction models on massive amounts of user data.
  • Such centralized systems often require intensive computing resources, and massive amounts of user data are uploaded to the cloud for storage, which can easily cause privacy and security problems.
  • The recently proposed federated learning (FL) system uses a local-cloud interaction mode. It protects user data privacy by storing data and performing computation locally, while using homomorphic encryption, model aggregation, and differential privacy to make it difficult to infer user information from the model and related variables transmitted between the local side and the cloud.
  • In both systems, the node connection structure is relatively stable, the number of working nodes is limited, and the feature space and label space of the data samples stored in the working nodes are consistent.
  • In practice, however, devices connect to and disconnect from the network at any time, and the features collected by heterogeneous devices belong to different feature spaces.
  • Even the AI tasks faced by each terminal device may differ and belong to different label spaces.
  • Both AI systems therefore struggle to complete AI application tasks when devices are heterogeneous and dynamically connect and disconnect.
  • Both AI systems also rely on a central node for global synchronization, which causes relatively large communication overhead and greatly reduces flexibility and efficiency in solving AI tasks.
  • This application provides a distributed AI system that solves AI application tasks flexibly and efficiently.
  • The distributed AI system is a decentralized distributed AI (DDAI) system that includes: a registration unit, used to register a component when it dynamically accesses the DDAI system and to deregister it when it disconnects from the DDAI system; a task plan management unit, used to plan and manage distributed AI tasks according to the characteristics of the accessed components; a task interaction unit, used to exchange information between the accessed components; a task execution unit, used by the accessed components to execute distributed AI subtasks so as to complete the distributed AI task; and a standardization unit, used to make the distributed AI system correspond to a unified space, where the unified space includes a unified feature space and a unified label space. The components may be independent physical devices or cloud virtual nodes, and each component carries one or more of the units.
  • the above-mentioned DDAI system does not need to rely on a central node, and can solve AI tasks flexibly and efficiently through the heterogeneity, dynamics, automatic collaboration and automatic adaptation of multiple components, and at the same time can save communication overhead.
  • The registration unit is also used to discover components adjacent to the component to which the registration unit belongs. Adjacent components are integrated into a virtual node, and the components integrated into one virtual node are either all physical devices or all cloud virtual nodes.
  • AI tasks can be solved flexibly and efficiently through heterogeneous components.
  • The task plan management unit includes a distributed AI task planning unit, which is used to initiate and receive AI task plans according to specific rules or specific algorithms. The task plan management unit is specifically used to cooperate with the task plan management units of the other components in the virtual node, to agree on the distributed AI subtask assigned to each component according to the characteristics of the component to which it belongs, and to record the subtask plan of that component. The characteristics include peripheral characteristics, functional characteristics, and computing capabilities.
  • the task plan management unit in any virtual node can dynamically elect a component of the virtual node as a gateway node according to the characteristics of the component to which it belongs. In this way, the communication between the virtual nodes can be completed only through the component as the gateway node, and the task plan management unit of the gateway node initiates task negotiation or task assignment of all components.
  • This implementation is simple and flexible, and saves communication overhead.
  • All the components in any virtual node are directly connected to the gateway node; alternatively, when a virtual node is integrated from multiple components, those components are connected in a ring structure, and each component has a predecessor node and a successor node.
  • The task plan management unit further includes a status recording unit, which is used to record the task status of the other components in the virtual node and update it in real time or periodically. The task plan management unit is also used to initiate a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by the other components in the virtual node. In this way, AI tasks can be completed cooperatively.
  • The distributed AI post-processing tasks include exception processing tasks, decision processing tasks, and action coding tasks: the exception processing task handles abnormal states; the decision processing task integrates all feedback and makes a decision according to preset rules or a preset algorithm; and the action coding task converts the decision result into an action code and transmits it to a designated device for execution.
  • the task interaction unit is also used to compress the information to be sent before sending information to other components; after receiving the information sent by other components, decompress the information.
  • the task execution unit is further configured to execute the subtask according to the subtask plan specified by the task plan management unit, and feed back the execution result to the task plan management unit after completion.
  • When the distributed AI task planning unit of a component triggers a data collection task according to certain conditions, the data collection and feature extraction tasks of that component are added to the subtask queue, and the other components of the virtual node are notified to plan the corresponding data collection and feature extraction subtasks.
  • the standardization unit is specifically configured to map the feature data extracted by the component to which the standardization unit belongs to a subspace of the unified feature space corresponding to the distributed AI system.
  • The standardization unit is further configured to map the task-related labels extracted by the component to which the standardization unit belongs to the unified label space corresponding to the distributed AI system.
  • Multiple virtual nodes correspond to a unified feature space. Specifically, each virtual node applies a feature transformation to the features of each of its components through that component's standardization unit, so that the transformed feature space of each component is a subspace of the unified feature space. The components therefore need not share the same original feature space; after transformation, every virtual node belongs to the same feature space. Likewise, if the components belong to different label spaces, each component transforms the labels of its data into the unified label space through its standardization unit.
  • heterogeneous devices can complete AI model training and update tasks based on a unified feature space and a unified label space.
  • The distributed AI task planning unit of the gateway node plans the model training and update tasks according to a certain algorithm, and the task plan is passed to the other components of the virtual node before the plan is executed.
  • The distributed AI task planning units of the components negotiate and confirm their respective model training and update subtasks according to the characteristics of the components to which they belong.
  • the gateway node initiates a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by other components in the virtual node to which it belongs.
  • AI tasks can be adaptively allocated according to the characteristics of the components.
  • the DDAI system further includes a security verification unit for verifying the identity of the accessed component, so as to ensure the integrity of distributed tasks and protect the privacy of interactive data.
  • Before sending information to other components, the task interaction unit compresses the information, and the security verification unit signs and encrypts it; after the task interaction unit receives information sent by other components, the information is decompressed, and the security verification unit decrypts it and verifies the signature.
  • the DDAI system further includes a model version management unit, which is used to maintain the historical version of the model so as to automatically expire and roll back the model.
  • FIG. 1 is a schematic diagram of the architecture of a distributed AI system provided by this application.
  • FIG. 2 is a schematic diagram of a distributed AI system provided by this application.
  • FIG. 3 is a schematic diagram of a virtual node including a gateway node provided by this application.
  • FIG. 4 is a schematic diagram of the structural connection of a virtual node provided by this application.
  • FIG. 5 is a schematic diagram of a feature transformation framework provided by this application.
  • FIG. 6 is a schematic structural diagram of a component provided by this application.
  • FIG. 7 is a schematic structural diagram of a machine learning engine provided by this application.
  • FIG. 8 is a schematic diagram of a joint optimization process provided by this application.
  • FIG. 9 is a schematic diagram of a process of updating a global model provided by this application.
  • FIG. 10 is a schematic diagram of a training/update process of a personalized model provided by this application.
  • FIG. 11 is a schematic diagram of a process for adapting a personalized model provided by this application.
  • the embodiments of the present application provide a distributed AI system to flexibly and efficiently solve artificial intelligence application tasks.
  • AI tasks include data collection, feature extraction, model training and update, and model execution.
  • Data collection is specifically recording raw data and storing it. After feature extraction, the stored data becomes a feature vector composed of real numbers.
  • Model training and update take the generated feature vectors as input and, based on a specific algorithm, output the trained or updated model.
  • Model execution is to use the model to predict or make decisions on the newly generated feature vectors.
  • Different types of equipment have different channels of data collected, and different ways of extracting features will lead to different feature spaces to which feature vectors belong.
  • Different devices have different computing capabilities, and the supported model complexity may also be different.
  • the central node integrates the calculation results of each distributed node and then sends them to each working node for update.
  • the working nodes are usually the same node, the connection structure is relatively stable, and the feature space requirements of the sample data of all nodes are consistent.
  • it is necessary to rely on the global synchronization of the central node which will cause greater communication overhead.
  • the more distributed nodes the greater the corresponding overhead, which will lead to computing bottlenecks.
  • the distributed AI system is based on the statistics of a large number of users, and does not consider user personalization.
  • the existing distributed AI system has poor flexibility and low efficiency when solving AI tasks.
  • This application proposes a distributed AI system in which a large number of heterogeneous components jointly complete specific distributed AI tasks. The system supports components dynamically accessing or disconnecting from it, and dynamically plans distributed AI tasks according to the characteristics of each component, so that AI tasks can be solved flexibly and efficiently.
  • the embodiment of the present application provides a distributed AI system.
  • the AI system may be a decentralized distributed artificial intelligence (decentralized distributed AI, DDAI) system.
  • the schematic diagram of the DDAI system is shown in FIG. 1.
  • The DDAI system may include a registration unit, a task plan management unit, a task interaction unit, a task execution unit, and a standardization unit. Specifically:
  • The registration unit is used to register a component when it dynamically connects to the DDAI system and to deregister it when it disconnects from the system.
  • The task plan management unit is used to plan and manage distributed AI tasks according to the characteristics of the connected components.
  • The task interaction unit is used to exchange information between the connected components.
  • The task execution unit is used to execute the allocated distributed AI subtasks so as to complete the distributed AI task.
  • The standardization unit is used to make the DDAI system correspond to a unified space, which may include a unified feature space and a unified label space.
  • The components may be independent physical devices or cloud virtual nodes; each component carries one or more of the units.
  • the above-mentioned DDAI system does not need to rely on a central node, and can solve AI tasks flexibly and efficiently through the heterogeneity, dynamics, automatic coordination and automatic adaptation of multiple components, and at the same time can save communication overhead.
  • Multiple components form the DDAI system, and a component that carries the registration unit can discover adjacent components. Adjacent components can be integrated into a virtual node, and the components integrated into one virtual node are either all physical devices or all cloud virtual nodes. For example, the physical devices may be, but are not limited to, terminal devices such as smartphones, smart watches, personal computers (PCs), and tablets.
  • multiple components form multiple virtual nodes.
  • The multiple virtual nodes integrated from the components may be as indicated by reference numeral 201 in FIG. 2.
  • The multiple virtual nodes can be dynamically connected to form a decentralized virtual cloud, as indicated by reference numeral 202 in FIG. 2, which is the DDAI system. Since one or more components in the DDAI system can connect or disconnect at any time as needed, the components in any virtual node may differ from one moment to the next; that is, virtual nodes change in real time, and the components connected at a given moment are integrated into virtual nodes on demand.
  • For example, physical device 1 and physical device 2 may be integrated into a new virtual node 2.
  • As another example, at the current moment physical device 4 and physical device 5 are integrated into virtual node 3, and physical device 6 and physical device 7 are integrated into virtual node 4; at the next moment, depending on task requirements, physical devices 4, 5, and 6 may be integrated into a new virtual node 5, and physical device 7 into a new virtual node 6.
  • the integration situation of the virtual nodes is not only the situation described above, but also can have many other situations, which are not listed here in this application.
  • The task plan management unit of a component may include a distributed AI task planning unit, which is used to initiate and receive AI task plans according to specific rules or specific algorithms. The task plan management unit is specifically used to cooperate with the task plan management units of the other components in the virtual node, to agree on the distributed AI subtask assigned to each component according to the characteristics of the component to which it belongs, and to record the subtask plan of that component. The characteristics include peripheral characteristics, functional characteristics, and computing capabilities.
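  • The characteristic-based subtask assignment described above can be sketched as follows. This is a minimal illustration only: the `Component` and `Subtask` classes, the field names, and the "least-loaded capable component" rule are assumptions made for the example, since the application leaves the specific rules or algorithms open.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    """Hypothetical description of one component's characteristics."""
    name: str
    peripherals: set = field(default_factory=set)  # peripheral characteristics
    functions: set = field(default_factory=set)    # functional characteristics
    compute: float = 0.0                           # relative computing capability


@dataclass
class Subtask:
    """Hypothetical subtask together with the characteristics it requires."""
    name: str
    needs_peripheral: Optional[str] = None
    needs_function: Optional[str] = None
    min_compute: float = 0.0


def can_run(component, subtask):
    """Check a subtask's requirements against a component's characteristics."""
    if subtask.needs_peripheral and subtask.needs_peripheral not in component.peripherals:
        return False
    if subtask.needs_function and subtask.needs_function not in component.functions:
        return False
    return component.compute >= subtask.min_compute


def allocate(components, subtasks):
    """Assumed rule: give each subtask to the capable component that currently
    holds the fewest subtasks, preferring higher computing capability on ties."""
    plan = {c.name: [] for c in components}
    unassigned = []
    for subtask in subtasks:
        capable = [c for c in components if can_run(c, subtask)]
        if not capable:
            unassigned.append(subtask.name)
            continue
        chosen = min(capable, key=lambda c: (len(plan[c.name]), -c.compute))
        plan[chosen.name].append(subtask.name)
    return plan, unassigned
```

  • In a real deployment the plan would be negotiated between the task plan management units rather than computed in one place; the function above only shows how the three kinds of characteristics can constrain the assignment.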
  • The task plan management unit in any virtual node may dynamically elect a component of the virtual node as a gateway node according to the characteristics of the component to which it belongs. Subsequent communication between virtual nodes can then be completed through the gateway nodes alone, and the task plan management unit of the gateway node initiates task negotiation or task assignment for all components.
  • This implementation is simple and flexible, and saves communication overhead.
  • When a new component joins a virtual node, the task plan management unit of the current gateway node initiates a new negotiation to elect the gateway node. If the new node is elected as the gateway node, the current gateway node transfers its copy of the task plan and the gateway responsibility to the new node. If a component disconnects from the DDAI system, the registration unit of the gateway node automatically discovers this, deregisters the component, and initiates a post-processing task to handle the abnormal situation in which the component has left and its subtask has stopped. If the registration units of the other components in the virtual node jointly detect that the current gateway node has left the network, their task plan management units re-initiate the gateway node election task.
  • each component can be dynamically connected to or disconnected from the DDAI system without affecting the entire DDAI system.
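  • The join, leave, and re-election behaviour described above can be sketched as follows. This is a hedged, single-process illustration: the `VirtualNode` class, the numeric `score` standing in for a component's performance state, and the "highest score wins" rule are all assumptions, not details given in this application.

```python
class VirtualNode:
    """Hypothetical in-memory model of one virtual node's membership."""

    def __init__(self):
        self.members = {}    # component name -> assumed performance score
        self.gateway = None  # name of the current gateway node
        self.events = []     # post-processing tasks raised by departures

    def _elect(self):
        # Assumed rule: highest score wins; the name breaks ties deterministically.
        if self.members:
            self.gateway = max(self.members, key=lambda n: (self.members[n], n))
        else:
            self.gateway = None

    def register(self, name, score):
        """A component joins; a new negotiation decides who is gateway."""
        self.members[name] = score
        self._elect()

    def deregister(self, name):
        """A component leaves; record a post-processing task for its stopped
        subtask and re-elect, which also covers the gateway itself leaving."""
        if name not in self.members:
            return
        del self.members[name]
        self.events.append(("handle_departure", name))
        self._elect()
```

  • The sketch omits the transfer of the task plan copy to a newly elected gateway and the distributed detection of a departed gateway; it only shows that membership changes never require a fixed central node.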
  • any virtual node and the gateway node it includes may be as shown in (a) or (b) in FIG. 3.
  • the gateway node in each virtual node is determined by all components in the virtual node through election.
  • a component with the best performance state among all the components integrated with the virtual node at a certain time may be selected as the gateway node of the virtual node.
  • The task plan management unit of a component further includes a status recording unit, which is used to record the task status of the other components in the virtual node and update it in real time or periodically. The task plan management unit is also used to initiate a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by the other components in the virtual node.
  • When the distributed AI task planning unit of a component is triggered, it adds the data collection and feature extraction tasks of the component to the subtask queue and notifies the other components of the virtual node to plan the corresponding data collection and feature extraction subtasks.
  • The distributed AI task planning unit of each gateway node plans model training and update tasks according to a certain algorithm. Before the plan is executed, the task plan is passed to the other components of the virtual node, and the task planning units of all components negotiate and confirm their respective model training and update subtasks according to their characteristics.
  • the gateway node initiates a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by other components in the virtual node.
  • the distributed AI post-processing tasks include exception processing tasks, decision processing tasks, and action coding tasks.
  • The exception processing task handles abnormal conditions.
  • The decision processing task synthesizes all feedback and makes a decision according to preset rules or a preset algorithm.
  • The action coding task converts the decision result into an action code and transmits it to a designated device for execution.
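  • The three post-processing tasks can be sketched as one small pipeline. Everything concrete here is an assumption for illustration: the feedback format, the mean-score threshold used as the "preset rule", and the `ACTION_CODES` table are hypothetical and not specified in this application.

```python
# Hypothetical mapping from decisions to action codes sent to a device.
ACTION_CODES = {"act": 0x01, "idle": 0x00}


def post_process(feedback, threshold=0.5):
    """feedback maps a component name to {"status": str, "score": float}.

    Returns (failed_components, decision, action_code).
    """
    # Exception processing: collect components that reported an abnormal state.
    failed = sorted(name for name, fb in feedback.items() if fb["status"] != "ok")

    # Decision processing: synthesize all healthy feedback with an assumed
    # preset rule (mean score compared against a threshold).
    scores = [fb["score"] for fb in feedback.values() if fb["status"] == "ok"]
    mean = sum(scores) / len(scores) if scores else 0.0
    decision = "act" if mean >= threshold else "idle"

    # Action coding: convert the decision into a code for the target device.
    return failed, decision, ACTION_CODES[decision]
```

  • A gateway node could call `post_process` each time the status recording unit refreshes; the failed list would then feed re-planning of the stopped subtasks.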
  • the task execution unit is further configured to execute the subtask according to the subtask plan specified by the task plan management unit, and feed back the execution result to the task plan management unit after completion.
  • the standardization unit is specifically configured to map the feature data extracted by the component including the standardization unit to a subspace of the unified feature space corresponding to the DDAI system.
  • The standardization unit is also used to map the task-related labels extracted by the component that includes the standardization unit to the unified label space corresponding to the DDAI system.
  • multiple virtual nodes correspond to the unified feature space of the DDAI system.
  • Each virtual node applies a feature transformation to the features of each of its components through that component's standardization unit, so that the transformed feature space of each component is a subspace of the unified feature space. In practice, although the feature spaces of the components are not necessarily the same, this transformation makes multiple virtual nodes belong to a unified feature space, so that AI-related tasks can be carried out with higher efficiency and better privacy.
  • the AI system further includes a security verification unit for verifying the identity of the accessed component, so as to ensure the integrity of distributed tasks and protect the privacy of interactive data.
  • Before sending, the task interaction unit compresses the information to be sent, and the security verification unit signs and encrypts it; after receiving information sent by other components, the task interaction unit decompresses the information, and the security verification unit decrypts it and verifies the signature.
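  • The compress-then-sign exchange can be sketched with the Python standard library. This is an illustration under stated assumptions: an HMAC-SHA256 tag over a pre-shared key stands in for the signature (it is a message authentication code, not a public-key signature), JSON is an assumed message encoding, and the encryption step is omitted because the standard library has no cipher. The receiver here verifies the tag before decompressing, which is one reasonable ordering rather than one stated in this application.

```python
import hashlib
import hmac
import json
import zlib

TAG_LEN = 32  # size of a SHA-256 digest in bytes


def pack(message: dict, key: bytes) -> bytes:
    """Sender side: compress the message, then prepend an authentication tag."""
    compressed = zlib.compress(json.dumps(message, sort_keys=True).encode("utf-8"))
    tag = hmac.new(key, compressed, hashlib.sha256).digest()
    return tag + compressed


def unpack(blob: bytes, key: bytes) -> dict:
    """Receiver side: verify the tag, then decompress and decode the message."""
    tag, compressed = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, compressed, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature check failed")
    return json.loads(zlib.decompress(compressed).decode("utf-8"))
```

  • `hmac.compare_digest` is used so that tag comparison takes constant time; a tampered or truncated blob raises `ValueError` before any decompression happens.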
  • the DDAI system further includes a model version management unit, which is used to maintain the historical version of the model so that the model automatically expires and rolls back.
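  • A model version management unit of this kind can be sketched as a bounded history. The `ModelVersionStore` class and its count-based expiry policy are assumptions made for illustration; this application does not state how expiry is decided.

```python
class ModelVersionStore:
    """Hypothetical bounded history of model versions, oldest first."""

    def __init__(self, max_versions=3):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.max_versions = max_versions
        self._history = []  # the last entry is the active model

    def publish(self, model):
        """Store a new version and expire the oldest ones beyond the limit."""
        self._history.append(model)
        while len(self._history) > self.max_versions:
            self._history.pop(0)

    @property
    def active(self):
        """Return the most recently published model, or None if empty."""
        return self._history[-1] if self._history else None

    def rollback(self):
        """Discard the active version and return the previous one."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]
```

  • A time-based or accuracy-based expiry rule could replace the count limit without changing the interface.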
  • The components can collaborate to automatically complete feature mapping, machine learning training, task decisions, and so on, and can implement adaptive scheduling according to the capabilities and constraints of each component; adaptive scheduling may take the form of, but is not limited to, algorithm scheduling.
  • When a virtual node is integrated from multiple components, those components are connected in a ring structure, and each component has a predecessor node and a successor node.
  • Each component records its predecessor node and successor node, and priorities are assigned according to certain rules.
  • When the gateway node is elected, each component sends its own state information to its successor node; following the priority protocol, each component then forwards the state information of the best component it has seen to its successor node, until a consensus is reached.
  • The optimal node is selected as the gateway node of the virtual node.
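  • The ring election can be simulated in a few lines. This sketch assumes that each component's state reduces to a comparable `(priority, name)` pair and that messages move in synchronous rounds; both are simplifications introduced for the example.

```python
def ring_elect(states):
    """Simulate gateway election on a ring.

    states: list of (priority, name) tuples in ring order, where component i
    sends to its successor (i + 1) % n. Returns the elected component's name.
    """
    if not states:
        raise ValueError("a virtual node needs at least one component")
    n = len(states)
    best = list(states)  # each component starts by knowing only its own state
    for _ in range(n - 1):
        # Every component receives the best state known to its predecessor...
        incoming = [best[(i - 1) % n] for i in range(n)]
        # ...and keeps whichever of the two has the higher priority.
        best = [max(own, msg) for own, msg in zip(best, incoming)]
    # After n - 1 rounds every component has seen every state, so all agree.
    return best[0][1]
```

  • After k rounds each component knows the best state among itself and its k nearest predecessors, which is why n - 1 rounds are enough for consensus on a ring of n components.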
  • The gateway node can decompose the objective function, model, and mapping into the parts corresponding to each component, and then pass them to the corresponding components one by one for update and training.
  • The transfer proceeds as follows: the gateway node first passes the task parts of all components to its successor node; the successor node takes its own task part and passes the task parts of the other components on to its own successor node; each subsequent component repeats this process until all components have received their own task parts.
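  • The hop-by-hop delivery of task parts around the ring can be sketched as follows. The function name, the dictionary of task parts, and the hop counter are hypothetical; the sketch also assumes that `task_parts` holds parts only for non-gateway components in the ring.

```python
def ring_deliver(ring, gateway, task_parts):
    """Pass task parts around the ring, starting from the gateway.

    ring: component names in ring order.
    task_parts: maps a non-gateway component name to its task part.
    Returns (received, hops): what each component took, and how many
    successor hops were needed before nothing was left in transit.
    """
    n = len(ring)
    start = ring.index(gateway)
    in_transit = dict(task_parts)  # the bundle the gateway sends to its successor
    received = {}
    hops = 0
    for step in range(1, n):
        if not in_transit:
            break  # every part has been delivered; stop forwarding
        node = ring[(start + step) % n]
        hops += 1
        if node in in_transit:
            # The component keeps its own part and forwards the remainder.
            received[node] = in_transit.pop(node)
    return received, hops
```

  • The bundle shrinks at every hop, so the last component on the ring receives only its own part.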
  • each virtual node is responsible for collecting multi-modal features:
  • $X_i = [X^{(1)}, \ldots, X^{(m)}]$, where $m$ is the number of different component types, that is, the number of modalities, and $X^{(m)}$ is the sample collected on the $m$-th component (filled with 0 if no corresponding device is connected). Through a feature mapping $\phi_i = [\phi^{(1)}, \ldots, \phi^{(m)}]$, where $\phi^{(m)}$ is the feature mapping function corresponding to the $m$-th component, the features of different virtual nodes belong to a unified feature space $\mathcal{X}$ after transformation, that is, $\phi_i(X_i) \in \mathcal{X}$ for any virtual node $i$ in the DDAI system.
  • the feature transformation framework is shown in FIG. 5.
  • the feature mapping can be any function such as a linear function, a multilayer perceptron, a deep neural network, a decision tree, etc., to map the original feature space to the new feature space.
  • The data $X^{(k)}$ and the corresponding feature transformation function are both stored in the k-th component of the virtual node. Note that $X^{(k)}$ is private user data and is not shared.
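  • The multi-modal mapping with zero filling can be sketched as follows. The linear maps, the `LAYOUT` table of component types and block widths, and the choice to give each component type its own block of coordinates in the unified space are all assumptions for illustration; the application allows any mapping function and does not fix how the subspaces are arranged.

```python
def linear_map(weights):
    """Return a function x -> W x for a row-major weight matrix (list of rows)."""
    def apply(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return apply


# Hypothetical layout of the unified feature space: one block per component type.
LAYOUT = [("watch", 2), ("phone", 2)]


def to_unified(samples, mappings, layout=LAYOUT):
    """Map each available component's raw sample into its block of the unified
    feature vector; blocks of component types that are not connected are
    filled with zeros."""
    unified = []
    for ctype, width in layout:
        if ctype in samples:
            block = mappings[ctype](samples[ctype])
            if len(block) != width:
                raise ValueError(f"mapping for {ctype} must output {width} values")
        else:
            block = [0.0] * width
        unified.extend(block)
    return unified
```

  • In the system described here each mapping would run on the component that holds the raw sample, so that the raw data never leaves that component; the single function above collapses this only to keep the example short.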
  • Each component passes the score of its own model on its current data to the gateway node for synthesis; the gateway node then makes task decisions based on the combined score and passes to each component the tasks it needs to execute.
  • the scoring transfer process of each component and the task issuance process of the gateway node are similar to the information transfer process in the components described above, which can be referred to each other, and will not be described in detail here.
  • During internal information transmission between its components, any virtual node can add a component signature (that is, a component identifier) to the information and add integrity verification at the gateway node, to ensure that the information is not modified, damaged, or lost during transmission.
  • the schematic structural diagram of any component may be as shown in FIG. 6, and may specifically include:
  • The registration unit is used to let components join the DDAI system at any time, to discover promptly when a component exits the DDAI system, and to report the updated connections in the virtual node to the task plan management unit.
  • The task plan management unit is used to initiate and receive AI task plans according to specific rules or specific algorithms. Specifically, it cooperates with the task plan management units of the other components in the virtual node to agree on the distributed AI subtask assigned to each component according to the characteristics of the components, and records the subtask plan of its own component. The characteristics may include peripheral characteristics, functional characteristics, and computing capabilities.
  • the task plan management unit also records and refreshes the execution status of the task.
  • the data collection unit is used to perform the data collection task of the component and record the original sampled data in the database.
  • The feature extraction unit is used to extract features from the stored data. The features can include, but are not limited to, user or device portraits, behavior features, and status features; $X_i^{(p)}$ can denote the sample of the p-th type of component in the i-th virtual node.
  • The feature mapping unit is used to map the features extracted by the feature extraction unit to the set feature space. Specifically, the features can be mapped to a subspace of the unified feature space according to the component type or function; $\phi_i^{(p)}$ can denote the feature mapping function of the p-th type of component in the i-th virtual node.
  • The label mapping unit is used to map the stored task labels to the set task space. The set task space can be the unified task space of the system, with each component mapped to a subspace of it; $Y_i$ can denote the label corresponding to the sample $X_i$ in the i-th virtual node.
  • The machine learning engine is used to train AI models, update the feature mapping model, adaptively integrate local models, and update the global model. Specifically, based on the features, the feature mapping, the labels $Y_i$, and the label mapping, it performs personalized AI model training, feature mapping model update, adaptive model integration, global model update, and model execution.
  • the machine learning engine can also handle exceptions.
  • the model caching unit is used to cache multiple models so that the machine learning engine can perform local model adaptive integration
  • the interaction unit is used to communicate with other components: it sends and receives data and compresses and decompresses it.
  • the data collection unit and the machine learning engine may both belong to the task execution unit in the DDAI system.
  • the machine learning engine may also include the post-processing task execution function described in the DDAI system.
  • when the function of the machine learning engine in any component is realized, its structure may be as shown in the schematic diagram of FIG. 7, and may specifically include:
  • the global model update unit is used to receive models, gradients, etc. from other virtual nodes, and perform a joint average update of the global model
  • the local personalized model training unit is used to update the personalized model with local data
  • the personalized model adaptive unit is used to adaptively select models from the model cache unit for integration, according to the optimization index and computing resources;
  • the feature mapping update unit is used to update the feature mapping function
  • the model composition unit is used to composite the models obtained by the global model update unit, the local personalized model training unit, and the personalized model adaptive unit.
  • the abnormal handling unit is used to handle abnormal situations or abnormal data.
  • the model execution platform is used for model execution based on the model obtained by the model composition unit.
  • joint optimization in the machine learning engine through the global model update unit, the local personalized model training unit, the personalized model adaptive unit, and the feature mapping update unit may be implemented with an alternating optimization strategy: with the other variable parameters held fixed, one of the variables is optimized at a time.
  • the implementation process may be as shown in Figure 8, and the specific process may include:
  • A2 given the global model, integrated personalized model, feature mapping, update the local personalized model
  • the personalized model is adaptive
  • an implementation solution for the global model update unit in the machine learning engine to perform the global model update may be: based on local data, the feature mapping, the labels, and the label mapping, calculate and update the gradient of the loss function with respect to the unified labels and features, together with the local model; send the gradient and the local model to the neighboring components after security processing such as differential privacy and signing; receive the gradients and models sent by the neighboring components, and aggregate them with the local gradient and model in the security module; use the aggregated model and gradient to update the global model.
  • the aforementioned security verification may be integrity verification.
  • the foregoing process may be as shown in the flowchart of FIG. 9.
  • an implementation solution for the local personalized model training unit in the machine learning engine to perform personalized model training/update may be: based on local data, the feature mapping, and the labels, calculate and update the local model with respect to the loss function of the local labels, unified features, and local features; apply security processing such as differential privacy and signing to the local model; send the securely processed model to the adjacent components; and store it in the model cache unit.
  • an implementation solution for the personalized model adaptive unit in the machine learning engine to perform personalized model adaptation may be: sample a model integration strategy from the policy model according to the resources of the components in the virtual node and the constraints of the local task; calculate the loss of the sampled strategy after integrating models from the model cache unit; update the sampling policy model according to the loss feedback; and alternate these steps until a specified number of iterations or a convergence condition is reached.
  • the foregoing process may be as shown in FIG. 11.
  • the feature mapping update performed by the feature mapping update unit in the machine learning engine is implemented in a similar way to the global model update performed by the global model update unit, except that the optimized variable is replaced with the feature mapping.
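The global model update described in the list above (exchange gradients and models with neighboring components, aggregate them, then update the global model) can be illustrated with a minimal Python sketch. It assumes plain element-wise averaging as the aggregation rule and a fixed learning rate, neither of which is specified in the source. The names `aggregate` and `update_global_model` are hypothetical, and security processing such as differential privacy and signing is left out.

```python
from typing import List

Vector = List[float]


def aggregate(vectors: List[Vector]) -> Vector:
    """Element-wise average of equally sized vectors (assumed aggregation rule)."""
    if not vectors:
        raise ValueError("nothing to aggregate")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def update_global_model(
    global_model: Vector,
    local_grad: Vector,
    neighbor_grads: List[Vector],
    neighbor_models: List[Vector],
    lr: float = 0.1,
) -> Vector:
    """One round: average the models, average the gradients, take a gradient step."""
    avg_model = aggregate([global_model] + neighbor_models)
    avg_grad = aggregate([local_grad] + neighbor_grads)
    return [w - lr * g for w, g in zip(avg_model, avg_grad)]


# Hypothetical two-parameter model with a single neighboring component.
new_model = update_global_model(
    global_model=[1.0, 2.0],
    local_grad=[0.5, 0.0],
    neighbor_grads=[[1.5, 2.0]],
    neighbor_models=[[3.0, 4.0]],
    lr=0.5,
)
```

Under the same assumptions, the feature mapping update would reuse this loop with the feature mapping parameters in place of the model weights.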

Abstract

A distributed AI system, which can be a decentralized distributed AI (DDAI) system, and used for flexibly and efficiently solving application tasks of an artificial intelligence type. The DDAI system comprises a registration unit, configured to perform registration when components dynamically access the DDAI system or perform logout when the components are disconnected from the DDAI system; a task planning and management unit, configured to plan and manage a distributed AI task according to the features of the accessed components; a task exchange unit, configured to exchange information between the accessed components; a task execution unit, configured to enable the accessed components to execute allocated distributed AI sub-tasks, so as to complete the distributed AI task; and a standardization unit, configured to enable the DDAI system to correspond to a unified space, the unified space comprising a unified feature space and a unified marking space. The components can be independent physical devices or cloud virtual nodes, and each component can carry one or more of said units.

Description

A distributed AI system
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201910619531.1, filed with the China Patent Office on July 10, 2019 and entitled "A distributed AI system", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence (AI), and in particular to a distributed AI system.
Background
Machine learning systems are the most important branch of AI systems. Distributed machine learning (DML) systems are currently the systems commonly used to handle large-scale AI application tasks. A traditional DML system is centralized: it uses a computing cluster to train a prediction model on massive amounts of user data. Such centralized systems usually require intensive computing resources, and because massive amounts of user data are uploaded to the cloud for storage, they can easily cause privacy and security problems.
To address this privacy problem, federated learning (FL) systems have recently proposed a mode of interaction between the local device and the cloud. Data is stored and processed locally to protect user privacy, while homomorphic encryption, model aggregation, and differential privacy make it difficult to infer user information from the models and related variables exchanged between the local device and the cloud.
In both of these AI systems for large-scale AI tasks, the connection structure between nodes is relatively stable, the number of worker nodes is limited, and the data samples stored on the worker nodes share the same feature space and label space. Terminal devices, however, often join and leave the network at any time; the features collected by heterogeneous devices belong to different feature spaces; and the AI tasks faced by individual terminal devices may even differ and belong to different label spaces. Neither of the two systems can readily complete AI application tasks under such dynamic access, disconnection, and device heterogeneity. Furthermore, both systems rely on a central node for global synchronization, which causes considerable communication overhead and greatly reduces flexibility and efficiency in solving AI tasks.
Summary
This application provides a distributed AI system for solving AI application tasks flexibly and efficiently.
In a first aspect, this application provides a distributed AI system. The distributed AI system is a decentralized distributed AI (DDAI) system and includes: a registration unit, configured to register a component when the component dynamically accesses the DDAI system and to deregister it when the component disconnects from the DDAI system; a task plan management unit, configured to plan and manage distributed AI tasks according to the characteristics of the accessed components; a task interaction unit, configured to exchange information between the accessed components; a task execution unit, configured to enable the accessed components to execute their allocated distributed AI subtasks so as to complete the distributed AI task; and a standardization unit, configured to make the DDAI system correspond to a unified space, where the unified space includes a unified feature space and a unified label space (also called the unified marking space). Each component may be an independent physical device or a cloud virtual node, and each component carries one or more of these units.
The DDAI system does not rely on a central node. It can solve AI tasks flexibly and efficiently by exploiting the heterogeneity and dynamics of multiple components together with their automatic collaboration and adaptation, while saving communication overhead.
In a possible design, the registration unit is further configured to discover components adjacent to the component to which the registration unit belongs. Adjacent components are integrated into a virtual node, and the components integrated into one virtual node are either all physical devices or all cloud virtual nodes. AI tasks can then be solved flexibly and efficiently through heterogeneous components.
In a possible design, the task plan management unit includes a distributed AI task planning unit, configured to initiate and receive AI task plans according to a specific rule or algorithm. The task plan management unit is specifically configured to cooperate with the task plan management units of the other components in the virtual node, reach agreement on the allocation of distributed AI subtasks to each component according to the characteristics of its own component, and record the subtask plan of its own component. The characteristics include peripheral characteristics, functional characteristics, and computing capability.
In a possible design, the task plan management units in any virtual node may dynamically elect one component of the virtual node as a gateway node according to the characteristics of their components. Communication between virtual nodes can then be carried out solely through the gateway component, and the task plan management unit of the gateway node initiates task negotiation or task assignment for all components. This is simple and flexible to implement and saves communication overhead.
In a possible design, all components in a virtual node are directly connected to the gateway node; or, when a virtual node is integrated from multiple components, those components are connected in a ring in which each component has one predecessor node and one successor node.
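The two connection structures in this design can be sketched in a few lines of Python. This is only an illustration: the helper names `build_star` and `build_ring` and the device names are hypothetical, and the ring order is assumed to follow the order of the input list.

```python
from typing import Dict, List, Tuple


def build_star(gateway: str, components: List[str]) -> Dict[str, str]:
    """Star structure: every non-gateway component links directly to the gateway."""
    return {c: gateway for c in components if c != gateway}


def build_ring(components: List[str]) -> Dict[str, Tuple[str, str]]:
    """Ring structure: map each component to its (predecessor, successor)."""
    n = len(components)
    return {
        c: (components[(i - 1) % n], components[(i + 1) % n])
        for i, c in enumerate(components)
    }


devices = ["phone", "watch", "tablet"]
star = build_star("phone", devices)
ring = build_ring(devices)
```

In the ring, each component only needs to know its two neighbors, whereas the star concentrates all links on the gateway node.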
In a possible design, the task plan management unit further includes a status recording unit, configured to record the task status of the other components in the virtual node and to update it in real time or periodically. The task plan management unit is further configured to initiate a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by the other components in the virtual node. In this way, AI tasks can be completed cooperatively.
In a possible design, the distributed AI post-processing tasks include an exception handling task, a decision processing task, and an action coding task. The exception handling task handles abnormal states; the decision processing task combines all feedback and makes a decision according to a preset rule or algorithm; the action coding task converts the decision result into an action code and transmits it to a designated device for execution.
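The three post-processing tasks can be pictured as a small pipeline. The Python sketch below is an assumption-laden illustration: the feedback format, the mean-score threshold rule, and the `ACTION_CODES` table are all hypothetical, since the source only speaks of a "preset rule or algorithm" and an "action code".

```python
from typing import Dict, List, Tuple

# Hypothetical action encoding; the source does not define concrete codes.
ACTION_CODES: Dict[str, int] = {"idle": 0, "alert": 1}


def handle_exceptions(feedback: List[dict]) -> Tuple[List[dict], List[str]]:
    """Exception handling: separate valid feedback from abnormal components."""
    valid = [f for f in feedback if f["status"] == "ok"]
    abnormal = [f["component"] for f in feedback if f["status"] != "ok"]
    return valid, abnormal


def decide(valid: List[dict], threshold: float = 0.5) -> str:
    """Decision processing: assumed rule, alert if the mean score exceeds the threshold."""
    if not valid:
        return "idle"
    mean_score = sum(f["score"] for f in valid) / len(valid)
    return "alert" if mean_score > threshold else "idle"


def encode_action(decision: str) -> int:
    """Action coding: turn the decision into a code for the designated device."""
    return ACTION_CODES[decision]


feedback = [
    {"component": "watch", "status": "ok", "score": 0.9},
    {"component": "phone", "status": "ok", "score": 0.7},
    {"component": "tablet", "status": "timeout", "score": 0.0},
]
valid, abnormal = handle_exceptions(feedback)
action = encode_action(decide(valid))
```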
In a possible design, the task interaction unit is further configured to compress information before sending it to other components, and to decompress information received from other components.
In a possible design, the task execution unit is further configured to execute subtasks according to the subtask plan specified by the task plan management unit and, on completion, to feed the execution result back to the task plan management unit.
In a possible design, if the distributed AI task planning unit of a component triggers a data collection task under some condition, it adds the data collection and feature extraction tasks of its own component to the subtask queue and notifies the other components of its virtual node to plan the corresponding data collection and feature extraction subtasks.
In a possible design, the standardization unit is specifically configured to map the feature data extracted by its component to a subspace of the unified feature space of the distributed AI system.
In a possible design, the standardization unit is further configured to map the task-related labels extracted by its component to the unified label space of the distributed AI system.
In a possible design, multiple virtual nodes correspond to a unified feature space. Specifically, each virtual node uses the standardization unit of each of its components to transform that component's features, so that the transformed feature space of each component is a subspace of the unified feature space. Components therefore need not share a feature space; the transformation alone ensures that every virtual node belongs to the same feature space. Similarly, if the components belong to different label spaces, each component uses its standardization unit to transform the labels of its data into the unified label space. Heterogeneous devices can thus train and update AI models on the basis of a unified feature space and a unified label space.
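One simple way to picture the standardization unit is as a block embedding: each component type owns a slice of the unified feature vector. The Python sketch below assumes this particular form of mapping, which the source does not prescribe. The dimension, the slice table `SUBSPACES`, the label table `LABEL_MAP`, and the device types are all hypothetical.

```python
from typing import Dict, List, Tuple

UNIFIED_DIM = 5  # assumed size of the unified feature space

# Assumed slice (start, end) of the unified space owned by each component type.
SUBSPACES: Dict[str, Tuple[int, int]] = {"watch": (0, 2), "phone": (2, 5)}

# Assumed per-component label vocabularies mapped to shared label ids.
LABEL_MAP: Dict[str, Dict[str, int]] = {
    "watch": {"resting": 0, "running": 1},
    "phone": {"still": 0, "moving": 1},
}


def map_features(component_type: str, features: List[float]) -> List[float]:
    """Embed local features into the component's subspace, zeros elsewhere."""
    start, end = SUBSPACES[component_type]
    if len(features) != end - start:
        raise ValueError("feature length does not match the subspace size")
    unified = [0.0] * UNIFIED_DIM
    unified[start:end] = features
    return unified


def map_label(component_type: str, label: str) -> int:
    """Translate a component-specific label into the unified label space."""
    return LABEL_MAP[component_type][label]


watch_vec = map_features("watch", [72.0, 0.3])
phone_vec = map_features("phone", [1.0, 2.0, 3.0])
```

After the mapping, vectors from different device types have the same length and can be fed to one shared model, which is the property the design relies on.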
In a possible design, if the distributed AI task planning unit of the gateway node plans a model training and update task according to some algorithm, then before the plan is executed it passes the task plan to the other components of its virtual node, and the distributed AI task planning units of all components negotiate and confirm their respective model training and update subtasks according to the characteristics of their components.
In a possible design, the gateway node initiates a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by the other components in its virtual node.
With these designs, the components can cooperate to solve AI tasks, and AI tasks can be allocated adaptively according to the characteristics of the components.
In a possible design, the DDAI system further includes a security verification unit, configured to verify the identity of accessed components, so as to ensure the integrity of distributed tasks and protect the privacy of exchanged data.
In a possible design, before sending information to other components, the task interaction unit compresses the information and the security verification unit signs and encrypts it; after the task interaction unit receives information from other components, the information is decompressed and the security verification unit decrypts it and verifies the signature.
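The send and receive paths of this design can be sketched with the Python standard library. Several assumptions apply: an HMAC-SHA256 tag over a shared key stands in for the signature, JSON is used as the message format, and encryption is omitted because the standard library offers no cipher. The function names `pack` and `unpack` are hypothetical. The sketch also verifies the tag before decompressing, which is an ordering choice of the sketch rather than something the source states.

```python
import hashlib
import hmac
import json
import zlib

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def pack(message: dict, key: bytes) -> bytes:
    """Sender side: serialize, compress, then prepend an integrity tag."""
    compressed = zlib.compress(json.dumps(message, sort_keys=True).encode("utf-8"))
    tag = hmac.new(key, compressed, hashlib.sha256).digest()
    return tag + compressed


def unpack(packet: bytes, key: bytes) -> dict:
    """Receiver side: check the tag, then decompress and deserialize."""
    tag, compressed = packet[:TAG_LEN], packet[TAG_LEN:]
    expected = hmac.new(key, compressed, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return json.loads(zlib.decompress(compressed).decode("utf-8"))


key = b"shared-secret"  # hypothetical key shared by the components
packet = pack({"task": "train", "round": 3}, key)
restored = unpack(packet, key)
```

A real deployment would add encryption and per-component keys or asymmetric signatures; the sketch only shows where compression and verification sit relative to each other.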
In a possible design, the DDAI system further includes a model version management unit, configured to maintain historical versions of the model so that models can automatically expire and be rolled back.
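A model version management unit of this kind could be sketched as a small history list with expiry and rollback. The Python class below is a hypothetical illustration: the class name, the age-based expiry policy, and the explicit `now` timestamps (passed in to keep the example deterministic) are assumptions, not details from the source.

```python
from typing import Any, List, Tuple


class ModelVersionManager:
    """Keeps (timestamp, model) pairs, oldest first."""

    def __init__(self, max_age: float) -> None:
        self.max_age = max_age
        self.history: List[Tuple[float, Any]] = []

    def publish(self, model: Any, now: float) -> None:
        """Record a new model version."""
        self.history.append((now, model))

    def expire(self, now: float) -> None:
        """Drop versions older than max_age, always keeping the newest one."""
        older = [(t, m) for t, m in self.history[:-1] if now - t <= self.max_age]
        self.history = older + self.history[-1:]

    def current(self) -> Any:
        """Return the newest model version."""
        if not self.history:
            raise RuntimeError("no model has been published")
        return self.history[-1][1]

    def rollback(self) -> Any:
        """Discard the newest version and return the previous one."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.current()


manager = ModelVersionManager(max_age=10.0)
manager.publish("v1", now=0.0)
manager.publish("v2", now=5.0)
manager.publish("v3", now=12.0)
manager.expire(now=12.0)  # v1 is older than max_age and is dropped
```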
Description of the drawings
FIG. 1 is a schematic diagram of the architecture of a distributed AI system provided by this application;
FIG. 2 is a schematic diagram of a distributed AI system provided by this application;
FIG. 3 is a schematic diagram of a virtual node including a gateway node provided by this application;
FIG. 4 is a schematic diagram of the connection structure of a virtual node provided by this application;
FIG. 5 is a schematic diagram of a feature transformation framework provided by this application;
FIG. 6 is a schematic structural diagram of a component provided by this application;
FIG. 7 is a schematic structural diagram of a machine learning engine provided by this application;
FIG. 8 is a schematic flowchart of joint optimization provided by this application;
FIG. 9 is a schematic flowchart of a global model update provided by this application;
FIG. 10 is a schematic flowchart of personalized model training/update provided by this application;
FIG. 11 is a schematic flowchart of personalized model adaptation provided by this application.
Detailed description
This application is described in further detail below with reference to the accompanying drawings.
The embodiments of this application provide a distributed AI system for solving AI application tasks flexibly and efficiently.
In the description of this application, "at least one" means one or more.
An AI task typically includes data collection, feature extraction, model training and update, and model execution. Data collection records the raw data and stores it. Feature extraction turns the stored data into feature vectors of real numbers. Model training and update takes the resulting feature vectors as input and, following a specific algorithm, outputs a trained or updated model. Model execution applies the model to newly generated feature vectors to make predictions or decisions. Different types of devices collect data through different channels and extract features in different ways, so their feature vectors belong to different feature spaces. Devices also differ in computing capability, so the model complexity they can support may differ as well.
An existing distributed AI system has a central node and distributed worker nodes; the central node combines the computation results of the worker nodes and then sends the result back to each worker node for updating. The worker nodes are usually identical, the connection structure is relatively stable, and the sample data of all nodes must share the same feature space. Such a system also depends on the central node for global synchronization, which causes considerable communication overhead: the more distributed nodes there are, the larger the overhead, which leads to a computing bottleneck. In addition, because it is based on statistics over a large number of users, it does not take user personalization into account. As a result, existing distributed AI systems are inflexible and inefficient in solving AI tasks. This application therefore proposes a distributed AI system in which a large number of heterogeneous components jointly complete a given distributed AI task, components can dynamically access or disconnect from the system, and distributed AI tasks are planned dynamically according to the characteristics of each component, so that AI tasks can be solved cooperatively, flexibly, and efficiently.
To describe the technical solutions of the embodiments more clearly, the distributed AI system provided by the embodiments of this application is described in detail below with reference to the accompanying drawings.
An embodiment of this application provides a distributed AI system, which may be a decentralized distributed AI (DDAI) system; FIG. 1 shows a schematic diagram of its architecture. The DDAI system may include a registration unit, a task plan management unit, a task interaction unit, a task execution unit, and a standardization unit. Specifically:
The registration unit is configured to register a component when the component dynamically accesses the DDAI system and to deregister it when the component disconnects from the system. The task plan management unit is configured to plan and manage distributed AI tasks according to the characteristics of the accessed components. The task interaction unit is configured to exchange information between the accessed components. The task execution unit is configured to enable the accessed components to execute their allocated distributed AI subtasks so as to complete the distributed AI task. The standardization unit is configured to make the DDAI system correspond to a unified space, which may include a unified feature space and a unified label space. Each component may be an independent physical device or a cloud virtual node, and each component carries one or more of these units. With this design, the DDAI system does not rely on a central node; it can solve AI tasks flexibly and efficiently by exploiting the heterogeneity and dynamics of multiple components together with their automatic collaboration and adaptation, while saving communication overhead.
In an optional implementation, multiple components make up the DDAI system, and a component that carries the registration unit can discover adjacent components. Adjacent components can be integrated into a virtual node, and the components integrated into one virtual node are either all physical devices or all cloud virtual nodes. The physical devices mentioned above may be, but are not limited to, terminal devices such as smartphones, smart watches, personal computers (PCs), and tablets.
For example, multiple components form multiple virtual nodes, as indicated by 201 in FIG. 2. The virtual nodes can be dynamically connected to form a decentralized virtual cloud, as indicated by 202 in FIG. 2, which is the DDAI system. Because one or more components of the DDAI system can connect or disconnect at any time as needed, the components of a given virtual node may differ from one moment to the next. In other words, virtual nodes change in real time, and the components connected at a given moment are integrated on demand into the virtual nodes of that moment.
For example, at the current moment physical device 1 and physical device 2 are integrated into virtual node 1; if physical device 1 then disconnects and physical device 3 connects, physical device 2 and physical device 3 may be integrated into a new virtual node 2. As another example, at the current moment physical devices 4 and 5 are integrated into virtual node 3 and physical devices 6 and 7 into virtual node 4; at the next moment, depending on task requirements, physical devices 4, 5, and 6 may be integrated into a new virtual node 5 and physical device 7 into a new virtual node 6. Virtual nodes can of course be integrated in many other ways, which are not listed one by one here.
In an embodiment, the task plan management unit of a component may include a distributed AI task planning unit, configured to initiate and receive AI task plans according to a specific rule or algorithm. The task plan management unit is specifically configured to cooperate with the task plan management units of the other components in the virtual node, reach agreement on the allocation of distributed AI subtasks to each component according to the characteristics of its own component, and record the subtask plan of its own component. The characteristics include peripheral characteristics, functional characteristics, and computing capability.
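One way components could reach the same allocation without a coordinator is for each to run the same deterministic rule over the same shared description of tasks and components. The Python sketch below assumes such a rule (pick the eligible component with the most computing capability, breaking ties by name); the source does not specify the negotiation protocol. The field names `needs`, `min_compute`, `peripherals`, and `compute` are hypothetical.

```python
from typing import Dict, List, Optional


def allocate(tasks: List[dict], components: List[dict]) -> Dict[str, Optional[str]]:
    """Assign each subtask to the most capable component that has the needed peripheral."""
    plan: Dict[str, Optional[str]] = {}
    for task in sorted(tasks, key=lambda t: t["name"]):
        eligible = [
            c
            for c in components
            if task["needs"] in c["peripherals"] and c["compute"] >= task["min_compute"]
        ]
        # Deterministic choice: highest compute first, then name as a tie-breaker.
        best = max(eligible, key=lambda c: (c["compute"], c["name"]), default=None)
        plan[task["name"]] = best["name"] if best else None
    return plan


components = [
    {"name": "watch", "peripherals": {"heart_rate"}, "compute": 1},
    {"name": "phone", "peripherals": {"gps", "microphone"}, "compute": 5},
    {"name": "tablet", "peripherals": {"microphone"}, "compute": 8},
]
tasks = [
    {"name": "collect_heart_rate", "needs": "heart_rate", "min_compute": 1},
    {"name": "collect_location", "needs": "gps", "min_compute": 1},
    {"name": "train_speech_model", "needs": "microphone", "min_compute": 4},
]
plan = allocate(tasks, components)
```

Because the rule is deterministic, every component that evaluates it on the same inputs records the same plan, which is one simple reading of "reaching agreement".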
In an optional implementation, the task plan management units in any virtual node may dynamically elect one component of the virtual node as a gateway node according to the characteristics of their components. Communication between virtual nodes can then be carried out solely through the component acting as the gateway node, and the task plan management unit of the gateway node initiates task negotiation or task assignment for all components. This is simple and flexible to implement and saves communication overhead.
Specifically, when a component joins the DDAI system through the registration unit, the task plan management unit of the current gateway node initiates a new negotiation to elect the gateway node. If the new component is elected as the gateway node, the current gateway node hands over a copy of the task plan and the gateway responsibilities to it. If a component disconnects from the DDAI system, the registration unit of the gateway node discovers this automatically, deregisters the component, and initiates a post-processing task to handle the abnormal situation in which the component has left and its subtasks have stopped. If the registration units of the other components in the virtual node jointly detect that the current gateway node has left the network, their task plan management units initiate a new gateway election. With this design, components can dynamically connect to or disconnect from the DDAI system without affecting the system as a whole.
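The join, leave, and re-election behavior described above can be sketched as a small Python class. The sketch collapses the negotiation into a single deterministic rule (highest performance score wins, ties broken by name), which is an assumption. The class name `VirtualNode` and the numeric scores are hypothetical, and the handover of the task plan and the post-processing task are not modeled.

```python
from typing import Dict, Optional


class VirtualNode:
    """Tracks registered components and the currently elected gateway."""

    def __init__(self) -> None:
        self.components: Dict[str, float] = {}  # name -> performance score
        self.gateway: Optional[str] = None

    def _elect(self) -> None:
        """Assumed election rule: best score wins, name breaks ties."""
        self.gateway = max(
            self.components,
            key=lambda name: (self.components[name], name),
            default=None,
        )

    def register(self, name: str, score: float) -> None:
        """A component joins: record it and run a new election."""
        self.components[name] = score
        self._elect()

    def deregister(self, name: str) -> None:
        """A component leaves: remove it and re-elect (covers a departing gateway)."""
        self.components.pop(name, None)
        self._elect()


node = VirtualNode()
node.register("watch", 1.0)
node.register("phone", 5.0)
node.register("tablet", 3.0)
gateway_before = node.gateway
node.deregister("phone")  # the gateway leaves, so another component takes over
gateway_after = node.gateway
```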
For example, any virtual node and its gateway node may be as shown in (a) or (b) of FIG. 3. The gateway node of each virtual node is determined by election among all components in that virtual node. For example, the component with the best performance state among all components making up the virtual node at a given moment may be selected as its gateway node.
In one embodiment, the task plan management unit of a component further includes a status recording unit, configured to record the task status of the other components in the virtual node and to update it in real time or periodically. The task plan management unit is further configured to initiate a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by the other components in the virtual node.
For example, when any component in a virtual node meets a certain condition that triggers its distributed AI task planning unit to plan a data collection task, that unit adds the component's own data collection and feature extraction tasks to the subtask queue and notifies the other components of the virtual node to plan the corresponding data collection and feature extraction subtasks.
Specifically, when the distributed AI task planning unit of a gateway node plans a model training and update task according to some algorithm, it passes the task plan to the other components of its virtual node before the plan is executed, and the task planning units of all components negotiate and confirm their respective model training and update subtasks according to their own characteristics.
In one implementation, the gateway node initiates a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by the other components in the virtual node. Distributed AI post-processing tasks include exception handling tasks, decision processing tasks, and action coding tasks: the exception handling task handles abnormal states; the decision processing task combines all feedback and makes a decision according to a preset rule or algorithm; the action coding task converts the decision result into an action code and transmits it to a designated device for execution.
In an optional implementation, the task execution unit is further configured to execute a subtask according to the subtask plan specified by the task plan management unit and, on completion, to feed back the execution result to the task plan management unit.
In an optional implementation, the standardization unit is specifically configured to map the feature data extracted by the component to which it belongs to a subspace of the unified feature space of the DDAI system. The standardization unit is further configured to map the task-related labels extracted by that component to the unified label space of the DDAI system.
In a specific implementation, the virtual nodes share the unified feature space of the DDAI system. Specifically, each virtual node applies a feature transformation to the features of each of its components through that component's standardization unit, so that each component's transformed feature space is a subspace of the unified feature space. In other words, although the feature spaces of individual components are not necessarily the same in practice, this transformation places all virtual nodes in one unified feature space, which allows artificial intelligence tasks to be carried out with higher efficiency and better privacy.
In an optional implementation, the AI system further includes a security verification unit, configured to verify the identity of connected components, so as to ensure the integrity of distributed tasks and protect the privacy of exchanged data.
Specifically, before sending information to another component, the task interaction unit compresses the information and the security verification unit signs and encrypts it; after receiving information from another component, the task interaction unit decompresses the information and the security verification unit decrypts it and verifies the signature.
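The send and receive paths described here can be sketched as follows. This is a minimal sketch under stated assumptions, not the protocol of this application: it assumes a pre-shared key, uses `zlib` for compression and HMAC-SHA256 as a stand-in for the signature, and leaves out encryption because the Python standard library provides no cipher. The names `pack_message` and `unpack_message` are hypothetical. On receipt, the sketch checks the tag before decompressing, since the tag was computed over the compressed bytes.

```python
import hashlib
import hmac
import zlib

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def pack_message(payload: bytes, key: bytes) -> bytes:
    """Compress the payload, then append an HMAC tag over the compressed bytes."""
    compressed = zlib.compress(payload)
    tag = hmac.new(key, compressed, hashlib.sha256).digest()
    return compressed + tag


def unpack_message(message: bytes, key: bytes) -> bytes:
    """Verify the HMAC tag, then decompress; raise ValueError on a bad tag."""
    if len(message) < TAG_LEN:
        raise ValueError("message too short")
    compressed, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, compressed, hashlib.sha256).digest()
    # compare_digest avoids leaking the position of the first mismatch.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature check failed")
    return zlib.decompress(compressed)
```

A real deployment would add authenticated encryption and per-component keys or certificates; those choices are outside what the text specifies.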
In an optional implementation, the DDAI system further includes a model version management unit, configured to maintain historical versions of a model so that models can automatically expire and be rolled back.
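One way to picture a model version management unit is a small version history with an expiry rule and a rollback operation. The sketch below is illustrative only: the class name `ModelVersionManager`, the age-based expiry policy, and the `now` parameter (used to make the behaviour easy to check) are assumptions, not details given in this application.

```python
import time
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ModelVersion:
    model: Any
    created_at: float


@dataclass
class ModelVersionManager:
    max_age_seconds: float  # hypothetical expiry policy: drop versions older than this
    history: List[ModelVersion] = field(default_factory=list)

    def publish(self, model: Any, now: Optional[float] = None) -> None:
        """Append a new version to the history."""
        stamp = time.time() if now is None else now
        self.history.append(ModelVersion(model, stamp))

    def expire(self, now: Optional[float] = None) -> None:
        """Remove every version older than max_age_seconds."""
        stamp = time.time() if now is None else now
        self.history = [
            v for v in self.history if stamp - v.created_at <= self.max_age_seconds
        ]

    def current(self, now: Optional[float] = None) -> Optional[Any]:
        """Return the newest unexpired model, or None if all have expired."""
        self.expire(now)
        return self.history[-1].model if self.history else None

    def rollback(self) -> Any:
        """Discard the newest version and return the one before it."""
        if len(self.history) < 2:
            raise ValueError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1].model
```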
With the above design, components can automatically cooperate to complete feature mapping, machine learning training, task decisions, and so on. Scheduling also adapts to the capabilities and constraints of each component; such adaptive scheduling may take the form of, but is not limited to, algorithm scheduling.
In a possible implementation, when a virtual node is made up of multiple components, those components are connected in a ring, and each component has one predecessor node and one successor node, as shown for example in FIG. 4. Specifically, each component records its predecessor and successor and follows an agreed rule for priorities such as ordering. This ring connection supports functions such as multi-component negotiation, collaborative training, and distributed decision-making.
For example, when electing the gateway node, each component sends its own state information to its successor and, following the priority protocol, forwards the state information of the best component it has seen to its successor. The components eventually reach agreement, and the component with the best state is selected as the gateway node of the virtual node.
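The ring election just described can be simulated in a few lines. This is a minimal sketch, assuming that a component's state reduces to a single numeric score where higher is better, that ties are broken by component identifier, and that all components send in synchronous rounds; the function name `elect_gateway` is hypothetical.

```python
from typing import Dict, List, Tuple

State = Tuple[float, str]  # (performance score, component id); larger tuple wins


def elect_gateway(ring: List[str], scores: Dict[str, float]) -> str:
    """Simulate a ring election in which ring[i] sends to ring[(i + 1) % n]."""
    n = len(ring)
    if n == 0:
        raise ValueError("ring must not be empty")
    # Each component starts out knowing only its own state.
    best: List[State] = [(scores[c], c) for c in ring]
    # After n - 1 rounds every state has travelled all the way round the ring.
    for _ in range(n - 1):
        sent = list(best)  # snapshot: all messages in a round are sent at once
        for i in range(n):
            incoming = sent[(i - 1) % n]  # message from the predecessor
            best[i] = max(best[i], incoming)
    winners = {component for _, component in best}
    assert len(winners) == 1  # every component has reached the same view
    return winners.pop()
```

With `n` components the simulation needs `n - 1` rounds, because the best state must travel at most `n - 1` hops to reach every component.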
Specifically, during collaborative training the gateway node can split the objective function, model, and mappings into the parts belonging to each component and pass them to the corresponding components one by one for update and training. For example, the gateway node first passes the task parts of all components to its successor; the successor takes its own task part and forwards the remaining parts to its own successor; the following components repeat this process in turn until every component has received its task part.
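The forwarding scheme above, in which each component takes its own part and passes the rest on, can be sketched as a single pass around the ring. The sketch assumes that `ring[0]` is the gateway, that the gateway keeps its own part before forwarding, and that task parts are keyed by component identifier; `distribute_tasks` is a hypothetical name.

```python
from typing import Dict, List


def distribute_tasks(ring: List[str], parts: Dict[str, str]) -> Dict[str, str]:
    """Pass a bundle of task parts around the ring, starting at the gateway ring[0].

    Returns a mapping from each component to the task part it picked up.
    """
    if not ring:
        raise ValueError("ring must not be empty")
    received: Dict[str, str] = {}
    bundle = dict(parts)  # the parts still travelling round the ring
    gateway = ring[0]
    if gateway in bundle:
        received[gateway] = bundle.pop(gateway)
    for component in ring[1:]:
        if not bundle:
            break  # nothing left to forward
        if component in bundle:
            # Take this component's own part; the remainder is forwarded.
            received[component] = bundle.pop(component)
    return received
```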
In an optional implementation of the unified feature space mapping, each virtual node is responsible for collecting multi-modal features $X = [X^{(1)}, \ldots, X^{(m)}]$, where $m$ is the number of different component types (that is, $m$ modalities) and $X^{(m)}$ is the sample collected on components of the $m$-th type; if no corresponding device is connected, the entry is padded with zeros. The virtual node then applies the feature mapping given in formula image PCTCN2020100833-appb-000001 (where the expression in formula image PCTCN2020100833-appb-000002 denotes the feature mapping function of the $m$-th component type), so that the features of different virtual nodes belong to one unified feature space $\chi$ after transformation; that is, $\Phi_i(X_i) \in \chi$ for every virtual node in the DDAI system. An example feature transformation framework is shown in FIG. 5.
For example, the feature mapping may be any function that maps the original feature space to the new feature space, such as a linear function, a multilayer perceptron, a deep neural network, or a decision tree.
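As an illustration of mapping multi-modal features into one shared space, the sketch below uses a linear mapping per modality and fills a missing modality with zeros. It assumes that the unified space is the concatenation of the per-modality output blocks and that modalities are processed in sorted name order; both are assumptions made for the example, as are the names `linear_map` and `to_unified_space`.

```python
from typing import Dict, List, Optional, Sequence

Matrix = List[List[float]]


def linear_map(weights: Matrix, x: Sequence[float]) -> List[float]:
    """Apply a linear feature mapping: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def to_unified_space(
    samples: Dict[str, Optional[Sequence[float]]],
    mappings: Dict[str, Matrix],
    input_dims: Dict[str, int],
) -> List[float]:
    """Map each modality into its block of the unified feature vector.

    A modality with no connected device (missing or None) is zero-filled
    before mapping, so every virtual node produces a vector of the same length.
    """
    unified: List[float] = []
    for modality in sorted(mappings):  # fixed order so every node agrees
        x = samples.get(modality)
        if x is None:
            x = [0.0] * input_dims[modality]
        unified.extend(linear_map(mappings[modality], x))
    return unified
```

In this arrangement the raw sample for a modality only needs to be visible to the code that applies that modality's mapping, which matches the idea that raw data stays on the component that collected it.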
In implementation, the data corresponding to $X^{(k)}$ and the corresponding feature transformation function (formula image PCTCN2020100833-appb-000003) are both stored on the component of the $k$-th type in the virtual node. Note that $X^{(k)}$ is private user data and is not shared.
In one implementation of distributed decision-making, each component passes the score of its own model on its current data to the gateway node for aggregation; the gateway node makes a task decision based on the aggregated score and then passes to each component the task it needs to execute. The passing of scores from the components and the issuing of tasks by the gateway node are similar to the information passing among components described above; the descriptions may be read together and are not repeated here.
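The gateway's role in this decision step can be sketched as follows. The aggregation rule (a weighted mean), the threshold, and the task names `"execute"` and `"stand_by"` are all hypothetical choices made for illustration; the text leaves the preset rule or algorithm open.

```python
from typing import Dict, Optional


def gateway_decide(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Combine component scores and return the task issued to each component."""
    if not scores:
        raise ValueError("no scores received")
    weights = weights or {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    combined = sum(scores[name] * weights[name] for name in scores) / total
    # Hypothetical rule: act only if the combined score clears the threshold.
    task = "execute" if combined >= threshold else "stand_by"
    return {name: task for name in scores}
```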
In an optional implementation, when information is passed among the components inside a virtual node, a component signature (that is, the component's identifier) may be added to the information and integrity verification may be added at the gateway node, to ensure that the information is not modified, corrupted, or lost in transit.
In an optional implementation, the structure of a component may be as shown in FIG. 6 and may specifically include:
A registration unit, configured to let a component join the DDAI system at any time, to detect promptly when a component leaves the DDAI system, and to report the updated connections within the virtual node to the task plan management unit.
A task plan management unit, configured to initiate and receive AI task plans according to a specific rule or algorithm. The task plan management unit is specifically configured to coordinate with the task plan management units of the other components in the virtual node, reach agreement on the allocation of distributed AI subtasks to each component according to the characteristics of the component, and record the component's subtask plan. The characteristics may include peripheral characteristics, functional characteristics, and computing capability. The task plan management unit also records and refreshes the execution status of tasks.
A data collection unit, configured to perform the component's data collection tasks and record the raw sampled data in a database.
A feature extraction unit, configured to extract features from the stored data. The features may include, but are not limited to, user or device profiles, behavioral features, and state features. The expression in formula image PCTCN2020100833-appb-000004 may be used to denote the samples of the p-th component type in the i-th virtual node.
A feature mapping unit, configured to map the features extracted by the feature extraction unit to a set feature space. Specifically, features may be mapped to a subspace of the unified feature space according to component type or function. The expression in formula image PCTCN2020100833-appb-000005 may be used to denote the feature mapping function of the p-th component type in the i-th virtual node.
A label mapping unit, configured to map the stored task labels to a set task space. The set task space may be the unified task space of the system, with each component mapped to a subspace of it. $Y_i$ may be used to denote the labels corresponding to the samples (formula image PCTCN2020100833-appb-000006) in the i-th virtual node.
A machine learning engine, configured to train AI models, update the feature mapping model, adaptively integrate local models, and update the global model. Specifically, based on the features (formula image PCTCN2020100833-appb-000007), the feature mapping (formula image PCTCN2020100833-appb-000008), the labels $Y_i$, and the label mapping, it trains the personalized AI model, updates the feature mapping model, performs adaptive model integration, updates the global model, and executes the model. The machine learning engine can also handle exceptions.
A model cache unit, configured to cache multiple models so that the machine learning engine can adaptively integrate local models.
A security unit, configured to perform security verification.
An interaction unit, configured to send and receive data and to compress and decompress it; specifically, it implements communication with other components.
The data collection unit and the machine learning engine may both belong to the task execution unit described for the DDAI system. The machine learning engine may also include the post-processing task execution function described for the DDAI system.
In an optional implementation, the functions of the machine learning engine in a component may be implemented as shown in FIG. 7 and may specifically include:
A global model update unit, configured to receive models, gradients, and the like from other virtual nodes and perform a joint-averaging update of the global model.
A local personalized model training unit, configured to update the personalized model using local data.
A personalized model adaptation unit, configured to adaptively select models from the model cache unit for integration according to optimization metrics and computing resources.
A feature mapping update unit, configured to update the feature mapping function.
A model composition unit, configured to combine the models obtained by the global model update unit, the local personalized model training unit, and the personalized model adaptation unit.
An exception handling unit, configured to handle abnormal situations or abnormal data.
A model execution platform, configured to execute the model obtained by the model composition unit.
In an optional implementation, joint optimization in the machine learning engine across the global model update unit, the local personalized model training unit, the personalized model adaptation unit, and the feature mapping update unit may use an alternating optimization strategy: one variable is optimized while the parameters of the other variables are held fixed. The process may be as shown in FIG. 8 and may include:
A1. Given the local personalized model, the integrated personalized model, and the feature mapping, update the global model;
A2. Given the global model, the integrated personalized model, and the feature mapping, update the local personalized model;
A3. Given the global model, the local personalized model, and the feature mapping, adapt the personalized model;
A4. Given the local personalized model, the integrated personalized model, and the personalized model adaptation, update the feature mapping function; repeat until the maximum number of iterations is reached or the procedure converges.
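Steps A1 to A4 follow the general pattern of alternating (block-coordinate) optimization. The sketch below shows only that pattern, not the models of this application: each named block stands in for one of the four variables, each update function is assumed to return the new value of its block with the others held fixed, and convergence is assumed to mean a small change in a scalar loss. The name `alternating_optimize` and its parameters are hypothetical.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]


def alternating_optimize(
    params: Params,
    updates: Dict[str, Callable[[Params], float]],
    order: List[str],
    loss: Callable[[Params], float],
    max_iters: int = 100,
    tol: float = 1e-9,
) -> Params:
    """Cycle through the blocks in `order`, updating one while the rest stay fixed."""
    params = dict(params)  # work on a copy so the caller's dict is untouched
    previous = loss(params)
    for _ in range(max_iters):
        for name in order:
            # Optimize one block given the current values of all the others.
            params[name] = updates[name](params)
        current = loss(params)
        if abs(previous - current) < tol:
            break  # converged: the loss no longer changes noticeably
        previous = current
    return params
```

For the four steps above, `order` would list four blocks, for example `["global_model", "local_model", "adaptation", "feature_map"]`, with scalar values replaced by whatever model representation is in use.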
In an exemplary implementation, the global model update unit in the machine learning engine may update the global model as follows: based on local data, the feature mapping, the labels, and the label mapping, compute and update the gradient of the loss function defined on the unified labels and features, together with the local model; apply security processing such as differential privacy and signing to the gradient and the local model and send them to neighboring components; receive the gradients and models sent by neighboring components and aggregate them with the local gradient and model in the security unit; and update the global model using the aggregated model and gradient. The security verification may be integrity verification. The process may be as shown in the flow of FIG. 9.
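The gradient path in this update can be sketched with plain lists. This is a simplified illustration under stated assumptions: models and gradients are flat vectors, the privacy step is gradient clipping plus Gaussian noise with no privacy accounting, aggregation is an unweighted mean, and signing and transport are omitted. The names `privatize`, `aggregate`, and `update_global` are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def privatize(
    gradient: Sequence[float], clip: float, sigma: float, rng: random.Random
) -> Vector:
    """Clip the gradient to L2 norm `clip`, then add Gaussian noise of scale `sigma`."""
    norm = sum(g * g for g in gradient) ** 0.5
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, sigma) for g in gradient]


def aggregate(gradients: Sequence[Sequence[float]]) -> Vector:
    """Average the local and neighbor gradients element-wise."""
    count = len(gradients)
    return [sum(column) / count for column in zip(*gradients)]


def update_global(
    model: Sequence[float],
    gradients: Sequence[Sequence[float]],
    learning_rate: float,
) -> Vector:
    """Take one gradient-descent step on the global model with the mean gradient."""
    mean_gradient = aggregate(gradients)
    return [w - learning_rate * g for w, g in zip(model, mean_gradient)]
```

Each component would call `privatize` on its own gradient before sending it, then pass its own and the received gradients to `update_global`.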
In an exemplary implementation, the local personalized model training unit in the machine learning engine may train or update the personalized model as follows: based on local data, the feature mapping, and the labels, compute and update the local model for the loss function defined on the local labels, the unified features, and the local features; apply security processing such as differential privacy and signing; and send the securely processed model to adjacent components and store it in the model cache unit. The process may be as shown in FIG. 10.
In a possible implementation, the personalized model adaptation unit in the machine learning engine may adapt the personalized model as follows: sample a model integration strategy from a strategy model, subject to the resources of the components in the virtual node and the constraints of the local task; compute the loss obtained when models from the model cache unit are integrated according to the sampled strategy; update the sampling strategy model according to the loss fed back; and iterate alternately until a specified number of rounds or a convergence condition is reached. The process may be as shown in FIG. 11.
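The sample, evaluate, and update loop described here can be sketched with a very simple strategy model. Everything specific in the sketch is an assumption made for illustration: the strategy model is one inclusion probability per cached model, the resource constraint is a single cost budget, the update is a heuristic that reinforces subsets whose loss beats a running average, and `adapt_ensemble` and `ensemble_loss` are hypothetical names. It is not presented as the method of this application.

```python
import random
from typing import Callable, List, Optional, Sequence


def adapt_ensemble(
    costs: Sequence[float],
    budget: float,
    ensemble_loss: Callable[[List[int]], float],
    rounds: int = 200,
    step: float = 0.1,
    seed: int = 0,
) -> List[int]:
    """Search for a subset of cached models that fits the budget and has low loss."""
    rng = random.Random(seed)
    probs = [0.5] * len(costs)  # strategy model: inclusion probability per model
    best_subset: List[int] = []
    best_loss = float("inf")
    baseline: Optional[float] = None
    for _ in range(rounds):
        # Sample a strategy, skipping any model that would exceed the budget.
        subset: List[int] = []
        spent = 0.0
        for i, cost in enumerate(costs):
            if rng.random() < probs[i] and spent + cost <= budget:
                subset.append(i)
                spent += cost
        if not subset:
            continue
        loss = ensemble_loss(subset)  # loss after integrating the sampled models
        if loss < best_loss:
            best_loss, best_subset = loss, subset
        # Reinforce subsets at or below the running-average loss, discourage others.
        baseline = loss if baseline is None else 0.9 * baseline + 0.1 * loss
        direction = 1.0 if loss <= baseline else -1.0
        for i in range(len(costs)):
            target = 1.0 if i in subset else 0.0
            probs[i] += step * direction * (target - probs[i])
            probs[i] = min(0.95, max(0.05, probs[i]))  # keep some exploration
    return best_subset
```

The returned subset always respects the budget because over-budget models are never added during sampling.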
In an optional implementation, the feature mapping update performed by the feature mapping update unit in the machine learning engine is similar to the global model update performed by the global model update unit, with the feature mapping substituted as the optimization variable. The two descriptions may be read together, and details are not repeated here.
Although preferred embodiments of this application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of this application.
Obviously, those skilled in the art can make various changes and variations to the embodiments of this application without departing from their scope. If such modifications and variations fall within the scope of the claims of this application and their equivalents, this application is intended to cover them as well.

Claims (11)

  1. A distributed AI system, characterized in that the distributed AI system is a decentralized distributed artificial intelligence (DDAI) system, comprising:
    a registration unit, configured to register a component when it connects to the distributed AI system or deregister a component when it disconnects from the distributed AI system;
    a task plan management unit, configured to plan and manage distributed AI tasks according to the characteristics of the connected components;
    a task interaction unit, configured to exchange information between connected components;
    a task execution unit, configured for a connected component to execute its assigned distributed AI subtasks so that the distributed AI task is completed;
    a standardization unit, configured to make the distributed AI system correspond to a unified space, wherein the unified space includes a unified feature space and a unified label space;
    wherein each component is an independent physical device or a cloud virtual node; and
    each component hosts one or more of the units.
  2. The system according to claim 1, characterized in that the registration unit is further configured to:
    discover components adjacent to the component to which the registration unit belongs, wherein adjacent components are integrated into one virtual node, and the components integrated into one virtual node are either all physical devices or all cloud virtual nodes.
  3. The system according to claim 2, characterized in that the task plan management unit comprises:
    a distributed AI task planning unit, configured to initiate and receive AI task plans according to a specific rule or algorithm;
    and the task plan management unit is specifically configured to:
    coordinate with the task plan management units of the other components in the virtual node, reach agreement on the allocation of distributed AI subtasks to each component according to the characteristics of the component that includes the task plan management unit, and record the subtask plan of that component, wherein the characteristics include peripheral characteristics, functional characteristics, and computing capability.
  4. The system according to any one of claims 1 to 3, characterized in that the task interaction unit is further configured to:
    compress information to be sent before sending it to other components; and
    decompress information after receiving it from other components.
  5. The system according to any one of claims 1 to 4, characterized in that the task execution unit is further configured to:
    execute a subtask according to the subtask plan specified by the task plan management unit and, on completion, feed back the execution result to the task plan management unit.
  6. The system according to any one of claims 1 to 5, characterized in that the standardization unit is specifically configured to:
    map the feature data extracted by the component to which the standardization unit belongs to a subspace of the unified feature space corresponding to the distributed AI system.
  7. The system according to any one of claims 1 to 6, characterized in that the standardization unit is further configured to:
    map the task-related labels extracted by the component to which the standardization unit belongs to the unified label space corresponding to the distributed AI system.
  8. The system according to claim 3, characterized in that the task plan management unit further comprises a status recording unit, configured to:
    record the task status of the other components in the virtual node and update it in real time or periodically;
    and the task plan management unit is further configured to:
    initiate a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by the other components in the virtual node.
  9. The system according to claim 8, characterized in that the distributed AI post-processing tasks include exception handling tasks, decision processing tasks, and action coding tasks, wherein:
    the exception handling task handles abnormal states;
    the decision processing task combines all feedback and makes a decision according to a preset rule or algorithm; and
    the action coding task converts the decision result into an action code and transmits it to a designated device for execution.
  10. The system according to any one of claims 1 to 9, characterized by further comprising a security verification unit, configured to:
    verify the identity of connected components, so as to ensure the integrity of distributed tasks and protect the privacy of exchanged data.
  11. The system according to any one of claims 1 to 10, characterized by further comprising a model version management unit, configured to:
    maintain historical versions of a model so that models can automatically expire and be rolled back.
PCT/CN2020/100833 2019-07-10 2020-07-08 Distributed ai system WO2021004478A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910619531.1A CN112215326B (en) 2019-07-10 2019-07-10 Distributed AI system
CN201910619531.1 2019-07-10

Publications (1)

Publication Number Publication Date
WO2021004478A1 true WO2021004478A1 (en) 2021-01-14

Family

ID=74048053

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/100833 WO2021004478A1 (en) 2019-07-10 2020-07-08 Distributed ai system

Country Status (2)

Country Link
CN (1) CN112215326B (en)
WO (1) WO2021004478A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6981019B1 (en) * 2000-05-02 2005-12-27 International Business Machines Corporation System and method for a computer based cooperative work system
CN101316242A (en) * 2008-07-17 2008-12-03 上海交通大学 Service-oriented intelligent platform
CN109561100A (en) * 2018-12-24 2019-04-02 浙江天脉领域科技有限公司 Method and system based on the distributed duplexing energized network attacking and defending with artificial intelligence
CN109787788A (en) * 2017-11-10 2019-05-21 中国信息通信研究院 A method of network of the building based on artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102655532B (en) * 2012-04-18 2014-10-22 上海和辰信息技术有限公司 Distributed heterogeneous virtual resource integration management method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011494A (en) * 2021-03-18 2021-06-22 北京百度网讯科技有限公司 Feature processing method, device, equipment and storage medium
CN113011494B (en) * 2021-03-18 2024-02-27 北京百度网讯科技有限公司 Feature processing method, device, equipment and storage medium
CN113301141A (en) * 2021-05-20 2021-08-24 北京邮电大学 Deployment method and system of artificial intelligence support framework
CN113301141B (en) * 2021-05-20 2022-06-17 北京邮电大学 Deployment method and system of artificial intelligence support framework

Also Published As

Publication number Publication date
CN112215326B (en) 2024-03-29
CN112215326A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN111537945B (en) Intelligent ammeter fault diagnosis method and equipment based on federated learning
US20230039182A1 (en) Method, apparatus, computer device, storage medium, and program product for processing data
CN109491790A (en) Industrial Internet of Things edge computing resource allocation method and system based on container
WO2021004478A1 (en) Distributed ai system
CN107193669A (en) The system and design method of maintenance interface based on mixed cloud or large-scale cluster
Aouedi et al. Handling privacy-sensitive medical data with federated learning: challenges and future directions
CN110457337A (en) Link aggregation method, system and equipment
Chan et al. Fedhe: Heterogeneous models and communication-efficient federated learning
Lv et al. Cloud computing management platform of human resource based on mobile communication technology
Li et al. Research on QoS service composition based on coevolutionary genetic algorithm
CN116862012A (en) Machine learning model training method, business data processing method, device and system
CN114610475A (en) Training method of intelligent resource arrangement model
Fatima et al. Cyber physical systems and IoT: Architectural practices, interoperability, and transformation
Wang et al. Deep Reinforcement Learning-based scheduling for optimizing system load and response time in edge and fog computing environments
CN104166581B (en) A kind of virtual method towards increment manufacturing equipment
Aswini et al. Artificial Intelligence Based Smart Routing in Software Defined Networks.
Zhang et al. Engineering federated learning systems: A literature review
Kavitha et al. AI Integration in Data Driven Decision Making for Resource Management in Internet of Things (IoT): A Survey
WO2022083549A1 (en) Traffic signal conversion method and apparatus, electronic device, and storage medium
WO2020107350A1 (en) Node management method and apparatus for blockchain system, and storage device
CN113630476B (en) Communication method and communication device applied to computer cluster
Rouhifar et al. Bandwidth allocation methods on internet of things: an analytical survey
Essah et al. Information Processing in IoT Based Manufacturing Monitoring System
Yu et al. 5G network education and smart campus based on heterogeneous distributed platform and multi-scheduling optimization
Challoob et al. Enhancing the performance assessment of network-based and machine learning for module availability estimation

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20837201; Country of ref document: EP; Kind code of ref document: A1)
122 EP: PCT application non-entry in European phase (Ref document number: 20837201; Country of ref document: EP; Kind code of ref document: A1)