WO2021004478A1 - Distributed AI system

Info

Publication number: WO2021004478A1
Application number: PCT/CN2020/100833
Authority: WO
Inventors: 朱越; 张宝峰; 王成录
Original assignee: 华为技术有限公司
Priority date: 2019-07-10
Filing date: 2020-07-08
Publication date: 2021-01-14
Also published as: CN112215326B; CN112215326A

Abstract

A distributed AI system, which may be a decentralized distributed AI (DDAI) system, for flexibly and efficiently solving artificial intelligence application tasks. The DDAI system comprises: a registration unit, configured to register components when they dynamically access the DDAI system and to deregister them when they disconnect from it; a task planning and management unit, configured to plan and manage a distributed AI task according to the characteristics of the accessed components; a task exchange unit, configured to exchange information between the accessed components; a task execution unit, configured to enable the accessed components to execute the allocated distributed AI subtasks so as to complete the distributed AI task; and a standardization unit, configured to make the DDAI system correspond to a unified space, the unified space comprising a unified feature space and a unified label space. The components can be independent physical devices or cloud virtual nodes, and each component can carry one or more of said units.

Description

A distributed AI system

Cross references to related applications

This application claims priority to Chinese patent application No. 201910619531.1, entitled "A distributed AI system", filed with the Chinese Patent Office on July 10, 2019, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of artificial intelligence (AI), and in particular to a distributed AI system.

Background

The machine learning system is the most important branch of the AI system. Distributed machine learning (DML) systems are currently commonly used systems for processing large-scale artificial intelligence applications. The traditional distributed machine learning system is a centralized system that uses computing clusters to obtain prediction models through training on massive user data. Such centralized systems often require intensive computing resources, and massive amounts of user data are uploaded to the cloud for storage, which can easily cause privacy and security issues.

To address the above privacy protection problems, federated learning (FL) systems have recently proposed a local-cloud interaction mode. This mode protects user data privacy by storing data locally and performing calculations locally, while using methods such as homomorphic encryption, model aggregation, and differential privacy to make it difficult to infer a user's information from the model and related variables transmitted between the local device and the cloud.

In the above two main AI systems for large-scale artificial intelligence tasks, the node connection structure is relatively stable, the number of working nodes is limited, and the feature space and label space of the data samples stored in the working nodes are consistent. Terminal devices, however, often face a different situation: devices connect to and disconnect from the network at any time, the features collected by heterogeneous devices belong to different feature spaces, and even the AI tasks faced by each terminal device differ and belong to different label spaces. The above two AI systems can hardly complete artificial intelligence application tasks under such conditions of dynamic device access, disconnection, and device heterogeneity. Furthermore, both kinds of AI systems rely on a central node for global synchronization, which causes relatively large communication overhead and greatly reduces flexibility and efficiency in solving AI tasks.

Summary of the invention

This application provides a distributed AI system to solve artificial intelligence application tasks flexibly and efficiently.

In the first aspect, this application provides a distributed AI system. The distributed AI system is a decentralized distributed artificial intelligence (decentralized distributed AI, DDAI) system, including: a registration unit, used to register components when they dynamically access the DDAI system and to deregister them when they disconnect from the DDAI system; a task plan management unit, used to plan and manage distributed AI tasks according to the characteristics of the accessed components; a task interaction unit, used to exchange information between the accessed components; a task execution unit, used by the accessed components to execute the allocated distributed AI subtasks so as to complete the distributed AI task; and a standardization unit, used to make the distributed AI system correspond to a unified space, where the unified space includes a unified feature space and a unified label space. The components may be independent physical devices or cloud virtual nodes, and each component carries one or more of the units.

The above-mentioned DDAI system does not need to rely on a central node, and can solve AI tasks flexibly and efficiently through the heterogeneity, dynamics, automatic collaboration and automatic adaptation of multiple components, and at the same time can save communication overhead.

In a possible design, the registration unit is also used to discover the components adjacent to the component to which the registration unit belongs. The adjacent components are integrated into a virtual node, and the components integrated into one virtual node are either all physical devices or all cloud virtual nodes. In this way, AI tasks can be solved flexibly and efficiently through heterogeneous components.

In a possible design, the task plan management unit includes a distributed AI task planning unit, which is used to initiate and receive AI task plans according to specific rules or specific algorithms. The task plan management unit is specifically used to cooperate with the task plan management units of other components in the virtual node, agree on the distributed AI subtask assignment of each component according to the characteristics of the component to which the task plan management unit belongs, and record the subtask plan of that component; the characteristics include peripheral characteristics, functional characteristics, and computing capabilities.

In a possible design, the task plan management unit in any virtual node can dynamically elect a component of the virtual node as a gateway node according to the characteristics of the component to which it belongs. In this way, the communication between the virtual nodes can be completed only through the component as the gateway node, and the task plan management unit of the gateway node initiates task negotiation or task assignment of all components. This implementation is simple and flexible, and saves communication overhead.

In a possible design, all the components in any virtual node are directly connected to the gateway node; or, when a virtual node is formed from multiple components, the components in that virtual node are connected in a ring structure in which each component has a predecessor node and a successor node.

In a possible design, the task plan management unit further includes a status recording unit, which is used to record the task status of other components in the virtual node and update it in real time or periodically; the task plan management unit is also used to initiate a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by other components in the virtual node. In this way, AI tasks can be completed cooperatively.

In a possible design, the distributed AI post-processing tasks include exception processing tasks, decision processing tasks, and action coding tasks, where: the exception processing task handles abnormal states; the decision processing task integrates all feedback and makes a decision according to preset rules or a preset algorithm; and the action coding task converts the decision result into action codes and transmits them to a designated device for execution.

In a possible design, the task interaction unit is also used to compress the information to be sent before sending information to other components; after receiving the information sent by other components, decompress the information.

In a possible design, the task execution unit is further configured to execute the subtask according to the subtask plan specified by the task plan management unit, and feed back the execution result to the task plan management unit after completion.

In a possible design, if the distributed AI task planning unit of a component triggers a data collection task when certain conditions are met, the data collection and feature extraction tasks of that component are added to its subtask queue, and the other components of the virtual node are notified to plan the corresponding data collection and feature extraction subtasks.

In a possible design, the standardization unit is specifically configured to map the feature data extracted by the component to which the standardization unit belongs to a subspace of the unified feature space corresponding to the distributed AI system.

In a possible design, the standardization unit is further configured to map the task-related labels extracted by the component to which the standardization unit belongs to the unified label space corresponding to the distributed AI system.

In a possible design, multiple virtual nodes correspond to a unified feature space. Specifically, each virtual node performs feature transformation on the features of each component through the standardization unit of that component, so that the transformed feature space of each component is a subspace of the unified feature space. In this way, the components are not required to share the same feature space; the transformation alone is enough for every virtual node to belong to the same feature space. Likewise, if the components belong to different label spaces, each component transforms the labels of its data into the unified label space through its standardization unit. Thus, heterogeneous devices can complete AI model training and update tasks based on a unified feature space and a unified label space.

In a possible design, if the distributed AI task planning unit of the gateway node plans the model training and update tasks according to a certain algorithm, the task plan is passed to the other components of the virtual node before the plan is executed, and the distributed AI task planning unit of each component negotiates and confirms its own model training and update subtasks according to the characteristics of the component to which it belongs.

In a possible design, the gateway node initiates a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by other components in the virtual node to which it belongs.

Through the above design, various components can subsequently cooperate to solve AI tasks, and AI tasks can be adaptively allocated according to the characteristics of the components.

In a possible design, the DDAI system further includes a security verification unit for verifying the identity of the accessed component, so as to ensure the integrity of distributed tasks and protect the privacy of interactive data.

In a possible design, before information is sent to other components, the task interaction unit compresses the information to be sent, and the security verification unit signs and encrypts it; after the task interaction unit receives information sent by other components, the security verification unit decrypts the information and verifies the signature, and the task interaction unit decompresses it.

In a possible design, the DDAI system further includes a model version management unit, which is used to maintain the historical version of the model so as to automatically expire and roll back the model.

Description of the drawings

FIG. 1 is a schematic diagram of the architecture of a distributed AI system provided by this application;

FIG. 2 is a schematic diagram of a distributed AI system provided by this application;

FIG. 3 is a schematic diagram of a virtual node including a gateway node provided by this application;

FIG. 4 is a schematic diagram of the structural connection of a virtual node provided by this application;

FIG. 5 is a schematic diagram of a feature transformation framework provided by this application;

FIG. 6 is a schematic structural diagram of a component provided by this application;

FIG. 7 is a schematic structural diagram of a machine learning engine provided by this application;

FIG. 8 is a schematic diagram of a joint optimization process provided by this application;

FIG. 9 is a schematic diagram of a process of updating a global model provided by this application;

FIG. 10 is a schematic diagram of a training/update process of a personalized model provided by this application;

FIG. 11 is a schematic diagram of a process for adapting a personalized model provided by this application.

Detailed description

The application will be further described in detail below in conjunction with the accompanying drawings.

The embodiments of the present application provide a distributed AI system to flexibly and efficiently solve artificial intelligence application tasks.

In the description of this application, "at least one" refers to one or more.

Under normal circumstances, AI tasks include data collection, feature extraction, model training and update, and model execution. Data collection records raw data and stores it. Feature extraction turns the stored data into feature vectors composed of real numbers. Model training and update takes the generated feature vectors as input and, based on a specific algorithm, outputs the trained or updated model. Model execution uses the model to predict or make decisions on newly generated feature vectors. Different types of devices collect data through different channels and extract features in different ways, so their feature vectors belong to different feature spaces. Different devices also have different computing capabilities, and the model complexity they support may differ.

In the existing distributed AI system, there are central nodes and distributed working nodes, and the central node integrates the calculation results of each distributed node and then sends them to each working node for update. The working nodes are usually homogeneous, the connection structure is relatively stable, and the feature space of the sample data is required to be consistent across all nodes. Moreover, the existing distributed AI system relies on global synchronization by the central node, which causes large communication overhead: the more distributed nodes there are, the greater the overhead, which leads to computing bottlenecks. At the same time, the existing distributed AI system is based on the statistics of a large number of users and does not consider user personalization. As a result, it has poor flexibility and low efficiency when solving AI tasks. Based on this, this application proposes a distributed AI system that can jointly complete specific distributed AI tasks through a large number of heterogeneous components, supports components dynamically accessing or disconnecting from the distributed AI system, and dynamically plans distributed AI tasks according to the characteristics of each component, so that AI tasks can be solved flexibly and efficiently.

In order to describe the technical solutions of the embodiments of the present application more clearly, the distributed AI system provided by the embodiments of the present application will be described in detail below with reference to the accompanying drawings.

The embodiment of the present application provides a distributed AI system. The AI system may be a decentralized distributed artificial intelligence (decentralized distributed AI, DDAI) system. The schematic diagram of the DDAI system is shown in FIG. 1. The DDAI system may include a registration unit, a task plan management unit, a task interaction unit, a task execution unit, and a standardization unit. Specifically:

The registration unit is used to register a component when it dynamically connects to the DDAI system and to deregister it when it disconnects from the system. The task plan management unit is used to plan and manage distributed AI tasks according to the characteristics of the connected components. The task interaction unit is used to exchange information between the connected components. The task execution unit is used to execute the allocated distributed AI subtasks so as to complete the distributed AI task. The standardization unit is used to make the DDAI system correspond to a unified space, and the unified space may include a unified feature space and a unified label space. The components may be independent physical devices or cloud virtual nodes; each component carries one or more of the units. Through the above design, the DDAI system does not need to rely on a central node, and can solve AI tasks flexibly and efficiently through the heterogeneity, dynamics, automatic coordination, and automatic adaptation of multiple components, while also saving communication overhead.

In an optional implementation manner, multiple components form the DDAI system, and a component that carries the registration unit can discover adjacent components. The adjacent components can be integrated into a virtual node, and the components integrated into one virtual node are either all physical devices or all cloud virtual nodes. Exemplarily, the physical devices mentioned above may be, but are not limited to, terminal devices such as smart phones, smart watches, personal computers (PC), and tablets.

Exemplarily, multiple components form multiple virtual nodes. For example, the virtual nodes formed from the components may be as indicated by reference numeral 201 in FIG. 2. The multiple virtual nodes can be dynamically connected to form a decentralized virtual cloud, as indicated by reference numeral 202 in FIG. 2, which is the DDAI system. Since one or more components in the DDAI system can connect or disconnect at any time as needed, the components in any virtual node may differ from one moment to the next; that is, the virtual node changes in real time, and the components connected at a given moment form the virtual node at that moment on demand.

For example, at the current moment physical device 1 and physical device 2 form virtual node 1; at the next moment, when physical device 1 disconnects and physical device 3 connects, physical device 2 and physical device 3 may form a new virtual node 2. As another example, at the current moment physical device 4 and physical device 5 form virtual node 3, and physical device 6 and physical device 7 form virtual node 4; at the next moment, based on task requirements, physical devices 4, 5, and 6 may form a new virtual node 5, and physical device 7 may form a new virtual node 6. Of course, virtual nodes can be formed in many other ways, which are not listed here.

In an embodiment, the task plan management unit of a component may include a distributed AI task planning unit, which is used to initiate and receive AI task plans according to specific rules or specific algorithms. The task plan management unit is specifically used to cooperate with the task plan management units of the other components in the virtual node, agree on the distribution of the distributed AI subtasks of each component according to the characteristics of the component to which the task plan management unit belongs, and record the subtask plan of that component; the characteristics include peripheral characteristics, functional characteristics, and computing capabilities.

In an optional implementation manner, the task plan management unit in any virtual node may dynamically elect a component of the virtual node as a gateway node according to the characteristics of the component to which it belongs. In this way, subsequent communication between virtual nodes can be completed solely through the components acting as gateway nodes, and the task plan management unit of the gateway node initiates task negotiation or task assignment for all components. This implementation is simple and flexible, and saves communication overhead.

Specifically, when a component joins the DDAI system through the registration unit, the task plan management unit of the current gateway node initiates a new negotiation to elect the gateway node. If the new component is elected as the gateway node, the current gateway node transfers a copy of the task plan and the gateway responsibility to it. If a component disconnects from the DDAI system, the registration unit of the gateway node automatically discovers this, deregisters the component, and initiates a post-processing task to handle the abnormal situation in which the component has left and its subtask has stopped. If the registration units of the other components in the virtual node jointly detect that the current gateway node has left the network, their task plan management units re-initiate the gateway node election task. Through the above design, each component can dynamically connect to or disconnect from the DDAI system without affecting the entire DDAI system.
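The join and leave handling described above can be sketched as follows. This is a minimal illustration only: the `Component` and `VirtualNode` classes, the scalar `capability` score, and the "highest capability wins" rule are assumptions made for the example, since the application does not fix a particular election rule.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Component:
    """A registered component; `capability` is a hypothetical scalar score."""
    name: str
    capability: float


@dataclass
class VirtualNode:
    members: Dict[str, Component] = field(default_factory=dict)
    gateway: Optional[str] = None

    def elect_gateway(self) -> None:
        # Assumed rule: highest capability wins, name breaks ties.
        if not self.members:
            self.gateway = None
            return
        best = max(self.members.values(), key=lambda c: (c.capability, c.name))
        self.gateway = best.name

    def register(self, component: Component) -> None:
        # A join triggers a new election; the role may move to the newcomer.
        self.members[component.name] = component
        self.elect_gateway()

    def deregister(self, name: str) -> None:
        # A leave removes the component; re-elect only if the gateway left.
        self.members.pop(name, None)
        if self.gateway == name:
            self.elect_gateway()


node = VirtualNode()
node.register(Component("watch", 1.0))
node.register(Component("phone", 5.0))
node.register(Component("tablet", 3.0))
gateway_before = node.gateway
node.deregister("phone")  # the gateway itself leaves the network
gateway_after = node.gateway
```

In this sketch the gateway role moves from the phone to the tablet once the phone leaves, without any central coordinator being involved.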

Exemplarily, any virtual node and the gateway node it includes may be as shown in (a) or (b) in FIG. 3. Among them, the gateway node in each virtual node is determined by all components in the virtual node through election. Exemplarily, a component with the best performance state among all the components integrated with the virtual node at a certain time may be selected as the gateway node of the virtual node.

In an embodiment, the task plan management unit of a component further includes a status recording unit, which is used to record the task status of other components in the virtual node and update it in real time or regularly; the task plan management unit is also used for According to the task status fed back by other components in the virtual node, the distributed AI task planning unit initiates a distributed AI post-processing task.

For example, if any component in a virtual node meets a certain condition, the distributed AI task planning unit of that component is triggered. When a data collection task is scheduled, the distributed AI task planning unit of the component adds the data collection and feature extraction tasks of the component to the subtask queue, and notifies the other components of the virtual node to plan the corresponding data collection and feature extraction subtasks.
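As a rough sketch of this trigger-and-notify behavior, the snippet below models each planning unit with a local subtask queue. The class name `PlanningUnit`, the method names, and the task identifiers are hypothetical and chosen only for illustration.

```python
from collections import deque


class PlanningUnit:
    """Hypothetical stand-in for a component's distributed AI task planning unit."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()  # local subtask queue
        self.peers = []       # planning units of other components in the virtual node

    def plan_collection(self, task_id):
        # Queue the data collection subtask followed by feature extraction.
        self.queue.append((task_id, "collect"))
        self.queue.append((task_id, "extract_features"))

    def trigger_collection(self, task_id):
        # The triggering component plans its own subtasks, then notifies peers.
        self.plan_collection(task_id)
        for peer in self.peers:
            peer.plan_collection(task_id)


phone = PlanningUnit("phone")
watch = PlanningUnit("watch")
phone.peers = [watch]
phone.trigger_collection("t1")
```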

Specifically, the distributed AI task planning unit of each gateway node plans model training and update tasks according to a certain algorithm. Before the plan is executed, the task plan is passed to the other components of the virtual node, and the task planning units of all components negotiate and confirm their respective model training and update subtasks according to the characteristics of their own components.
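The application leaves the negotiation algorithm open. Purely as an illustration, the sketch below substitutes a simple greedy rule: each subtask goes to the least-loaded component whose remaining computing capability covers it. The function name `assign_subtasks`, the compute units, and the subtask names are all assumptions.

```python
def assign_subtasks(subtasks, capabilities):
    """Greedy stand-in for the negotiation step (not the application's algorithm).

    subtasks: dict of subtask name -> required compute (hypothetical units).
    capabilities: dict of component name -> available compute.
    Returns dict of subtask name -> component name (None if nobody qualifies).
    """
    load = {name: 0.0 for name in capabilities}
    plan = {}
    # Place the most demanding subtasks first.
    for task, need in sorted(subtasks.items(), key=lambda kv: -kv[1]):
        able = [c for c, cap in capabilities.items() if cap - load[c] >= need]
        if not able:
            plan[task] = None
            continue
        chosen = min(able, key=lambda c: (load[c], c))
        load[chosen] += need
        plan[task] = chosen
    return plan


plan = assign_subtasks(
    {"train_head": 4.0, "update_mapping": 2.0, "extract": 1.0},
    {"phone": 5.0, "watch": 1.0, "tablet": 3.0},
)
```

Here the heaviest training subtask lands on the phone, the mapping update on the tablet, and the lightweight extraction on the watch, which mirrors the idea of allocating subtasks according to each component's characteristics.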

In an implementation manner, the gateway node initiates a distributed AI post-processing task through the distributed AI task planning unit according to the task status fed back by other components in the virtual node. The distributed AI post-processing tasks include exception processing tasks, decision processing tasks, and action coding tasks, where: the exception processing task handles abnormal conditions; the decision processing task synthesizes all feedback and makes a decision in accordance with preset rules or a preset algorithm; and the action coding task transforms the decision result into an action code and transmits it to a designated device for execution.
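The three post-processing tasks can be illustrated with a small pipeline. The feedback format, the "mean score against a threshold" decision rule, and the numeric action codes below are assumptions made for the sketch; the application only requires some preset rule or algorithm.

```python
def post_process(feedback, threshold=0.5):
    """feedback: dict of component -> {"status": "ok" or "failed", "score": float}."""
    # Exception processing: set aside components that reported a failure.
    failed = sorted(c for c, f in feedback.items() if f["status"] != "ok")
    scores = [f["score"] for f in feedback.values() if f["status"] == "ok"]
    # Decision processing: assumed preset rule, mean score against a threshold.
    decision = bool(scores) and sum(scores) / len(scores) >= threshold
    # Action coding: turn the decision into a code for the designated device.
    action_code = 0x01 if decision else 0x00
    return {"failed": failed, "decision": decision, "action_code": action_code}


result = post_process({
    "phone": {"status": "ok", "score": 0.9},
    "watch": {"status": "ok", "score": 0.4},
    "tablet": {"status": "failed", "score": 0.0},
})
```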

In an optional implementation manner, the task execution unit is further configured to execute the subtask according to the subtask plan specified by the task plan management unit, and feed back the execution result to the task plan management unit after completion.

In an optional implementation manner, the standardization unit is specifically configured to map the feature data extracted by the component including the standardization unit to a subspace of the unified feature space corresponding to the DDAI system. The standardization unit is also used to map the task-related labels extracted by the component including the standardization unit to the unified label space corresponding to the DDAI system.
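For the label side of standardization, one simple possibility is a lookup table from each component's local label names to indices in a shared label list. The label names and the `UNIFIED_LABELS` list below are hypothetical; the application does not prescribe the form of the label mapping.

```python
# Hypothetical unified label space shared by all components.
UNIFIED_LABELS = ["rest", "walk", "run"]


def to_unified_labels(local_labels, label_map):
    """Map a component's local labels to indices in the unified label space."""
    return [UNIFIED_LABELS.index(label_map[label]) for label in local_labels]


# Two heterogeneous components that name the same activities differently.
watch_map = {"idle": "rest", "steps": "walk", "sprint": "run"}
phone_map = {"still": "rest", "on_foot": "walk", "running": "run"}

watch_y = to_unified_labels(["idle", "sprint", "steps"], watch_map)
phone_y = to_unified_labels(["still", "running", "on_foot"], phone_map)
```

After mapping, both components express their labels in the same index space, so their data can contribute to one training task.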

During specific implementation, multiple virtual nodes correspond to the unified feature space of the DDAI system. Specifically, each virtual node performs feature transformation on the features of each component through the standardization unit of that component, so that the transformed feature space of each component is a subspace of the unified feature space. That is to say, although the feature spaces of the components are not necessarily the same, the above transformation makes multiple virtual nodes belong to a unified feature space, so that artificial intelligence tasks can be carried out with higher efficiency and better privacy.

In an optional implementation manner, the DDAI system further includes a security verification unit for verifying the identity of the accessed component, so as to ensure the integrity of distributed tasks and protect the privacy of interactive data.

Specifically, before information is sent to other components, the task interaction unit compresses the information to be sent, and the security verification unit signs and encrypts it; after information sent by other components is received, the security verification unit decrypts the information and verifies the signature, and the task interaction unit decompresses it.
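A minimal sketch of the compress-and-sign round trip, using only the Python standard library, is shown below. It makes two assumptions that go beyond the text: the signature is an HMAC-SHA256 tag computed with a pre-shared key, and encryption is left out entirely because the standard library provides no cipher. The function names `pack` and `unpack` are hypothetical.

```python
import hashlib
import hmac
import json
import zlib

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def pack(message: dict, key: bytes) -> bytes:
    """Compress the message, then prepend a tag that signs the compressed payload."""
    payload = zlib.compress(json.dumps(message, sort_keys=True).encode("utf-8"))
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def unpack(blob: bytes, key: bytes) -> dict:
    """Verify the tag first, then decompress and decode the message."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature check failed")
    return json.loads(zlib.decompress(payload).decode("utf-8"))


key = b"shared-secret"  # assumed pre-shared key, for illustration only
message = {"subtask": "extract_features", "status": "done"}
blob = pack(message, key)
restored = unpack(blob, key)
```

Verifying the tag before decompressing means a tampered payload is rejected without being processed further.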

In an optional implementation manner, the DDAI system further includes a model version management unit, which is used to maintain the historical version of the model so that the model automatically expires and rolls back.
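One way to picture such a unit is a bounded version history with a rollback operation. The class name `ModelVersionStore`, the retention limit, and the use of plain strings as stand-ins for models are assumptions made for this sketch.

```python
class ModelVersionStore:
    """Keeps a bounded history of model versions (illustrative sketch only)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # assumed retention limit, must be >= 1
        self.history = []                 # (version, model) pairs, oldest first
        self._next_version = 1

    def publish(self, model):
        version = self._next_version
        self._next_version += 1
        self.history.append((version, model))
        # Expire the oldest versions beyond the retention limit.
        del self.history[:-self.max_versions]
        return version

    def current(self):
        return self.history[-1]

    def rollback(self):
        # Discard the newest version and fall back to the previous one.
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1]


store = ModelVersionStore(max_versions=2)
for weights in ("model-a", "model-b", "model-c"):
    store.publish(weights)
kept = list(store.history)
restored_version = store.rollback()
```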

Through the above design, the components can cooperate to automatically complete feature mapping, machine learning training, task decisions, and so on, and adaptive scheduling can be implemented according to the capabilities and constraints of each component; adaptive scheduling can be embodied in, but is not limited to, algorithm scheduling.

In a possible implementation, when a virtual node is formed from multiple components, the components in that virtual node are connected in a ring structure in which each component has a predecessor node and a successor node, for example as shown in FIG. 4. Specifically, each component records its predecessor node and its successor node, and priorities are arranged according to certain rules. Through this structural connection, functions such as multi-component negotiation, collaborative training, and distributed decision-making can be realized.

Exemplarily, when the gateway node is being elected, each component sends its own state information to its successor node. According to the priority protocol, each component then forwards the state information of the best component it knows of to its successor node, until a consensus is finally reached and the optimal component is selected as the gateway node of the virtual node.
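The ring election can be simulated in a few lines. In the sketch below, the state information is reduced to a hypothetical `(name, score)` pair and the priority protocol to "higher score wins, name breaks ties"; both are assumptions. After n - 1 rounds of forwarding on a ring of n components, every component holds the same winner.

```python
def ring_elect(states):
    """Simulate gateway election on a ring.

    states: list of (name, score) in ring order; component i sends to
    component (i + 1) % n. Returns the best candidate known to each
    component after n - 1 rounds of forwarding.
    """
    n = len(states)
    best = list(states)  # best[i]: best candidate known to component i
    for _ in range(n - 1):
        sent = list(best)  # all components forward simultaneously
        for i in range(n):
            incoming = sent[(i - 1) % n]  # message from the predecessor
            best[i] = max(best[i], incoming, key=lambda s: (s[1], s[0]))
    return best


views = ring_elect([("watch", 1.0), ("phone", 5.0), ("tablet", 3.0), ("pc", 4.0)])
```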

Specifically, during collaborative training, the gateway node can decompose the objective function, model, and mapping into the parts corresponding to each component, and then pass them to the corresponding components one by one for update and training. Exemplarily, the gateway node first passes the task parts of all components to its successor node; the successor node keeps its own task part and passes the task parts of the other components on to its own successor node; and each subsequent component repeats this process until all components have received their own task parts.
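The forwarding scheme can be sketched as below. The function name `distribute_on_ring`, the component names, and the string placeholders for task parts are hypothetical, and the sketch assumes the gateway keeps its own part rather than forwarding it.

```python
def distribute_on_ring(ring, gateway, parts):
    """Pass task parts around the ring, starting at the gateway's successor.

    ring: component names in successor order; parts: name -> task part.
    Returns (received, hops), where hops records each receiving component
    and how many parts were in the bundle when it arrived.
    """
    received = {}
    if gateway in parts:
        received[gateway] = parts[gateway]  # assumed: gateway keeps its own part
    bundle = {k: v for k, v in parts.items() if k != gateway}
    hops = []
    start = ring.index(gateway)
    for step in range(1, len(ring)):
        if not bundle:
            break
        component = ring[(start + step) % len(ring)]
        hops.append((component, len(bundle)))
        if component in bundle:
            # Keep the own part; the rest travels on to the successor.
            received[component] = bundle.pop(component)
    return received, hops


parts = {"phone": "objective-part-0", "tablet": "objective-part-1", "watch": "objective-part-2"}
received, hops = distribute_on_ring(["phone", "tablet", "watch"], "phone", parts)
```

The bundle shrinks by one part at each hop, so components further along the ring receive less data than those close to the gateway.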

In an optional implementation of unified feature space mapping, each virtual node is responsible for collecting multi-modal features $X = [X^{(1)}, \ldots, X^{(m)}]$, where $m$ is the number of different component types (that is, $m$ modalities) and $X^{(m)}$ is the sample collected on the $m$-th component (filled with 0 if no corresponding device is connected). A feature mapping $\Phi = [\Phi^{(1)}, \ldots, \Phi^{(m)}]$, where $\Phi^{(m)}$ denotes the feature mapping function corresponding to the $m$-th component, is applied so that the features of different virtual nodes belong to a unified feature space $\chi$ after transformation; that is, $\Phi_i(X_i) \in \chi$ for any virtual node $i$ in the DDAI system. Exemplarily, the feature transformation framework is shown in FIG. 5.

Exemplarily, the feature mapping can be any function such as a linear function, a multilayer perceptron, a deep neural network, a decision tree, etc., to map the original feature space to the new feature space.

In implementation, the data $X^{(k)}$ and the corresponding feature transformation function $\Phi^{(k)}$ are both stored in the $k$-th component of the virtual node. It should be noted that $X^{(k)}$ is private user data and is not shared.
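As a rough illustration of mapping heterogeneous features into one space, the sketch below uses per-modality linear maps and concatenates their outputs, so that each modality occupies its own coordinate block (subspace) of the unified vector. Missing modalities are zero filled, as described above. The linear form, the concatenation layout, and the modality names are assumptions; the application allows any mapping function.

```python
def linear_map(weights, x):
    """Apply a linear map, given as rows of weights, to the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def to_unified_space(samples, mappings, raw_dims):
    """Concatenate per-modality mapped features into one unified vector.

    samples: modality -> raw feature vector (absent modalities are omitted)
    mappings: modality -> weight matrix (one row per output dimension)
    raw_dims: modality -> raw input dimension, used for zero filling
    """
    unified = []
    for modality in sorted(mappings):
        # Zero fill when no device of this modality is connected.
        x = samples.get(modality, [0.0] * raw_dims[modality])
        unified.extend(linear_map(mappings[modality], x))
    return unified


mappings = {
    "accel": [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],  # 3 raw dims -> 2 unified dims
    "heart_rate": [[2.0]],                        # 1 raw dim  -> 1 unified dim
}
raw_dims = {"accel": 3, "heart_rate": 1}
# This virtual node has an accelerometer but no heart-rate device connected.
unified = to_unified_space({"accel": [1.0, 2.0, 3.0]}, mappings, raw_dims)
```

Because each mapping runs where its raw data lives, only the mapped vector would need to leave a component, which is consistent with the raw samples staying private.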

In one implementation of distributed decision-making, each component passes the score of its own model on its current data to the gateway node; the gateway node combines the scores, makes a task decision based on the combined score, and then passes to each component the tasks that the component needs to execute. The score transfer process of each component and the task issuance process of the gateway node are similar to the information transfer process between components described above; they can be cross-referenced and are not described in detail here.

In an optional implementation, within any virtual node a component signature (that is, a component identifier) can be added to the information as it is passed between components, and integrity verification can be added at the gateway node, to ensure that the information is not modified, damaged, or lost during transmission.

In an optional implementation manner, the schematic structural diagram of any component may be as shown in FIG. 6, and may specifically include:

The registration unit is used to allow components to join the DDAI system at any time, to discover promptly when a component exits the DDAI system, and to report the updated connections in the virtual node to the task plan management unit.

The task plan management unit is used to initiate and receive AI task plans according to specific rules or specific algorithms. The task plan management unit is specifically used to cooperate with the task plan management units of other components in the virtual node, agree on the distribution of the distributed AI subtasks of each component according to the characteristics of the component, and record the subtask plan of the component; the characteristics may include peripheral characteristics, functional characteristics, and computing capabilities. The task plan management unit also records and refreshes the execution status of the task.

The data collection unit is used to perform the data collection task of the component and record the original sampled data in the database.

The feature extraction unit is used to extract features from the stored data; the features may include, but are not limited to, user or device portraits, behavior features, and status features. Here, a symbol [symbol not reproduced] can be used to represent the sample of the p-th type of component in the i-th virtual node;

The feature mapping unit is used to map the features extracted by the feature extraction unit to the set feature space; specifically, the features can be mapped to a subspace of the unified feature space according to the component type or function. Here, a symbol [symbol not reproduced] can be used to represent the feature mapping function of the p-th type of component in the i-th virtual node;

The tag mapping unit is used to map the stored task tags to the set task space; the set task space can be the unified task space of the system, and each component is mapped to a subspace of the unified task space. In this process, Y_i can be used to represent the tag (mark) corresponding to the sample [symbol not reproduced] in the i-th virtual node;

The machine learning engine is used for AI model training, feature mapping model updating, local model adaptive integration, and global model updating. Specifically, according to the features [symbol not reproduced], the feature mapping [symbol not reproduced], the tags Y_i, and the tag mapping, it performs personalized AI model training, feature mapping model updating, model adaptive integration, global model updating, and model execution. The machine learning engine can also handle exceptions.

The model caching unit is used to cache multiple models so that the machine learning engine can perform local model adaptive integration;

The security unit is used for security verification;

The interaction unit is used to send and receive data and to compress and decompress the data, and specifically to communicate with other components.

The data collection unit and the machine learning engine may both belong to the task execution unit in the DDAI system. The machine learning engine may also include the post-processing task execution function described for the DDAI system.
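
By way of a non-limiting illustration, the mapping of component features to a subspace of the unified feature space, as performed by the feature mapping unit above, might be sketched as follows. The dimension of the unified space, the component types, the slice layout, and the function name map_to_unified are assumptions made for this sketch; in the described system the feature mapping is itself updated by the machine learning engine, whereas here it is a fixed coordinate embedding.

```python
UNIFIED_DIM = 6

# Hypothetical layout: each component type owns a slice of the unified vector.
SUBSPACES = {"camera": slice(0, 4), "watch": slice(4, 6)}


def map_to_unified(component_type, features):
    """Embed component-local features into the unified feature space."""
    sub = SUBSPACES[component_type]
    if len(features) != sub.stop - sub.start:
        raise ValueError("feature length does not match the subspace width")
    # Coordinates outside the component's subspace stay at zero.
    unified = [0.0] * UNIFIED_DIM
    unified[sub] = [float(value) for value in features]
    return unified


watch_vector = map_to_unified("watch", [72, 0.5])
```

Because every component writes into the same fixed-length vector, models exchanged between components of different types can consume each other's samples without per-type adapters.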

In an optional implementation manner, the functional structure of the machine learning engine in any component may be as shown in the schematic diagram of FIG. 7, and may specifically include:

The global model update unit is used to receive models, gradients, etc. from other virtual nodes, and perform joint average update of the global model;

Local personalized model training unit, used to update the personalized model with local data;

The personalized model adaptation unit is used to adaptively select models from the model caching unit for integration, according to the optimization index and the available computing resources;

The feature mapping update unit is used to update the feature mapping function;

The model composition unit is used to combine the models obtained by the global model update unit, the local personalized model training unit, and the personalized model adaptation unit;

The exception handling unit is used to handle abnormal situations or abnormal data;

The model execution platform is used to execute the model obtained by the model composition unit.

In an optional implementation manner, joint optimization in the machine learning engine through the global model update unit, the local personalized model training unit, the personalized model adaptation unit, and the feature mapping update unit may be implemented with an alternating optimization strategy: with the other variable parameters held fixed, one of the variables is optimized at a time. Specifically, the implementation process may be as shown in FIG. 8, and may include:

A1. Given the local personalized model, the integrated personalized model, and the feature mapping, update the global model;

A2. Given the global model, the integrated personalized model, and the feature mapping, update the local personalized model;

A3. Given the global model, the local personalized model, and the feature mapping, perform personalized model adaptation;

A4. Given the local personalized model, the integrated personalized model, and the personalized model adaptation, update the feature mapping function; steps A1 to A4 are repeated until the maximum number of iterations is reached or convergence is achieved.
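
By way of a non-limiting illustration, the alternating structure of steps A1 to A4 might be sketched on a toy problem in which each of the four variables is a single number and the joint objective is a simple quadratic. The objective, the closed-form block updates, the variable names, and the stopping tolerance are assumptions made for this sketch; they stand in for the real model updates, which the application does not reduce to formulas.

```python
def loss(state):
    """Toy joint objective coupling the four blocks (assumed for illustration)."""
    return (
        (state["global"] - 1.0) ** 2
        + (state["local"] - state["global"]) ** 2
        + (state["ensemble"] - state["local"]) ** 2
        + (state["mapping"] - state["ensemble"]) ** 2
    )


# Each update minimizes the toy loss over one block with the others held fixed.
UPDATES = [
    ("global", lambda s: (1.0 + s["local"]) / 2.0),               # A1
    ("local", lambda s: (s["global"] + s["ensemble"]) / 2.0),     # A2
    ("ensemble", lambda s: (s["local"] + s["mapping"]) / 2.0),    # A3
    ("mapping", lambda s: s["ensemble"]),                         # A4
]


def alternate_optimize(state, max_iters=1000, tol=1e-12):
    """Cycle through A1..A4 until the loss stops improving or max_iters is hit."""
    state = dict(state)
    previous = loss(state)
    iteration = 0
    for iteration in range(1, max_iters + 1):
        for name, update in UPDATES:
            state[name] = update(state)
        current = loss(state)
        if abs(previous - current) < tol:
            break
        previous = current
    return state, iteration


start = {"global": 0.0, "local": 0.0, "ensemble": 0.0, "mapping": 0.0}
final_state, iterations_used = alternate_optimize(start)
```

Each pass touches one block at a time, so a component only needs the current values of the other blocks, not their internals, to perform its own update.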

In an exemplary manner, the global model update unit in the machine learning engine may perform the global model update as follows: based on local data, the feature mapping, the tags, and the tag mapping, compute and update the gradient of the loss function with respect to the unified tags and features, together with the local model; after security verification such as differential privacy and signature, send the gradient and the local model to the adjacent components; receive the gradients and models sent by the adjacent components, and aggregate them with the local gradient and model in the security unit; and use the aggregated model and gradient to update the global model. The aforementioned security verification may be integrity verification. Exemplarily, the foregoing process may be as shown in the flowchart of FIG. 9.
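
By way of a non-limiting illustration, one round of the global model update might be sketched for a linear model with a squared-error loss. The model form, the clipping-plus-Gaussian-noise step standing in for differential privacy, the plain averaging of gradients, and all function names are assumptions made for this sketch; only gradients are exchanged here, whereas the described process also exchanges models and applies signatures.

```python
import random


def local_gradient(weights, samples):
    """Gradient of the mean squared error of a linear model y = w . x."""
    grad = [0.0] * len(weights)
    for x, y in samples:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * error * xj / len(samples)
    return grad


def privatize(grad, clip=1.0, sigma=0.0, rng=random):
    """Clip the gradient norm and add Gaussian noise (assumed DP-style step)."""
    norm = sum(g * g for g in grad) ** 0.5
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, sigma * clip) for g in grad]


def aggregate(vectors):
    """Coordinate-wise average of the local and neighbor gradients."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def global_update(weights, own_grad, neighbor_grads, lr=0.1):
    """Apply one averaged-gradient step to the global model."""
    averaged = aggregate([own_grad] + neighbor_grads)
    return [w - lr * g for w, g in zip(weights, averaged)]


weights = [0.0]
own = privatize(local_gradient(weights, [([1.0], 2.0)]), clip=1.0, sigma=0.0)
new_weights = global_update(weights, own, neighbor_grads=[[-3.0]])
```

Setting sigma above zero adds noise before the gradient leaves the component, trading accuracy of the aggregated update for privacy of the local data.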

In an exemplary manner, the local personalized model training unit in the machine learning engine may perform personalized model training or updating as follows: based on local data, the feature mapping, and the tags, compute and update the local model with respect to a loss function over the tags, the unified features, and the local features; perform security verification such as differential privacy and signature; send the securely processed model to the adjacent components; and store it in the model caching unit. Exemplarily, the foregoing process may be as shown in FIG. 10.

In a possible manner, the personalized model adaptation unit in the machine learning engine may perform the personalized model adaptation as follows: sample a model integration strategy from the policy model according to the resources of the components in the virtual node and the constraints of the local task; compute the loss of the sampled strategy after integrating the corresponding models from the model caching unit; update the sampling policy model according to the fed-back loss; and iterate alternately until the specified number of iterations is reached or the convergence condition is met. Exemplarily, the foregoing process may be as shown in FIG. 11.
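
By way of a non-limiting illustration, the sample, evaluate, and update loop of the personalized model adaptation might be sketched with a softmax policy over candidate model subsets, updated by a gradient-bandit rule. The scalar models, the averaging ensemble, the budget on the number of integrated models as the resource constraint, and the update rule are all assumptions made for this sketch; the application does not specify the form of the policy model.

```python
import itertools
import math
import random


def ensemble_loss(models, data):
    """Mean squared error of the averaged prediction of scalar models y = w * x."""
    total = 0.0
    for x, y in data:
        prediction = sum(w * x for w in models) / len(models)
        total += (prediction - y) ** 2
    return total / len(data)


def adapt(cache, data, budget, iters=200, lr=0.5, seed=0):
    """Learn preferences over model subsets that fit the resource budget."""
    rng = random.Random(seed)
    # Candidate strategies: non-empty subsets of cached models within the budget.
    candidates = [
        subset
        for size in range(1, budget + 1)
        for subset in itertools.combinations(range(len(cache)), size)
    ]
    prefs = [0.0] * len(candidates)
    baseline = 0.0
    for step in range(1, iters + 1):
        top = max(prefs)
        exps = [math.exp(p - top) for p in prefs]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Sample an integration strategy from the current policy.
        chosen = rng.choices(range(len(candidates)), weights=probs)[0]
        current = ensemble_loss([cache[j] for j in candidates[chosen]], data)
        baseline += (current - baseline) / step
        advantage = baseline - current  # lower loss than average is rewarded
        for k in range(len(prefs)):
            indicator = 1.0 if k == chosen else 0.0
            prefs[k] += lr * advantage * (indicator - probs[k])
    best = max(range(len(candidates)), key=prefs.__getitem__)
    return candidates[best], prefs


cached_models = [1.0, 2.0, 3.0]
local_data = [(1.0, 2.0), (2.0, 4.0)]
strategy, preferences = adapt(cached_models, local_data, budget=2)
```

The returned strategy is a tuple of indices into the model cache; tightening the budget shrinks the candidate set, which is how limited computing resources constrain the integration in this sketch.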

In an optional implementation manner, the feature mapping update performed by the feature mapping update unit in the machine learning engine is implemented similarly to the global model update performed by the global model update unit, with the optimized variable replaced by the feature mapping. The descriptions can be cross-referenced and are not repeated here.

Although the preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic creative concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present application.

Obviously, those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the scope of the embodiments of the present application. In this way, if these modifications and variations of the embodiments of the application fall within the scope of the claims of the application and their equivalent technologies, the application is also intended to include these modifications and variations.

Claims

1. A distributed AI system, characterized in that the distributed AI system is a decentralized distributed artificial intelligence (DDAI) system, including:

The registration unit is used for registering when the component is connected to the distributed AI system or deregistering when the component is disconnected from the distributed AI system;

The task plan management unit is used to plan and manage distributed AI tasks according to the characteristics of the connected components;

Task interaction unit, used to exchange information between connected components;

A task execution unit, used for the connected components to execute the allocated distributed AI subtasks so as to complete the distributed AI task;

A standardization unit, configured to make the distributed AI system correspond to a unified space, where the unified space includes a unified feature space and a unified mark space;

The components are independent physical devices or cloud virtual nodes;

Each component carries one or more of the described units.
2. The system according to claim 1, wherein the registration unit is further used for:

Discovering the adjacent components of the component to which the registration unit belongs; the adjacent components are integrated into a virtual node, and the components integrated into one virtual node are all physical devices or all cloud virtual nodes.
3. The system of claim 2, wherein the task plan management unit comprises:

Distributed AI task planning unit, used to initiate and receive AI task plans according to specific rules or specific algorithms;

The task plan management unit is specifically used for:

By cooperating with the task plan management units of other components in the virtual node, reach an agreement, according to the characteristics of the component that includes the task plan management unit, on the allocation of distributed AI subtasks to each component, and record the subtask plan of the component that includes the task plan management unit; wherein the characteristics include peripheral characteristics, functional characteristics, and computing capabilities.
4. The system according to any one of claims 1-3, wherein the task interaction unit is further configured to:

Before sending information to other components, compress the information to be sent;

After receiving the information sent by other components, the information is decompressed.
5. The system according to any one of claims 1-4, wherein the task execution unit is further configured to:

The subtask is executed according to the subtask plan specified by the task plan management unit, and the execution result is fed back to the task plan management unit after completion.
6. The system according to any one of claims 1-5, wherein the standardization unit is specifically configured to:

The feature data extracted by the component to which the standardization unit belongs is mapped to the subspace of the unified feature space corresponding to the distributed AI system.
7. The system according to any one of claims 1-6, wherein the standardization unit is further used for:

Map the task-related tags extracted by the component to which the standardization unit belongs to the unified mark space corresponding to the distributed AI system.
8. The system according to claim 3, wherein the task plan management unit further comprises a status recording unit for:

Record the task status of other components in the virtual node and update it in real time or regularly;

The task plan management unit is also used for:

According to the task status fed back by other components in the virtual node, a distributed AI post-processing task is initiated through the distributed AI task planning unit.
9. The system according to claim 8, wherein the distributed AI post-processing tasks include exception processing tasks, decision processing tasks, and action coding tasks, wherein:

The exception processing task is to perform exception handling on an abnormal state;

The decision processing task is to integrate all feedbacks and make decisions according to preset rules or preset algorithms;

The action coding task is to convert the decision result into action codes and transmit them to the designated device for execution.
10. The system according to any one of claims 1-9, further comprising a security verification unit for:

Verify the identity of the connected components to ensure the integrity of distributed tasks and protect the privacy of interactive data.
11. The system according to any one of claims 1-10, further comprising a model version management unit for:

Maintain the historical version of the model so that the model automatically expires and rolls back.